Search results for: microarray data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 13341

Search results for: microarray data analysis

13161 Wind Speed Data Analysis using Wavelet Transform

Authors: S. Avdakovic, A. Lukac, A. Nuhanovic, M. Music

Abstract:

Renewable energy systems are becoming a topic of great interest and investment in the world. In recent years wind power generation has experienced a very fast development in the whole world. For planning and successful implementations of good wind power plant projects, wind potential measurements are required. In these projects, of great importance is the effective choice of the micro location for wind potential measurements, installation of the measurement station with the appropriate measuring equipment, its maintenance and analysis of the gained data on wind potential characteristics. In this paper, a wavelet transform has been applied to analyze the wind speed data in the context of insight in the characteristics of the wind and the selection of suitable locations that could be the subject of a wind farm construction. This approach shows that it can be a useful tool in investigation of wind potential.

Keywords: Wind potential, Wind speed data, Wavelettransform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2577
13160 Student Satisfaction Data for Work Based Learners

Authors: Rosie Borup, Hanifa Shah

Abstract:

This paper aims to describe how student satisfaction is measured for work-based learners as these are non-traditional learners, conducting academic learning in the workplace, typically their curricula have a high degree of negotiation, and whose motivations are directly related to their employers- needs, as well as their own career ambitions. We argue that while increasing WBL participation, and use of SSD are both accepted as being of strategic importance to the HE agenda, the use of WBL SSD is rarely examined, and lessons can be learned from the comparison of SSD from a range of WBL programmes, and increased visibility of this type of data will provide insight into ways to improve and develop this type of delivery. The key themes that emerged from the analysis of the interview data were: learners profiles and needs, employers drivers, academic staff drivers, organizational approach, tools for collecting data and visibility of findings. The paper concludes with observations on best practice in the collection, analysis and use of WBL SSD, thus offering recommendations for both academic managers and practitioners.

Keywords: Student satisfaction data, work based learning, employer engagement, NSS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1439
13159 Granularity Analysis for Spatio-Temporal Web Sensors

Authors: Shun Hattori

Abstract:

In recent years, many researches to mine the exploding Web world, especially User Generated Content (UGC) such as weblogs, for knowledge about various phenomena and events in the physical world have been done actively, and also Web services with the Web-mined knowledge have begun to be developed for the public. However, there are few detailed investigations on how accurately Web-mined data reflect physical-world data. It must be problematic to idolatrously utilize the Web-mined data in public Web services without ensuring their accuracy sufficiently. Therefore, this paper introduces the simplest Web Sensor and spatiotemporallynormalized Web Sensor to extract spatiotemporal data about a target phenomenon from weblogs searched by keyword(s) representing the target phenomenon, and tries to validate the potential and reliability of the Web-sensed spatiotemporal data by four kinds of granularity analyses of coefficient correlation with temperature, rainfall, snowfall, and earthquake statistics per day by region of Japan Meteorological Agency as physical-world data: spatial granularity (region-s population density), temporal granularity (time period, e.g., per day vs. per week), representation granularity (e.g., “rain" vs. “heavy rain"), and media granularity (weblogs vs. microblogs such as Tweets).

Keywords: Granularity analysis, knowledge extraction, spatiotemporal data mining, Web credibility, Web mining, Web sensor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1829
13158 Data Transformation Services (DTS): Creating Data Mart by Consolidating Multi-Source Enterprise Operational Data

Authors: J. D. D. Daniel, K. N. Goh, S. M. Yusop

Abstract:

Trends in business intelligence, e-commerce and remote access make it necessary and practical to store data in different ways on multiple systems with different operating systems. As business evolve and grow, they require efficient computerized solution to perform data update and to access data from diverse enterprise business applications. The objective of this paper is to demonstrate the capability of DTS [1] as a database solution for automatic data transfer and update in solving business problem. This DTS package is developed for the sales of variety of plants and eventually expanded into commercial supply and landscaping business. Dimension data modeling is used in DTS package to extract, transform and load data from heterogeneous database systems such as MySQL, Microsoft Access and Oracle that consolidates into a Data Mart residing in SQL Server. Hence, the data transfer from various databases is scheduled to run automatically every quarter of the year to review the efficient sales analysis. Therefore, DTS is absolutely an attractive solution for automatic data transfer and update which meeting today-s business needs.

Keywords: Data Transformation Services (DTS), ObjectLinking and Embedding Database (OLEDB), Data Mart, OnlineAnalytical Processing (OLAP), Online Transactional Processing(OLTP).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1969
13157 A Modified AES Based Algorithm for Image Encryption

Authors: M. Zeghid, M. Machhout, L. Khriji, A. Baganne, R. Tourki

Abstract:

With the fast evolution of digital data exchange, security information becomes much important in data storage and transmission. Due to the increasing use of images in industrial process, it is essential to protect the confidential image data from unauthorized access. In this paper, we analyze the Advanced Encryption Standard (AES), and we add a key stream generator (A5/1, W7) to AES to ensure improving the encryption performance; mainly for images characterised by reduced entropy. The implementation of both techniques has been realized for experimental purposes. Detailed results in terms of security analysis and implementation are given. Comparative study with traditional encryption algorithms is shown the superiority of the modified algorithm.

Keywords: Cryptography, Encryption, Advanced EncryptionStandard (AES), ECB mode, statistical analysis, key streamgenerator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4973
13156 A Study of the Adaptive Reuse for School Land Use Strategy: An Application of the Analytic Network Process and Big Data

Authors: Wann-Ming Wey

Abstract:

In today's popularity and progress of information technology, the big data set and its analysis are no longer a major conundrum. Now, we could not only use the relevant big data to analysis and emulate the possible status of urban development in the near future, but also provide more comprehensive and reasonable policy implementation basis for government units or decision-makers via the analysis and emulation results as mentioned above. In this research, we set Taipei City as the research scope, and use the relevant big data variables (e.g., population, facility utilization and related social policy ratings) and Analytic Network Process (ANP) approach to implement in-depth research and discussion for the possible reduction of land use in primary and secondary schools of Taipei City. In addition to enhance the prosperous urban activities for the urban public facility utilization, the final results of this research could help improve the efficiency of urban land use in the future. Furthermore, the assessment model and research framework established in this research also provide a good reference for schools or other public facilities land use and adaptive reuse strategies in the future.

Keywords: Adaptive reuse, analytic network process, big data, land use strategy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 864
13155 A Review and Comparative Analysis on Cluster Ensemble Methods

Authors: S. Sarumathi, P. Ranjetha, C. Saraswathy, M. Vaishnavi, S. Geetha

Abstract:

Clustering is an unsupervised learning technique for aggregating data objects into meaningful classes so that intra cluster similarity is maximized and inter cluster similarity is minimized in data mining. However, no single clustering algorithm proves to be the most effective in producing the best result. As a result, a new challenging technique known as the cluster ensemble approach has blossomed in order to determine the solution to this problem. For the cluster analysis issue, this new technique is a successful approach. The cluster ensemble's main goal is to combine similar clustering solutions in a way that achieves the precision while also improving the quality of individual data clustering. Because of the massive and rapid creation of new approaches in the field of data mining, the ongoing interest in inventing novel algorithms necessitates a thorough examination of current techniques and future innovation. This paper presents a comparative analysis of various cluster ensemble approaches, including their methodologies, formal working process, and standard accuracy and error rates. As a result, the society of clustering practitioners will benefit from this exploratory and clear research, which will aid in determining the most appropriate solution to the problem at hand.

Keywords: Clustering, cluster ensemble methods, consensus function, data mining, unsupervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 723
13154 Multidimensional Performance Management

Authors: David Wiese

Abstract:

In order to maximize efficiency of an information management platform and to assist in decision making, the collection, storage and analysis of performance-relevant data has become of fundamental importance. This paper addresses the merits and drawbacks provided by the OLAP paradigm for efficiently navigating large volumes of performance measurement data hierarchically. The system managers or database administrators navigate through adequately (re)structured measurement data aiming to detect performance bottlenecks, identify causes for performance problems or assessing the impact of configuration changes on the system and its representative metrics. Of particular importance is finding the root cause of an imminent problem, threatening availability and performance of an information system. Leveraging OLAP techniques, in contrast to traditional static reporting, this is supposed to be accomplished within moderate amount of time and little processing complexity. It is shown how OLAP techniques can help improve understandability and manageability of measurement data and, hence, improve the whole Performance Analysis process.

Keywords: Data Warehousing, OLAP, Multidimensional Navigation, Performance Diagnosis, Performance Management, Performance Tuning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2076
13153 Sensitivity Analysis in Power Systems Reliability Evaluation

Authors: A.R Alesaadi, M. Nafar, A.H. Gheisari

Abstract:

In this paper sensitivity analysis is performed for reliability evaluation of power systems. When examining the reliability of a system, it is useful to recognize how results change as component parameters are varied. This knowledge helps engineers to understand the impact of poor data, and gives insight on how reliability can be improved. For these reasons, a sensitivity analysis can be performed. Finally, a real network was used for testing the presented method.

Keywords: sensitivity analysis, reliability evaluation, powersystems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2215
13152 Evaluating Efficiency of Nina Distribution Company Using Window Data Envelopment Analysis and Malmquist Index

Authors: Hossein Taherian Far, Ali Bazaee

Abstract:

Achieving continuous sustained economic growth and following economic development can be the target for all countries which are looking for it. In this regard, distribution industry plays an important role in growth and development of any nation. So, estimating the efficiency and productivity of the so called industry and identifying factors influencing it, is very necessary. The objective of the present study is to measure the efficiency and productivity of seven branches of Nina Distribution Company using window data envelopment analysis and Malmquist productivity index from spring 2013 to summer 2015. In this study, using criteria of fixed assets, payroll personnel, operating costs and duration of collection of receivables were selected as inputs and people and net sales, gross profit and percentage of coverage to customers were selected as outputs. Then, the process of performance window data envelopment analysis was driven and process efficiency has been measured using Malmquist index. The results indicate that the average technical efficiency of window Data Envelopment Analysis (DEA) model and fluctuating trend is sustainable. But the average management efficiency in window DEA model is related with negative growth (decline) of about 13%. The mean scale efficiency in all windows, except in the second one which is faced with 8%, shows growth of 18% compared to the first window. On the other hand, the mean change in total factor productivity in all branches of the industry shows average negative growth (decrease) of 12% which are the result of a negative change in technology.

Keywords: Nina Distribution Company branches, window data envelopment analysis, Malmquist productivity index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1108
13151 Various Advanced Statistical Analyses of Index Values Extracted from Outdoor Agricultural Workers Motion Data

Authors: Shinji Kawakura, Ryosuke Shibasaki

Abstract:

We have been grouping and developing various kinds of practical, promising sensing applied systems concerning agricultural advancement and technical tradition (guidance). These include advanced devices to secure real-time data related to worker motion, and we analyze by methods of various advanced statistics and human dynamics (e.g. primary component analysis, Ward system based cluster analysis, and mapping). What is more, we have been considering worker daily health and safety issues. Targeted fields are mainly common farms, meadows, and gardens. After then, we observed and discussed time-line style, changing data. And, we made some suggestions. The entire plan makes it possible to improve both the aforementioned applied systems and farms.

Keywords: Advanced statistical analysis, wearable sensing system, tradition of skill, supporting for workers, detecting crisis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1570
13150 Grocery Customer Behavior Analysis using RFID-based Shopping Paths Data

Authors: In-Chul Jung, Young S. Kwon

Abstract:

Knowing about the customer behavior in a grocery has been a long-standing issue in the retailing industry. The advent of RFID has made it easier to collect moving data for an individual shopper's behavior. Most of the previous studies used the traditional statistical clustering technique to find the major characteristics of customer behavior, especially shopping path. However, in using the clustering technique, due to various spatial constraints in the store, standard clustering methods are not feasible because moving data such as the shopping path should be adjusted in advance of the analysis, which is time-consuming and causes data distortion. To alleviate this problem, we propose a new approach to spatial pattern clustering based on the longest common subsequence. Experimental results using real data obtained from a grocery confirm the good performance of the proposed method in finding the hot spot, dead spot and major path patterns of customer movements.

Keywords: customer path, shopping behavior, exploratoryanalysis, LCS, RFID

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3074
13149 A Sociocybernetics Data Analysis Using Causality in Tourism Networks

Authors: M. Lloret-Climent, J. Nescolarde-Selva

Abstract:

The aim of this paper is to propose a mathematical model to determine invariant sets, set covering, orbits and, in particular, attractors in the set of tourism variables. Analysis was carried out based on a pre-designed algorithm and applying our interpretation of chaos theory developed in the context of General Systems Theory. This article sets out the causal relationships associated with tourist flows in order to enable the formulation of appropriate strategies. Our results can be applied to numerous cases. For example, in the analysis of tourist flows, these findings can be used to determine whether the behaviour of certain groups affects that of other groups and to analyse tourist behaviour in terms of the most relevant variables. Unlike statistical analyses that merely provide information on current data, our method uses orbit analysis to forecast, if attractors are found, the behaviour of tourist variables in the immediate future.

Keywords: Attractor, invariant set, orbits, tourist variables.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1698
13148 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining

Authors: Hina Kausher, Sangita Srivastava

Abstract:

In India, there is a lack of standard, systematic sizing approach for producing readymade garments. Garments manufacturing companies use their own created size tables by modifying international sizing charts of ready-made garments. The purpose of this study is to tabulate the anthropometric data which cover the variety of figure proportions in both height and girth. 3,000 data have been collected by an anthropometric survey undertaken over females between the ages of 16 to 80 years from the some states of India to produce the sizing system suitable for clothing manufacture and retailing. The data are used for the statistical analysis of body measurements, the formulation of sizing systems and body measurements tables. Factor analysis technique is used to filter the control body dimensions from the large number of variables. Decision tree-based data mining is used to cluster the data. The standard and structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.

Keywords: Anthropometric data, data mining, decision tree, garments manufacturing, ready-made garments, sizing systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 862
13147 Flexible Sensor Array with Programmable Measurement System

Authors: Jung-Chuan Chou, Wei-Chuan Chen, Chien-Cheng Chen

Abstract:

This study is concerned with pH solution detection using 2 × 4 flexible sensor array based on a plastic polyethylene terephthalate (PET) substrate that is coated a conductive layer and a ruthenium dioxide (RuO2) sensitive membrane with the technologies of screen-printing and RF sputtering. For data analysis, we also prepared a dynamic measurement system for acquiring the response voltage and analyzing the characteristics of the working electrodes (WEs), such as sensitivity and linearity. In this condition, an array measurement system was designed to acquire the original signal from sensor array, and it is based on the method of digital signal processing (DSP). The DSP modifies the unstable acquisition data to a direct current (DC) output using the technique of digital filter. Hence, this sensor array can obtain a satisfactory yield, 62.5%, through the design measurement and analysis system in our laboratory.

Keywords: Flexible sensor array, PET, RuO2, dynamic measurement, data analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1443
13146 Peakwise Smoothing of Data Models using Wavelets

Authors: D Sudheer Reddy, N Gopal Reddy, P V Radhadevi, J Saibaba, Geeta Varadan

Abstract:

Smoothing or filtering of data is first preprocessing step for noise suppression in many applications involving data analysis. Moving average is the most popular method of smoothing the data, generalization of this led to the development of Savitzky-Golay filter. Many window smoothing methods were developed by convolving the data with different window functions for different applications; most widely used window functions are Gaussian or Kaiser. Function approximation of the data by polynomial regression or Fourier expansion or wavelet expansion also gives a smoothed data. Wavelets also smooth the data to great extent by thresholding the wavelet coefficients. Almost all smoothing methods destroys the peaks and flatten them when the support of the window is increased. In certain applications it is desirable to retain peaks while smoothing the data as much as possible. In this paper we present a methodology called as peak-wise smoothing that will smooth the data to any desired level without losing the major peak features.

Keywords: smoothing, moving average, peakwise smoothing, spatialdensity models, planar shape models, wavelets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1701
13145 Design and Analysis of Gauge R&R Studies: Making Decisions Based on ANOVA Method

Authors: Afrooz Moatari Kazerouni

Abstract:

In a competitive production environment, critical decision making are based on data resulted by random sampling of product units. Efficiency of these decisions depends on data quality and also their reliability scale. This point leads to the necessity of a reliable measurement system. Therefore, the conjecture process and analysing the errors contributes to a measurement system known as Measurement System Analysis (MSA). The aim of this research is on determining the necessity and assurance of extensive development in analysing measurement systems, particularly with the use of Repeatability and Reproducibility Gages (GR&R) to improve physical measurements. Nowadays in productive industries, repeatability and reproducibility gages released so well but they are not applicable as well as other measurement system analysis methods. To get familiar with this method and gain a feedback in improving measurement systems, this survey would be on “ANOVA" method as the most widespread way of calculating Repeatability and Reproducibility (R&R).

Keywords: Analysis of Variance (ANOVA), MeasurementSystem Analysis (MSA), Part-Operator interaction effect, Repeatability and Reproducibility.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4611
13144 Growing Self Organising Map Based Exploratory Analysis of Text Data

Authors: Sumith Matharage, Damminda Alahakoon

Abstract:

Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.

Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892
13143 Data Preprocessing for Supervised Leaning

Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas

Abstract:

Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If there is much irrelevant and redundant information present or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set but this is not happened. Thus, we present the most well know algorithms for each step of data pre-processing so that one achieves the best performance for their data set.

Keywords: Data mining, feature selection, data cleaning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5896
13142 Enhanced Data Access Control of Cooperative Environment used for DMU Based Design

Authors: Wei Lifan, Zhang Huaiyu, Yang Yunbin, Li Jia

Abstract:

Through the analysis of the process digital design based on digital mockup, the fact indicates that a distributed cooperative supporting environment is the foundation conditions to adopt design approach based on DMU. Data access authorization is concerned firstly because the value and sensitivity of the data for the enterprise. The access control for administrators is often rather weak other than business user. So authors established an enhanced system to avoid the administrators accessing the engineering data by potential approach and without authorization. Thus the data security is improved.

Keywords: access control, DMU, PLM, virtual prototype.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1416
13141 EDULOGIC+ - Knowledge Management through Data Analysis in Education

Authors: Alok Sharma, Dr. Harvinder S. Saini, Raviteja Tiruvury

Abstract:

This paper outlines the application of Knowledge Management (KM) principles in the context of Educational institutions. The paper caters to the needs of the engineering institutions for imparting quality education by delineating the instruction delivery process in a highly structured, controlled and quantified manner. This is done using a software tool EDULOGIC+. The central idea has been based on the engineering education pattern in Indian Universities/ Institutions. The data, contents and results produced over contiguous years build the necessary ground for managing the related accumulated knowledge. Application of KM has been explained using certain examples of data analysis and knowledge extraction.

Keywords: Education software system, information system, knowledge management.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1700
13140 An Improved Preprocessing for Biosonar Target Classification

Authors: Turgay Temel, John Hallam

Abstract:

An improved processing description to be employed in biosonar signal processing in a cochlea model is proposed and examined. It is compared to conventional models using a modified discrimination analysis and both are tested. Their performances are evaluated with echo data captured from natural targets (trees).Results indicate that the phase characteristics of low-pass filters employed in the echo processing have a significant effect on class separability for this data.

Keywords: Cochlea model, discriminant analysis, neurospikecoding, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1440
13139 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class- Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

The problems arising from unbalanced data sets generally appear in real world applications. Due to unequal class distribution, many researchers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors’ nonparametric discriminant analysis is a method that was proposed for classifying unbalanced classes with good performance. In this study, the methods of discriminant analysis are of interest in investigating misclassification error rates for classimbalanced data of three diabetes risk groups. The purpose of this study was to compare the classification performance between parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification of class-imbalanced data of diabetes risk groups. Data from a project maintaining healthy conditions for 599 employees of a government hospital in Bangkok were obtained for the classification problem. The employees were divided into three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data including the variables of diabetes risk group, age, gender, blood glucose, and BMI were analyzed and bootstrapped for 50 and 100 samples, 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was explored for the departure of multivariate normality and the equality of covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed nonnormality and unequal covariance matrices. The parametric linear discriminant function, quadratic discriminant function, and the nonparametric k-nearest neighbors’ discriminant function were performed over 50 and 100 bootstrap samples and applied to the original data. Searching the optimal classification rule, the choices of prior probabilities were set up for both equal proportions (0.33: 0.33: 0.33) and unequal proportions of (0.90:0.05:0.05), (0.80: 0.10: 0.10) and (0.70, 0.15, 0.15). The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach when k=3 or k=4 and the defined prior probabilities of non-risk: risk: diabetic as 0.90: 0.05:0.05 or 0.80:0.10:0.10 gave the smallest error rate of misclassification. The k-nearest neighbors approach would be suggested for classifying a three-class-imbalanced data of diabetes risk groups.

Keywords: Bootstrap, diabetes risk groups, error rate, k-nearest neighbors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1954
13138 Time Series Simulation by Conditional Generative Adversarial Net

Authors: Rao Fu, Jie Chen, Shutian Zeng, Yiping Zhuang, Agus Sudjianto

Abstract:

Generative Adversarial Net (GAN) has proved to be a powerful machine learning tool in image data analysis and generation. In this paper, we propose to use Conditional Generative Adversarial Net (CGAN) to learn and simulate time series data. The conditions include both categorical and continuous variables with different auxiliary information. Our simulation studies show that CGAN has the capability to learn different types of normal and heavy-tailed distributions, as well as dependent structures of different time series. It also has the capability to generate conditional predictive distributions consistent with training data distributions. We also provide an in-depth discussion on the rationale behind GAN and the neural networks as hierarchical splines to establish a clear connection with existing statistical methods of distribution generation. In practice, CGAN has a wide range of applications in market risk and counterparty risk analysis: it can be applied to learn historical data and generate scenarios for the calculation of Value-at-Risk (VaR) and Expected Shortfall (ES), and it can also predict the movement of the market risk factors. We present a real data analysis including a backtesting to demonstrate that CGAN can outperform Historical Simulation (HS), a popular method in market risk analysis to calculate VaR. CGAN can also be applied in economic time series modeling and forecasting. In this regard, we have included an example of hypothetical shock analysis for economic models and the generation of potential CCAR scenarios by CGAN at the end of the paper.

Keywords: Conditional Generative Adversarial Net, market and credit risk management, neural network, time series.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1103
13137 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4806
13136 AniMoveMineR: Animal Behavior Exploratory Analysis Using Association Rules Mining

Authors: Suelane Garcia Fontes, Silvio Luiz Stanzani, Pedro L. Pizzigatti Corrła Ronaldo G. Morato

Abstract:

Environmental changes and major natural disasters are most prevalent in the world due to the damage that humanity has caused to nature and these damages directly affect the lives of animals. Thus, the study of animal behavior and their interactions with the environment can provide knowledge that guides researchers and public agencies in preservation and conservation actions. Exploratory analysis of animal movement can determine the patterns of animal behavior and with technological advances the ability of animals to be tracked and, consequently, behavioral studies have been expanded. There is a lot of research on animal movement and behavior, but we note that a proposal that combines resources and allows for exploratory analysis of animal movement and provide statistical measures on individual animal behavior and its interaction with the environment is missing. The contribution of this paper is to present the framework AniMoveMineR, a unified solution that aggregates trajectory analysis and data mining techniques to explore animal movement data and provide a first step in responding questions about the animal individual behavior and their interactions with other animals over time and space. We evaluated the framework through the use of monitored jaguar data in the city of Miranda Pantanal, Brazil, in order to verify if the use of AniMoveMineR allows to identify the interaction level between these jaguars. The results were positive and provided indications about the individual behavior of jaguars and about which jaguars have the highest or lowest correlation.

Keywords: Data mining, data science, trajectory, animal behavior.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 815
13135 Representing Data without Lost Compression Properties in Time Series: A Review

Authors: Nabilah Filzah Mohd Radzuan, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan

Abstract:

Uncertain data is believed to be an important issue in building up a prediction model. The main objective in the time series uncertainty analysis is to formulate uncertain data in order to gain knowledge and fit low dimensional model prior to a prediction task. This paper discusses the performance of a number of techniques in dealing with uncertain data specifically those which solve uncertain data condition by minimizing the loss of compression properties.

Keywords: Compression properties, uncertainty, uncertain time series, mining technique, weather prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1566
13134 Are XBRL-based Financial Reports Better than Non-XBRL Reports? A Quality Assessment

Authors: Zhenkun Wang, Simon S. Gao

Abstract:

Using a scoring system, this paper provides a comparative assessment of the quality of data between XBRL formatted financial reports and non-XBRL financial reports. It shows a major improvement in the quality of data of XBRL formatted financial reports. Although XBRL formatted financial reports do not show much advantage in the quality at the beginning, XBRL financial reports lately display a large improvement in the quality of data in almost all aspects. With the improved XBRL web data managing, presentation and analysis applications, XBRL formatted financial reports have a much better accessibility, are more accurate and better in timeliness.

Keywords: Data Quality; Financial Report; Information; XBRL

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2493
13133 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2555
13132 An Automatic Tool for Checking Consistency between Data Flow Diagrams (DFDs)

Authors: Rosziati Ibrahim, Siow Yen Yen

Abstract:

System development life cycle (SDLC) is a process uses during the development of any system. SDLC consists of four main phases: analysis, design, implement and testing. During analysis phase, context diagram and data flow diagrams are used to produce the process model of a system. A consistency of the context diagram to lower-level data flow diagrams is very important in smoothing up developing process of a system. However, manual consistency check from context diagram to lower-level data flow diagrams by using a checklist is time-consuming process. At the same time, the limitation of human ability to validate the errors is one of the factors that influence the correctness and balancing of the diagrams. This paper presents a tool that automates the consistency check between Data Flow Diagrams (DFDs) based on the rules of DFDs. The tool serves two purposes: as an editor to draw the diagrams and as a checker to check the correctness of the diagrams drawn. The consistency check from context diagram to lower-level data flow diagrams is embedded inside the tool to overcome the manual checking problem.

Keywords: Data Flow Diagram, Context Diagram, ConsistencyCheck, Syntax and Semantic Rules

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3382