Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 30737

Search results for: time series data mining

30737 Distributed Perceptually Important Point Identification for Time Series Data Mining

Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung


In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process is first introduced in 2001. This process originally works for financial time series pattern matching and it is then found suitable for time series dimensionality reduction and representation. Its strength is on preserving the overall shape of the time series by identifying the salient points in it. With the rise of Big Data, time series data contributes a major proportion, especially on the data which generates by sensors in the Internet of Things (IoT) environment. According to the nature of PIP identification and the successful cases, it is worth to further explore the opportunity to apply PIP in time series ‘Big Data’. However, the performance of PIP identification is always considered as the limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches solve the bottleneck when running the PIP identification process in a standalone computer. Improvement in term of speed is obtained by the distributed versions.

Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining

Procedia PDF Downloads 149
30736 Hierarchical Piecewise Linear Representation of Time Series Data

Authors: Vineetha Bettaiah, Heggere S. Ranganath


This paper presents a Hierarchical Piecewise Linear Approximation (HPLA) for the representation of time series data in which the time series is treated as a curve in the time-amplitude image space. The curve is partitioned into segments by choosing perceptually important points as break points. Each segment between adjacent break points is recursively partitioned into two segments at the best point or midpoint until the error between the approximating line and the original curve becomes less than a pre-specified threshold. The HPLA representation achieves dimensionality reduction while preserving prominent local features and general shape of time series. The representation permits course-fine processing at different levels of details, allows flexible definition of similarity based on mathematical measures or general time series shape, and supports time series data mining operations including query by content, clustering and classification based on whole or subsequence similarity.

Keywords: data mining, dimensionality reduction, piecewise linear representation, time series representation

Procedia PDF Downloads 191
30735 Representation Data without Lost Compression Properties in Time Series: A Review

Authors: Nabilah Filzah Mohd Radzuan, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan


Uncertain data is believed to be an important issue in building up a prediction model. The main objective in the time series uncertainty analysis is to formulate uncertain data in order to gain knowledge and fit low dimensional model prior to a prediction task. This paper discusses the performance of a number of techniques in dealing with uncertain data specifically those which solve uncertain data condition by minimizing the loss of compression properties.

Keywords: compression properties, uncertainty, uncertain time series, mining technique, weather prediction

Procedia PDF Downloads 339
30734 A Method for Mining Information about Business Cycles from Stock Prices Data

Authors: Koki Kyo


In this paper, we propose a method for analyzing business cycles based on stock prices data. We take the Nikkei Stock Average (NSA) as a proxy of stock prices and use the Composite Index in Japan (CIJ) as an indicator for business cycles. We investigate the correlation of the CIJ with respect to a difference between two moving average series of the NSA, and introduce a lag parameter in the CIJ. Thus, we can determine an effective frequency interval for the differenced series by maximizing the correlation coefficient, and we can analyze the lead–lag relationship between the NSA and the CIJ based on the estimate of the lag. Moreover, we analyze the dynamic relationship between the differenced series of the NSA with the most effective frequency interval and the CIJ using a regression model that is constructed using a time-varying coefficient. As an empirical example, we analyze the daily time series of the NSA for closing values from January 1, 1991 to September 28, 2018, together with the monthly CIJ data over the same period.

Keywords: big data analysis, business cycles in Japan, daily stock price data, data mining, state-space model

Procedia PDF Downloads 17
30733 A Comprehensive Survey and Improvement to Existing Privacy Preserving Data Mining Techniques

Authors: Tosin Ige


Ethics must be a condition of the world, like logic. (Ludwig Wittgenstein, 1889-1951). As important as data mining is, it possess a significant threat to ethics, privacy, and legality, since data mining makes it difficult for an individual or consumer (in the case of a company) to control the accessibility and usage of his data. This research focuses on Current issues and the latest research and development on Privacy preserving data mining methods as at year 2022. It also discusses some advances in those techniques while at the same time highlighting and providing a new technique as a solution to an existing technique of privacy preserving data mining methods. This paper also bridges the wide gap between Data mining and the Web Application Programing Interface (web API), where research is urgently needed for an added layer of security in data mining while at the same time introducing a seamless and more efficient way of data mining.

Keywords: data, privacy, data mining, association rule, privacy preserving, mining technique

Procedia PDF Downloads 61
30732 Investigation on Performance of Change Point Algorithm in Time Series Dynamical Regimes and Effect of Data Characteristics

Authors: Farhad Asadi, Mohammad Javad Mollakazemi


In this paper, Bayesian online inference in models of data series are constructed by change-points algorithm, which separated the observed time series into independent series and study the change and variation of the regime of the data with related statistical characteristics. variation of statistical characteristics of time series data often represent separated phenomena in the some dynamical system, like a change in state of brain dynamical reflected in EEG signal data measurement or a change in important regime of data in many dynamical system. In this paper, prediction algorithm for studying change point location in some time series data is simulated. It is verified that pattern of proposed distribution of data has important factor on simpler and smother fluctuation of hazard rate parameter and also for better identification of change point locations. Finally, the conditions of how the time series distribution effect on factors in this approach are explained and validated with different time series databases for some dynamical system.

Keywords: time series, fluctuation in statistical characteristics, optimal learning, change-point algorithm

Procedia PDF Downloads 351
30731 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez


Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 295
30730 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro


This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.

Keywords: clustering, data analysis, data mining, predictive models

Procedia PDF Downloads 330
30729 A Review Paper on Data Mining and Genetic Algorithm

Authors: Sikander Singh Cheema, Jasmeen Kaur


In this paper, the concept of data mining is summarized and its one of the important process i.e KDD is summarized. The data mining based on Genetic Algorithm is researched in and ways to achieve the data mining Genetic Algorithm are surveyed. This paper also conducts a formal review on the area of data mining tasks and genetic algorithm in various fields.

Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining

Procedia PDF Downloads 303
30728 Quantum Statistical Machine Learning and Quantum Time Series

Authors: Omar Alzeley, Sergey Utev


Minimizing a constrained multivariate function is the fundamental of Machine learning, and these algorithms are at the core of data mining and data visualization techniques. The decision function that maps input points to output points is based on the result of optimization. This optimization is the central of learning theory. One approach to complex systems where the dynamics of the system is inferred by a statistical analysis of the fluctuations in time of some associated observable is time series analysis. The purpose of this paper is a mathematical transition from the autoregressive model of classical time series to the matrix formalization of quantum theory. Firstly, we have proposed a quantum time series model (QTS). Although Hamiltonian technique becomes an established tool to detect a deterministic chaos, other approaches emerge. The quantum probabilistic technique is used to motivate the construction of our QTS model. The QTS model resembles the quantum dynamic model which was applied to financial data. Secondly, various statistical methods, including machine learning algorithms such as the Kalman filter algorithm, are applied to estimate and analyses the unknown parameters of the model. Finally, simulation techniques such as Markov chain Monte Carlo have been used to support our investigations. The proposed model has been examined by using real and simulated data. We establish the relation between quantum statistical machine and quantum time series via random matrix theory. It is interesting to note that the primary focus of the application of QTS in the field of quantum chaos was to find a model that explain chaotic behaviour. Maybe this model will reveal another insight into quantum chaos.

Keywords: machine learning, simulation techniques, quantum probability, tensor product, time series

Procedia PDF Downloads 325
30727 Applying a Noise Reduction Method to Reveal Chaos in the River Flow Time Series

Authors: Mohammad H. Fattahi


Chaotic analysis has been performed on the river flow time series before and after applying the wavelet based de-noising techniques in order to investigate the noise content effects on chaotic nature of flow series. In this study, 38 years of monthly runoff data of three gauging stations were used. Gauging stations were located in Ghar-e-Aghaj river basin, Fars province, Iran. The noise level of time series was estimated with the aid of Gaussian kernel algorithm. This step was found to be crucial in preventing removal of the vital data such as memory, correlation and trend from the time series in addition to the noise during de-noising process.

Keywords: chaotic behavior, wavelet, noise reduction, river flow

Procedia PDF Downloads 389
30726 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring

Authors: Seung-Lock Seo


This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data of good operations are relatively easy to gather, but in unusual special events or faults it is generally difficult to collect process information or almost impossible to analyze some noisy data of industrial processes. At this time some noise filtering techniques can be used to enhance process monitoring performance in a real-time basis. In addition, pre-processing of raw process data is helpful to eliminate unwanted variation of industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. It showed that the monitoring performance was improved significantly in terms of monitoring success rate of given process faults.

Keywords: data mining, process data, monitoring, safety, industrial processes

Procedia PDF Downloads 289
30725 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub


Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.

Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL

Procedia PDF Downloads 64
30724 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills

Authors: Kyle De Freitas, Margaret Bernard


Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have little technical data mining skills as well as for more advanced users with some data mining expertise. This framework architecture was developed through the review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture for future researchers to focus on the development of specific areas within the EDM process. Finally, the paper also highlights a strategy of enabling analysis through either the use of predefined questions or a guided data mining process and highlights how the developed questions and analysis conducted can be reused and extended over time.

Keywords: educational data mining, learning management system, learning analytics, EDM framework

Procedia PDF Downloads 239
30723 Analysis of Dynamics Underlying the Observation Time Series by Using a Singular Spectrum Approach

Authors: O. Delage, H. Bencherif, T. Portafaix, A. Bourdier


The main purpose of time series analysis is to learn about the dynamics behind some time ordered measurement data. Two approaches are used in the literature to get a better knowledge of the dynamics contained in observation data sequences. The first of these approaches concerns time series decomposition, which is an important analysis step allowing patterns and behaviors to be extracted as components providing insight into the mechanisms producing the time series. As in many cases, time series are short, noisy, and non-stationary. To provide components which are physically meaningful, methods such as Empirical Mode Decomposition (EMD), Empirical Wavelet Transform (EWT) or, more recently, Empirical Adaptive Wavelet Decomposition (EAWD) have been proposed. The second approach is to reconstruct the dynamics underlying the time series as a trajectory in state space by mapping a time series into a set of Rᵐ lag vectors by using the method of delays (MOD). Takens has proved that the trajectory obtained with the MOD technic is equivalent to the trajectory representing the dynamics behind the original time series. This work introduces the singular spectrum decomposition (SSD), which is a new adaptive method for decomposing non-linear and non-stationary time series in narrow-banded components. This method takes its origin from singular spectrum analysis (SSA), a nonparametric spectral estimation method used for the analysis and prediction of time series. As the first step of SSD is to constitute a trajectory matrix by embedding a one-dimensional time series into a set of lagged vectors, SSD can also be seen as a reconstruction method like MOD. We will first give a brief overview of the existing decomposition methods (EMD-EWT-EAWD). The SSD method will then be described in detail and applied to experimental time series of observations resulting from total columns of ozone measurements. The results obtained will be compared with those provided by the previously mentioned decomposition methods. We will also compare the reconstruction qualities of the observed dynamics obtained from the SSD and MOD methods.

Keywords: time series analysis, adaptive time series decomposition, wavelet, phase space reconstruction, singular spectrum analysis

Procedia PDF Downloads 14
30722 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems

Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan


Parallel hybrid storage systems consist of a hierarchy of different storage devices that vary in terms of data reading speed performance. As we ascend in the hierarchy, data reading speed becomes faster. Thus, migrating the application’ important data that will be accessed in the near future to the uppermost level will reduce the application I/O waiting time; hence, reducing its execution elapsed time. In this research, we implement trace-driven two-levels parallel hybrid storage system prototype that consists of HDDs and SSDs. The prototype uses data mining techniques to classify application’ data in order to determine its near future data accesses in parallel with the its on-demand request. The important data (i.e. the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach integrated with data mining techniques reduces the application execution elapsed time when using variety of traces in at least to 22%.

Keywords: hybrid storage system, data mining, recurrent neural network, support vector machine

Procedia PDF Downloads 231
30721 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data

Authors: Adarsh Shroff


Big data is a collection of dataset so large and complex that it becomes difficult to process using data base management tools. To perform operations like search, analysis, visualization on big data by using data mining; which is the process of extraction of patterns or knowledge from large data set. In recent years, the data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to Map Reduce, the most widely used framework for mining big data. I2MapReduce performs key-value pair level incremental processing rather than task level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. To optimize the mining results, evaluate i2MapReduce using a one-step algorithm and three iterative algorithms with diverse computation characteristics for efficient mining.

Keywords: big data, map reduce, incremental processing, iterative computation

Procedia PDF Downloads 251
30720 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data

Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz


In recent years, there is a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data seems clear and real-time stream data management becomes a hot topic. Sliding window model and frequent itemset mining over dynamic data are the most important problems in the context of data mining. Thus, sliding window model for frequent itemset mining is a widely used model for data stream mining due to its emphasis on recent data and its bounded memory requirement. These methods use the traditional transaction-based sliding window model where the window size is based on a fixed number of transactions. Actually, this model supposes that all transactions have a constant rate which is not suited for real-time applications. And the use of this model in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes the use of a timestamp-based sliding window model. In our proposed frequent itemset mining algorithm, support conditions are used to differentiate frequents and infrequent patterns. Thereafter, a tree is developed to incrementally maintain the essential information. We evaluate our contribution. The preliminary results are quite promising.

Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query

Procedia PDF Downloads 78
30719 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi


Frequent Item sets play an essential role in many data Mining tasks that try to find interesting patterns from the database. Typically it refers to a set of items that frequently appear together in transaction dataset. There are several mining algorithm being used for frequent item set mining, yet most do not scale to the type of data we presented with today, so called “BIG DATA”. Big Data is a collection of large data sets. Our approach is to work on the frequent item set mining over the large dataset with scalable and speedy way. Big Data basically works with Map Reduce along with HDFS is used to find out frequent item sets from Big Data on large cluster. This paper focuses on using pre-processing & mining algorithm as hybrid approach for big data over Hadoop platform.

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 268
30718 A Posteriori Trading-Inspired Model-Free Time Series Segmentation

Authors: Plessen Mogens Graf


Within the context of multivariate time series segmentation, this paper proposes a method inspired by a posteriori optimal trading. After a normalization step, time series are treated channelwise as surrogate stock prices that can be traded optimally a posteriori in a virtual portfolio holding either stock or cash. Linear transaction costs are interpreted as hyperparameters for noise filtering. Trading signals, as well as trading signals obtained on the reversed time series, are used for unsupervised channelwise labeling before a consensus over all channels is reached that determines the final segmentation time instants. The method is model-free such that no model prescriptions for segments are made. Benefits of proposed approach include simplicity, computational efficiency, and adaptability to a wide range of different shapes of time series. Performance is demonstrated on synthetic and real-world data, including a large-scale dataset comprising a multivariate time series of dimension 1000 and length 2709. Proposed method is compared to a popular model-based bottom-up approach fitting piecewise affine models and to a recent model-based top-down approach fitting Gaussian models and found to be consistently faster while producing more intuitive results in the sense of segmenting time series at peaks and valleys.

Keywords: time series segmentation, model-free, trading-inspired, multivariate data

Procedia PDF Downloads 62
30717 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi


The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 158
30716 Adaptive Neuro Fuzzy Inference System Model Based on Support Vector Regression for Stock Time Series Forecasting

Authors: Anita Setianingrum, Oki S. Jaya, Zuherman Rustam


Forecasting stock price is a challenging task due to the complex time series of the data. The complexity arises from many variables that affect the stock market. Many time series models have been proposed before, but those previous models still have some problems: 1) put the subjectivity of choosing the technical indicators, and 2) rely upon some assumptions about the variables, so it is limited to be applied to all datasets. Therefore, this paper studied a novel Adaptive Neuro-Fuzzy Inference System (ANFIS) time series model based on Support Vector Regression (SVR) for forecasting the stock market. In order to evaluate the performance of proposed models, stock market transaction data of TAIEX and HIS from January to December 2015 is collected as experimental datasets. As a result, the method has outperformed its counterparts in terms of accuracy.

Keywords: ANFIS, fuzzy time series, stock forecasting, SVR

Procedia PDF Downloads 165
30715 Comparison of Applicability of Time Series Forecasting Models VAR, ARCH and ARMA in Management Science: Study Based on Empirical Analysis of Time Series Techniques

Authors: Muhammad Tariq, Hammad Tahir, Fawwad Mahmood Butt


Purpose: This study attempts to examine the best forecasting methodologies in the time series. The time series forecasting models such as VAR, ARCH and the ARMA are considered for the analysis. Methodology: The Bench Marks or the parameters such as Adjusted R square, F-stats, Durban Watson, and Direction of the roots have been critically and empirically analyzed. The empirical analysis consists of time series data of Consumer Price Index and Closing Stock Price. Findings: The results show that the VAR model performed better in comparison to other models. Both the reliability and significance of VAR model is highly appreciable. In contrary to it, the ARCH model showed very poor results for forecasting. However, the results of ARMA model appeared double standards i.e. the AR roots showed that model is stationary and that of MA roots showed that the model is invertible. Therefore, the forecasting would remain doubtful if it made on the bases of ARMA model. It has been concluded that VAR model provides best forecasting results. Practical Implications: This paper provides empirical evidences for the application of time series forecasting model. This paper therefore provides the base for the application of best time series forecasting model.

Keywords: forecasting, time series, auto regression, ARCH, ARMA

Procedia PDF Downloads 217
30714 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad


Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 391
30713 Analysis of Financial Time Series by Using Ornstein-Uhlenbeck Type Models

Authors: Md Al Masum Bhuiyan, Maria C. Mariani, Osei K. Tweneboah


In the present work, we develop a technique for estimating the volatility of financial time series by using stochastic differential equation. Taking the daily closing prices from developed and emergent stock markets as the basis, we argue that the incorporation of stochastic volatility into the time-varying parameter estimation significantly improves the forecasting performance via Maximum Likelihood Estimation. While using the technique, we see the long-memory behavior of data sets and one-step-ahead-predicted log-volatility with ±2 standard errors despite the variation of the observed noise from a Normal mixture distribution, because the financial data studied is not fully Gaussian. Also, the Ornstein-Uhlenbeck process followed in this work simulates well the financial time series, which aligns our estimation algorithm with large data sets due to the fact that this algorithm has good convergence properties.

Keywords: financial time series, maximum likelihood estimation, Ornstein-Uhlenbeck type models, stochastic volatility model

Procedia PDF Downloads 161
30712 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler


This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.

Keywords: data mining, textile production, decision trees, classification

Procedia PDF Downloads 270
30711 Wind Spped Data Analysis in Colombia in 2013 and 2015

Authors: Harold P. Villota, Alejandro Osorio B.


The energy meteorology is an area for study energy complementarity and the use of renewable sources in interconnected systems. Due to diversify the energy matrix in Colombia with wind sources, is necessary to know the data bases about this one. However, the time series given by 260 automatic weather stations have empty, and no apply data, so the purpose is to fill the time series selecting two years to characterize, impute and use like base to complete the data between 2005 and 2020.

Keywords: complementarity, wind speed, renewable, colombia, characteri, characterization, imputation

Procedia PDF Downloads 25
30710 Algorithms used in Spatial Data Mining GIS

Authors: Vahid Bairami Rad


Extracting knowledge from spatial data like GIS data is important to reduce the data and extract information. Therefore, the development of new techniques and tools that support the human in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. Thus, we introduce a set of database primitives or basic operations for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. Similar to the relational standard language SQL, the use of standard primitives will speed-up the development of new data mining algorithms and will also make them more portable. We introduced a database-oriented framework for spatial data mining which is based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths were defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives by a commercial DBMS were presented.

Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining

Procedia PDF Downloads 351
30709 The Modelling of Real Time Series Data

Authors: Valeria Bondarenko


We proposed algorithms for: estimation of parameters fBm (volatility and Hurst exponent) and for the approximation of random time series by functional of fBm. We proved the consistency of the estimators, which constitute the above algorithms, and proved the optimal forecast of approximated time series. The adequacy of estimation algorithms, approximation, and forecasting is proved by numerical experiment. During the process of creating software, the system has been created, which is displayed by the hierarchical structure. The comparative analysis of proposed algorithms with the other methods gives evidence of the advantage of approximation method. The results can be used to develop methods for the analysis and modeling of time series describing the economic, physical, biological and other processes.

Keywords: mathematical model, random process, Wiener process, fractional Brownian motion

Procedia PDF Downloads 267
30708 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee


Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 289