Search results for: time series data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 37711

Search results for: time series data mining

37621 Time-Series Load Data Analysis for User Power Profiling

Authors: Mahdi Daghmhehci Firoozjaei, Minchang Kim, Dima Alhadidi

Abstract:

In this paper, we present a power profiling model for smart grid consumers based on real time load data acquired smart meters. It profiles consumers’ power consumption behaviour using the dynamic time warping (DTW) clustering algorithm. Due to the invariability of signal warping of this algorithm, time-disordered load data can be profiled and consumption features be extracted. Two load types are defined and the related load patterns are extracted for classifying consumption behaviour by DTW. The classification methodology is discussed in detail. To evaluate the performance of the method, we analyze the time-series load data measured by a smart meter in a real case. The results verify the effectiveness of the proposed profiling method with 90.91% true positive rate for load type clustering in the best case.

Keywords: power profiling, user privacy, dynamic time warping, smart grid

Procedia PDF Downloads 112
37620 Study for Establishing a Concept of Underground Mining in a Folded Deposit with Weathering

Authors: Chandan Pramanik, Bikramjit Chanda

Abstract:

Large metal mines operated with open-cast mining methods must transition to underground mining at the conclusion of the operation; however, this requires a period of a difficult time when production convergence due to interference between the two mining methods. A transition model with collaborative mining operations is presented and established in this work, based on the case of the South Kaliapani Underground Project, to address these technical issues of inadequate production security and other mining challenges during the transition phase and beyond. By integrating the technology of the small-scale Drift and Fill method and Highly productive Sub Level Open Stoping at deep section, this hybrid mining concept tries to eliminate major bottlenecks and offers an optimized production profile with the safe and sustainable operation. Considering every geo-mining aspect, this study offers a genuine and precise technical deliberation for the transition from open pit to underground mining.

Keywords: drift and fill, geo-mining aspect, sublevel open stoping, underground mining method

Procedia PDF Downloads 70
37619 Towards a Distributed Computation Platform Tailored for Educational Process Discovery and Analysis

Authors: Awatef Hicheur Cairns, Billel Gueni, Hind Hafdi, Christian Joubert, Nasser Khelifa

Abstract:

Given the ever changing needs of the job markets, education and training centers are increasingly held accountable for student success. Therefore, education and training centers have to focus on ways to streamline their offers and educational processes in order to achieve the highest level of quality in curriculum contents and managerial decisions. Educational process mining is an emerging field in the educational data mining (EDM) discipline, concerned with developing methods to discover, analyze and provide a visual representation of complete educational processes. In this paper, we present our distributed computation platform which allows different education centers and institutions to load their data and access to advanced data mining and process mining services. To achieve this, we present also a comparative study of the different clustering techniques developed in the context of process mining to partition efficiently educational traces. Our goal is to find the best strategy for distributing heavy analysis computations on many processing nodes of our platform.

Keywords: educational process mining, distributed process mining, clustering, distributed platform, educational data mining, ProM

Procedia PDF Downloads 427
37618 Exploring Time-Series Phosphoproteomic Datasets in the Context of Network Models

Authors: Sandeep Kaur, Jenny Vuong, Marcel Julliard, Sean O'Donoghue

Abstract:

Time-series data are useful for modelling as they can enable model-evaluation. However, when reconstructing models from phosphoproteomic data, often non-exact methods are utilised, as the knowledge regarding the network structure, such as, which kinases and phosphatases lead to the observed phosphorylation state, is incomplete. Thus, such reactions are often hypothesised, which gives rise to uncertainty. Here, we propose a framework, implemented via a web-based tool (as an extension to Minardo), which given time-series phosphoproteomic datasets, can generate κ models. The incompleteness and uncertainty in the generated model and reactions are clearly presented to the user via the visual method. Furthermore, we demonstrate, via a toy EGF signalling model, the use of algorithmic verification to verify κ models. Manually formulated requirements were evaluated with regards to the model, leading to the highlighting of the nodes causing unsatisfiability (i.e. error causing nodes). We aim to integrate such methods into our web-based tool and demonstrate how the identified erroneous nodes can be presented to the user via the visual method. Thus, in this research we present a framework, to enable a user to explore phosphorylation proteomic time-series data in the context of models. The observer can visualise which reactions in the model are highly uncertain, and which nodes cause incorrect simulation outputs. A tool such as this enables an end-user to determine the empirical analysis to perform, to reduce uncertainty in the presented model - thus enabling a better understanding of the underlying system.

Keywords: κ-models, model verification, time-series phosphoproteomic datasets, uncertainty and error visualisation

Procedia PDF Downloads 230
37617 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 518
37616 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease

Authors: Usama Ahmed

Abstract:

Data mining is the process of analyze data which are used to predict helpful information. It is the field of research which solve various type of problem. In data mining, classification is an important technique to classify different kind of data. Diabetes is most common disease. This paper implements different classification technique using Waikato Environment for Knowledge Analysis (WEKA) on diabetes dataset and find which algorithm is suitable for working. The best classification algorithm based on diabetic data is Naïve Bayes. The accuracy of Naïve Bayes is 76.31% and take 0.06 seconds to build the model.

Keywords: data mining, classification, diabetes, WEKA

Procedia PDF Downloads 124
37615 HPPDFIM-HD: Transaction Distortion and Connected Perturbation Approach for Hierarchical Privacy Preserving Distributed Frequent Itemset Mining over Horizontally-Partitioned Dataset

Authors: Fuad Ali Mohammed Al-Yarimi

Abstract:

Many algorithms have been proposed to provide privacy preserving in data mining. These protocols are based on two main approaches named as: the perturbation approach and the Cryptographic approach. The first one is based on perturbation of the valuable information while the second one uses cryptographic techniques. The perturbation approach is much more efficient with reduced accuracy while the cryptographic approach can provide solutions with perfect accuracy. However, the cryptographic approach is a much slower method and requires considerable computation and communication overhead. In this paper, a new scalable protocol is proposed which combines the advantages of the perturbation and distortion along with cryptographic approach to perform privacy preserving in distributed frequent itemset mining on horizontally distributed data. Both the privacy and performance characteristics of the proposed protocol are studied empirically.

Keywords: anonymity data, data mining, distributed frequent itemset mining, gaussian perturbation, perturbation approach, privacy preserving data mining

Procedia PDF Downloads 482
37614 Mining Multicity Urban Data for Sustainable Population Relocation

Authors: Xu Du, Aparna S. Varde

Abstract:

In this research, we propose to conduct diagnostic and predictive analysis about the key factors and consequences of urban population relocation. To achieve this goal, urban simulation models extract the urban development trends as land use change patterns from a variety of data sources. The results are treated as part of urban big data with other information such as population change and economic conditions. Multiple data mining methods are deployed on this data to analyze nonlinear relationships between parameters. The result determines the driving force of population relocation with respect to urban sprawl and urban sustainability and their related parameters. Experiments so far reveal that data mining methods discover useful knowledge from the multicity urban data. This work sets the stage for developing a comprehensive urban simulation model for catering to specific questions by targeted users. It contributes towards achieving sustainability as a whole.

Keywords: data mining, environmental modeling, sustainability, urban planning

Procedia PDF Downloads 270
37613 Statistical Time-Series and Neural Architecture of Malaria Patients Records in Lagos, Nigeria

Authors: Akinbo Razak Yinka, Adesanya Kehinde Kazeem, Oladokun Oluwagbenga Peter

Abstract:

Time series data are sequences of observations collected over a period of time. Such data can be used to predict health outcomes, such as disease progression, mortality, hospitalization, etc. The Statistical approach is based on mathematical models that capture the patterns and trends of the data, such as autocorrelation, seasonality, and noise, while Neural methods are based on artificial neural networks, which are computational models that mimic the structure and function of biological neurons. This paper compared both parametric and non-parametric time series models of patients treated for malaria in Maternal and Child Health Centres in Lagos State, Nigeria. The forecast methods considered linear regression, Integrated Moving Average, ARIMA and SARIMA Modeling for the parametric approach, while Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) Network were used for the non-parametric model. The performance of each method is evaluated using the Mean Absolute Error (MAE), R-squared (R2) and Root Mean Square Error (RMSE) as criteria to determine the accuracy of each model. The study revealed that the best performance in terms of error was found in MLP, followed by the LSTM and ARIMA models. In addition, the Bootstrap Aggregating technique was used to make robust forecasts when there are uncertainties in the data.

Keywords: ARIMA, bootstrap aggregation, MLP, LSTM, SARIMA, time-series analysis

Procedia PDF Downloads 41
37612 Fractal-Wavelet Based Techniques for Improving the Artificial Neural Network Models

Authors: Reza Bazargan lari, Mohammad H. Fattahi

Abstract:

Natural resources management including water resources requires reliable estimations of time variant environmental parameters. Small improvements in the estimation of environmental parameters would result in grate effects on managing decisions. Noise reduction using wavelet techniques is an effective approach for pre-processing of practical data sets. Predictability enhancement of the river flow time series are assessed using fractal approaches before and after applying wavelet based pre-processing. Time series correlation and persistency, the minimum sufficient length for training the predicting model and the maximum valid length of predictions were also investigated through a fractal assessment.

Keywords: wavelet, de-noising, predictability, time series fractal analysis, valid length, ANN

Procedia PDF Downloads 345
37611 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: clustering algorithms, coastal engineering, data mining, data summarization, statistical methods

Procedia PDF Downloads 339
37610 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 353
37609 Automatic Thresholding for Data Gap Detection for a Set of Sensors in Instrumented Buildings

Authors: Houda Najeh, Stéphane Ploix, Mahendra Pratap Singh, Karim Chabir, Mohamed Naceur Abdelkrim

Abstract:

Building systems are highly vulnerable to different kinds of faults and failures. In fact, various faults, failures and human behaviors could affect the building performance. This paper tackles the detection of unreliable sensors in buildings. Different literature surveys on diagnosis techniques for sensor grids in buildings have been published but all of them treat only bias and outliers. Occurences of data gaps have also not been given an adequate span of attention in the academia. The proposed methodology comprises the automatic thresholding for data gap detection for a set of heterogeneous sensors in instrumented buildings. Sensor measurements are considered to be regular time series. However, in reality, sensor values are not uniformly sampled. So, the issue to solve is from which delay each sensor become faulty? The use of time series is required for detection of abnormalities on the delays. The efficiency of the method is evaluated on measurements obtained from a real power plant: an office at Grenoble Institute of technology equipped by 30 sensors.

Keywords: building system, time series, diagnosis, outliers, delay, data gap

Procedia PDF Downloads 221
37608 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neigbhor, crime, data analysis, sistematic literature review

Procedia PDF Downloads 36
37607 Business-Intelligence Mining of Large Decentralized Multimedia Datasets with a Distributed Multi-Agent System

Authors: Karima Qayumi, Alex Norta

Abstract:

The rapid generation of high volume and a broad variety of data from the application of new technologies pose challenges for the generation of business-intelligence. Most organizations and business owners need to extract data from multiple sources and apply analytical methods for the purposes of developing their business. Therefore, the recently decentralized data management environment is relying on a distributed computing paradigm. While data are stored in highly distributed systems, the implementation of distributed data-mining techniques is a challenge. The aim of this technique is to gather knowledge from every domain and all the datasets stemming from distributed resources. As agent technologies offer significant contributions for managing the complexity of distributed systems, we consider this for next-generation data-mining processes. To demonstrate agent-based business intelligence operations, we use agent-oriented modeling techniques to develop a new artifact for mining massive datasets.

Keywords: agent-oriented modeling (AOM), business intelligence model (BIM), distributed data mining (DDM), multi-agent system (MAS)

Procedia PDF Downloads 399
37606 STTS-EAD: Improving Spatio-Temporal Learning Based Time Series Prediction via Embedded Anomaly Detection

Authors: Tianhao Zhang, Cen Chen, Dawei Cheng, Yuqi Liang, Yuanyuan Liang

Abstract:

Dealing with anomalies is a crucial preprocessing step for multivariate time series prediction. However, existing methods that separate anomaly preprocessing and model training into two stages have certain limitations. Specifically, these methods fail to leverage auxiliary information necessary to distinguish latent anomalies related to spatiotemporal factors during the preprocessing stage. Instead, they solely rely on data distribution for detection which may lead to incorrect processing of many samples that are beneficial for training. To address this, we propose STTS-EAD, an end-to-end method that seamlessly integrates anomaly detection into the training process of multivariate time series forecasting and aims to improve Spatio-Temporal learning based Time Series prediction via Embedded Anomaly Detection. Our proposed STTS-EAD leverages spatio-temporal information for forecasting and anomaly detection, with the two parts alternately executed and optimized for each other. To the best of our knowledge, STTS-EAD is the first to integrate anomaly detection and forecasting tasks in the training phase for improving the accuracy of multivariate time series forecasting. Extensive experiments on a public stock dataset and two real-world sales datasets from a renowned coffee chain enterprise show that our proposed method can effectively process detected anomalies in the training stage to improve forecasting performance in the inference stage and significantly outperform baselines.

Keywords: multivariate time series, anomaly detection, time series forecasting, spatiotemporal feature learning

Procedia PDF Downloads 14
37605 Evidence Theory Enabled Quickest Change Detection Using Big Time-Series Data from Internet of Things

Authors: Hossein Jafari, Xiangfang Li, Lijun Qian, Alexander Aved, Timothy Kroecker

Abstract:

Traditionally in sensor networks and recently in the Internet of Things, numerous heterogeneous sensors are deployed in distributed manner to monitor a phenomenon that often can be model by an underlying stochastic process. The big time-series data collected by the sensors must be analyzed to detect change in the stochastic process as quickly as possible with tolerable false alarm rate. However, sensors may have different accuracy and sensitivity range, and they decay along time. As a result, the big time-series data collected by the sensors will contain uncertainties and sometimes they are conflicting. In this study, we present a framework to take advantage of Evidence Theory (a.k.a. Dempster-Shafer and Dezert-Smarandache Theories) capabilities of representing and managing uncertainty and conflict to fast change detection and effectively deal with complementary hypotheses. Specifically, Kullback-Leibler divergence is used as the similarity metric to calculate the distances between the estimated current distribution with the pre- and post-change distributions. Then mass functions are calculated and related combination rules are applied to combine the mass values among all sensors. Furthermore, we applied the method to estimate the minimum number of sensors needed to combine, so computational efficiency could be improved. Cumulative sum test is then applied on the ratio of pignistic probability to detect and declare the change for decision making purpose. Simulation results using both synthetic data and real data from experimental setup demonstrate the effectiveness of the presented schemes.

Keywords: CUSUM, evidence theory, kl divergence, quickest change detection, time series data

Procedia PDF Downloads 306
37604 Time Series Modelling and Prediction of River Runoff: Case Study of Karkheh River, Iran

Authors: Karim Hamidi Machekposhti, Hossein Sedghi, Abdolrasoul Telvari, Hossein Babazadeh

Abstract:

Rainfall and runoff phenomenon is a chaotic and complex outcome of nature which requires sophisticated modelling and simulation methods for explanation and use. Time Series modelling allows runoff data analysis and can be used as forecasting tool. In the paper attempt is made to model river runoff data and predict the future behavioural pattern of river based on annual past observations of annual river runoff. The river runoff analysis and predict are done using ARIMA model. For evaluating the efficiency of prediction to hydrological events such as rainfall, runoff and etc., we use the statistical formulae applicable. The good agreement between predicted and observation river runoff coefficient of determination (R2) display that the ARIMA (4,1,1) is the suitable model for predicting Karkheh River runoff at Iran.

Keywords: time series modelling, ARIMA model, river runoff, Karkheh River, CLS method

Procedia PDF Downloads 313
37603 Short Life Cycle Time Series Forecasting

Authors: Shalaka Kadam, Dinesh Apte, Sagar Mainkar

Abstract:

The life cycle of products is becoming shorter and shorter due to increased competition in market, shorter product development time and increased product diversity. Short life cycles are normal in retail industry, style business, entertainment media, and telecom and semiconductor industry. The subject of accurate forecasting for demand of short lifecycle products is of special enthusiasm for many researchers and organizations. Due to short life cycle of products the amount of historical data that is available for forecasting is very minimal or even absent when new or modified products are launched in market. The companies dealing with such products want to increase the accuracy in demand forecasting so that they can utilize the full potential of the market at the same time do not oversupply. This provides the challenge to develop a forecasting model that can forecast accurately while handling large variations in data and consider the complex relationships between various parameters of data. Many statistical models have been proposed in literature for forecasting time series data. Traditional time series forecasting models do not work well for short life cycles due to lack of historical data. Also artificial neural networks (ANN) models are very time consuming to perform forecasting. We have studied the existing models that are used for forecasting and their limitations. This work proposes an effective and powerful forecasting approach for short life cycle time series forecasting. We have proposed an approach which takes into consideration different scenarios related to data availability for short lifecycle products. We then suggest a methodology which combines statistical analysis with structured judgement. Also the defined approach can be applied across domains. We then describe the method of creating a profile from analogous products. This profile can then be used for forecasting products with historical data of analogous products. We have designed an application which combines data, analytics and domain knowledge using point-and-click technology. The forecasting results generated are compared using MAPE, MSE and RMSE error scores. Conclusion: Based on the results it is observed that no one approach is sufficient for short life-cycle forecasting and we need to combine two or more approaches for achieving the desired accuracy.

Keywords: forecast, short life cycle product, structured judgement, time series

Procedia PDF Downloads 325
37602 Data Mining Spatial: Unsupervised Classification of Geographic Data

Authors: Chahrazed Zouaoui

Abstract:

In recent years, the volume of geospatial information is increasing due to the evolution of communication technologies and information, this information is presented often by geographic information systems (GIS) and stored on of spatial databases (BDS). The classical data mining revealed a weakness in knowledge extraction at these enormous amounts of data due to the particularity of these spatial entities, which are characterized by the interdependence between them (1st law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data, which allows the extraction of knowledge and spatial relationships from geospatial data, including methods of this process we distinguish the monothematic and thematic, geo- Clustering is one of the main tasks of spatial data mining, which is registered in the part of the monothematic method. It includes geo-spatial entities similar in the same class and it affects more dissimilar to the different classes. In other words, maximize intra-class similarity and minimize inter similarity classes. Taking account of the particularity of geo-spatial data. Two approaches to geo-clustering exist, the dynamic processing of data involves applying algorithms designed for the direct treatment of spatial data, and the approach based on the spatial data pre-processing, which consists of applying clustering algorithms classic pre-processed data (by integration of spatial relationships). This approach (based on pre-treatment) is quite complex in different cases, so the search for approximate solutions involves the use of approximation algorithms, including the algorithms we are interested in dedicated approaches (clustering methods for partitioning and methods for density) and approaching bees (biomimetic approach), our study is proposed to design very significant to this problem, using different algorithms for automatically detecting geo-spatial neighborhood in order to implement the method of geo- clustering by pre-treatment, and the application of the bees algorithm to this problem for the first time in the field of geo-spatial.

Keywords: mining, GIS, geo-clustering, neighborhood

Procedia PDF Downloads 358
37601 Predicting Groundwater Areas Using Data Mining Techniques: Groundwater in Jordan as Case Study

Authors: Faisal Aburub, Wael Hadi

Abstract:

Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships; or to predict unseen objects to one of the predefined groups. In this paper, we aim to investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed other algorithms in terms of classification accuracy, precision and F1 evaluation measures using the datasets of groundwater areas that were collected from Jordanian Ministry of Water and Irrigation.

Keywords: classification, data mining, evaluation measures, groundwater

Procedia PDF Downloads 253
37600 Analyzing Tools and Techniques for Classification In Educational Data Mining: A Survey

Authors: D. I. George Amalarethinam, A. Emima

Abstract:

Educational Data Mining (EDM) is one of the newest topics to emerge in recent years, and it is concerned with developing methods for analyzing various types of data gathered from the educational circle. EDM methods and techniques with machine learning algorithms are used to extract meaningful and usable information from huge databases. For scientists and researchers, realistic applications of Machine Learning in the EDM sectors offer new frontiers and present new problems. One of the most important research areas in EDM is predicting student success. The prediction algorithms and techniques must be developed to forecast students' performance, which aids the tutor, institution to boost the level of student’s performance. This paper examines various classification techniques in prediction methods and data mining tools used in EDM.

Keywords: classification technique, data mining, EDM methods, prediction methods

Procedia PDF Downloads 100
37599 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising

Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri

Abstract:

Although data nowadays has multiple forms; from text to images, and from audio to videos, yet text is still the most used one at a public level. At an academical and research level, and unlike other forms, text can be considered as the easiest form to process. Therefore, a brunch of Data Mining researches has been always under its shadow, called "Text Mining". Its concept is just like data mining’s, finding valuable patterns in data, from large collections and tremendous volumes of data, in this case: Text. Named entity recognition (NER) is one of Text Mining’s disciplines, it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations and more in a given text. Our approach "Octopub" does not aim to find new ways to improve named entity recognition process, rather than that it’s about finding a new, and yet smart way, to use NER in a way that we can extract sentiments of millions of people using Social Networks as a limitless information source, and Marketing for product promotion as the main domain of application.

Keywords: textmining, named entity recognition(NER), sentiment analysis, social media networks (SN, SMN), business intelligence(BI), marketing

Procedia PDF Downloads 560
37598 A Review of Different Studies on Hidden Markov Models for Multi-Temporal Satellite Images: Stationarity and Non-Stationarity Issues

Authors: Ali Ben Abbes, Imed Riadh Farah

Abstract:

Due to the considerable advances in Multi-Temporal Satellite Images (MTSI), remote sensing application became more accurate. Recently, many advances in modeling MTSI are developed using various models. The purpose of this article is to present an overview of studies using Hidden Markov Model (HMM). First of all, we provide a background of using HMM and their applications in this context. A comparison of the different works is discussed, and possible areas and challenges are highlighted. Secondly, we discussed the difference on vegetation monitoring as well as urban growth. Nevertheless, most research efforts have been used only stationary data. From another point of view, in this paper, we describe a new non-stationarity HMM, that is defined with a set of parts of the time series e.g. seasonal, trend and random. In addition, a new approach giving more accurate results and improve the applicability of the HMM in modeling a non-stationary data series. In order to assess the performance of the HMM, different experiments are carried out using Moderate Resolution Imaging Spectroradiometer (MODIS) NDVI time series of the northwestern region of Tunisia and Landsat time series of tres Cantos-Madrid in Spain.

Keywords: multi-temporal satellite image, HMM , nonstationarity, vegetation, urban

Procedia PDF Downloads 324
37597 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling

Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed

Abstract:

Pose estimation is an important task in computer vision. Though the majority of the existing solutions provide good accuracy results, they are often overly complex and computationally expensive. In this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. Firstly, a face image is converted into one-dimensional time series using Hilbert space filling curve, then the approach converts these time series data to a symbolic representation. Furthermore, a distance matrix is calculated between symbolic series of an input learning dataset of images, to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated with three public datasets. Experimental results have shown that our approach is able to achieve a correct classification rate exceeding 97% with K-NN algorithm.

Keywords: machine learning, pattern recognition, facial pose classification, time series

Procedia PDF Downloads 324
37596 Analysis of Reliability of Mining Shovel Using Weibull Model

Authors: Anurag Savarnya

Abstract:

The reliability of the various parts of electric mining shovel has been assessed through the application of Weibull Model. The study was initiated to find reliability of components of electric mining shovel. The paper aims to optimize the reliability of components and increase the life cycle of component. A multilevel decomposition of the electric mining shovel was done and maintenance records were used to evaluate the failure data and appropriate system characterization was done to model the system in terms of reasonable number of components. The approach used develops a mathematical model to assess the reliability of the electric mining shovel components. The model can be used to predict reliability of components of the hydraulic mining shovel and system performance. Reliability is an inherent attribute to a system. When the life-cycle costs of a system are being analyzed, reliability plays an important role as a major driver of these costs and has considerable influence on system performance. It is an iterative process that begins with specification of reliability goals consistent with cost and performance objectives. The data were collected from an Indian open cast coal mine and the reliability of various components of the electric mining shovel has been assessed by following a Weibull Model.

Keywords: reliability, Weibull model, electric mining shovel

Procedia PDF Downloads 474
37595 Timing and Noise Data Mining Algorithm and Software Tool in Very Large Scale Integration (VLSI) Design

Authors: Qing K. Zhu

Abstract:

Very Large Scale Integration (VLSI) design becomes very complex due to the continuous integration of millions of gates in one chip based on Moore’s law. Designers have encountered numerous report files during design iterations using timing and noise analysis tools. This paper presented our work using data mining techniques combined with HTML tables to extract and represent critical timing/noise data. When we apply this data-mining tool in real applications, the running speed is important. The software employs table look-up techniques in the programming for the reasonable running speed based on performance testing results. We added several advanced features for the application in one industry chip design.

Keywords: VLSI design, data mining, big data, HTML forms, web, VLSI, EDA, timing, noise

Procedia PDF Downloads 229
37594 Static vs. Stream Mining Trajectories Similarity Measures

Authors: Musaab Riyadh, Norwati Mustapha, Dina Riyadh

Abstract:

Trajectory similarity can be defined as the cost of transforming one trajectory into another based on certain similarity method. It is the core of numerous mining tasks such as clustering, classification, and indexing. Various approaches have been suggested to measure similarity based on the geometric and dynamic properties of trajectory, the overlapping between trajectory segments, and the confined area between entire trajectories. In this article, an evaluation of these approaches has been done based on computational cost, usage memory, accuracy, and the amount of data which is needed in advance to determine its suitability to stream mining applications. The evaluation results show that the stream mining applications support similarity methods which have low computational cost and memory, single scan on data, and free of mathematical complexity due to the high-speed generation of data.

Keywords: global distance measure, local distance measure, semantic trajectory, spatial dimension, stream data mining

Procedia PDF Downloads 373
37593 Forecasting Unusual Infection of Patient Used by Irregular Weighted Point Set

Authors: Seema Vaidya

Abstract:

Mining association rule is a key issue in data mining. In any case, the standard models ignore the distinction among the exchanges, and the weighted association rule mining does not transform on databases with just binary attributes. This paper proposes a novel continuous example and executes a tree (FP-tree) structure, which is an increased prefix-tree structure for securing compacted, discriminating data about examples, and makes a fit FP-tree-based mining system, FP enhanced capacity algorithm is used, for mining the complete game plan of examples by illustration incessant development. Here, this paper handles the motivation behind making remarkable and weighted item sets, i.e. rare weighted item set mining issue. The two novel brightness measures are proposed for figuring the infrequent weighted item set mining issue. Also, the algorithm are handled which perform IWI which is more insignificant IWI mining. Moreover we utilized the rare item set for choice based structure. The general issue of the start of reliable definite rules is troublesome for the grounds that hypothetically no inciting technique with no other person can promise the rightness of influenced theories. In this way, this framework expects the disorder with the uncommon signs. Usage study demonstrates that proposed algorithm upgrades the structure which is successful and versatile for mining both long and short diagnostics rules. Structure upgrades aftereffects of foreseeing rare diseases of patient.

Keywords: association rule, data mining, IWI mining, infrequent item set, frequent pattern growth

Procedia PDF Downloads 380
37592 Time Series Analysis of Radon Concentration at Different Depths in an Underground Goldmine

Authors: Theophilus Adjirackor, Frederic Sam, Irene Opoku-Ntim, David Okoh Kpeglo, Prince K. Gyekye, Frank K. Quashie, Kofi Ofori

Abstract:

Indoor radon concentrations were collected monthly over a period of one year in 10 different levels in an underground goldmine, and the data was analyzed using a four-moving average time series to determine the relationship between the depths of the underground mine and the indoor radon concentration. The detectors were installed in batches within four quarters. The measurements were carried out using LR115 solid-state nuclear track detectors. Statistical models are applied in the prediction and analysis of the radon concentration at various depths. The time series model predicted a positive relationship between the depth of the underground mine and the indoor radon concentration. Thus, elevated radon concentrations are expected at deeper levels of the underground mine, but the relationship was insignificant at the 5% level of significance with a negative adjusted R2 (R2 = – 0.021) due to an appropriate engineering and adequate ventilation rate in the underground mine.

Keywords: LR115, radon concentration, rime series, underground goldmine

Procedia PDF Downloads 9