Search results for: design exploration and data mining
11434 Analysis of DNA Microarray Data using Association Rules: A Selective Study
Authors: M. Anandhavalli Gauthaman
Abstract:
DNA microarrays allow the measurement of expression levels for a large number of genes, perhaps all genes of an organism, within a number of different experimental samples. It is very much important to extract biologically meaningful information from this huge amount of expression data to know the current state of the cell because most cellular processes are regulated by changes in gene expression. Association rule mining techniques are helpful to find association relationship between genes. Numerous association rule mining algorithms have been developed to analyze and associate this huge amount of gene expression data. This paper focuses on some of the popular association rule mining algorithms developed to analyze gene expression data.
Keywords: DNA microarray, gene expression, association rule mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 214511433 Using Data Clustering in Oral Medicine
Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Olof Torgersson
Abstract:
The vast amount of information hidden in huge databases has created tremendous interests in the field of data mining. This paper examines the possibility of using data clustering techniques in oral medicine to identify functional relationships between different attributes and classification of similar patient examinations. Commonly used data clustering algorithms have been reviewed and as a result several interesting results have been gathered.Keywords: Oral Medicine, Cluto, Data Clustering, Data Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 197711432 Explorative Data Mining of Constructivist Learning Experiences and Activities with Multiple Dimensions
Authors: Patrick Wessa, Bart Baesens
Abstract:
This paper discusses the use of explorative data mining tools that allow the educator to explore new relationships between reported learning experiences and actual activities, even if there are multiple dimensions with a large number of measured items. The underlying technology is based on the so-called Compendium Platform for Reproducible Computing (http://www.freestatistics.org) which was built on top the computational R Framework (http://www.wessa.net).Keywords: Reproducible computing, data mining, explorative data analysis, compendium technology, computer assisted education
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 125311431 Modified Data Mining Approach for Defective Diagnosis in Hard Disk Drive Industry
Authors: S. Soommat, S. Patamatamkul, T. Prempridi, M. Sritulyachot, P. Ineure, S. Yimman
Abstract:
Currently, slider process of Hard Disk Drive Industry become more complex, defective diagnosis for yield improvement becomes more complicated and time-consumed. Manufacturing data analysis with data mining approach is widely used for solving that problem. The existing mining approach from combining of the KMean clustering, the machine oriented Kruskal-Wallis test and the multivariate chart were applied for defective diagnosis but it is still be a semiautomatic diagnosis system. This article aims to modify an algorithm to support an automatic decision for the existing approach. Based on the research framework, the new approach can do an automatic diagnosis and help engineer to find out the defective factors faster than the existing approach about 50%.Keywords: Slider process, Defective diagnosis and Data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 119911430 FCA-based Conceptual Knowledge Discovery in Folksonomy
Authors: Yu-Kyung Kang, Suk-Hyung Hwang, Kyoung-Mo Yang
Abstract:
The tagging data of (users, tags and resources) constitutes a folksonomy that is the user-driven and bottom-up approach to organizing and classifying information on the Web. Tagging data stored in the folksonomy include a lot of very useful information and knowledge. However, appropriate approach for analyzing tagging data and discovering hidden knowledge from them still remains one of the main problems on the folksonomy mining researches. In this paper, we have proposed a folksonomy data mining approach based on FCA for discovering hidden knowledge easily from folksonomy. Also we have demonstrated how our proposed approach can be applied in the collaborative tagging system through our experiment. Our proposed approach can be applied to some interesting areas such as social network analysis, semantic web mining and so on.
Keywords: Folksonomy data mining, formal concept analysis, collaborative tagging, conceptual knowledge discovery, classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 202811429 Auto Classification for Search Intelligence
Authors: Lilac A. E. Al-Safadi
Abstract:
This paper proposes an auto-classification algorithm of Web pages using Data mining techniques. We consider the problem of discovering association rules between terms in a set of Web pages belonging to a category in a search engine database, and present an auto-classification algorithm for solving this problem that are fundamentally based on Apriori algorithm. The proposed technique has two phases. The first phase is a training phase where human experts determines the categories of different Web pages, and the supervised Data mining algorithm will combine these categories with appropriate weighted index terms according to the highest supported rules among the most frequent words. The second phase is the categorization phase where a web crawler will crawl through the World Wide Web to build a database categorized according to the result of the data mining approach. This database contains URLs and their categories.Keywords: Information Processing on the Web, Data Mining, Document Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 161911428 Generator of Hypotheses an Approach of Data Mining Based on Monotone Systems Theory
Authors: Rein Kuusik, Grete Lind
Abstract:
Generator of hypotheses is a new method for data mining. It makes possible to classify the source data automatically and produces a particular enumeration of patterns. Pattern is an expression (in a certain language) describing facts in a subset of facts. The goal is to describe the source data via patterns and/or IF...THEN rules. Used evaluation criteria are deterministic (not probabilistic). The search results are trees - form that is easy to comprehend and interpret. Generator of hypotheses uses very effective algorithm based on the theory of monotone systems (MS) named MONSA (MONotone System Algorithm).Keywords: data mining, monotone systems, pattern, rule.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 125611427 W3-Miner: Mining Weighted Frequent Subtree Patterns in a Collection of Trees
Authors: R. AliMohammadzadeh, M. Haghir Chehreghani, A. Zarnani, M. Rahgozar
Abstract:
Mining frequent tree patterns have many useful applications in XML mining, bioinformatics, network routing, etc. Most of the frequent subtree mining algorithms (i.e. FREQT, TreeMiner and CMTreeMiner) use anti-monotone property in the phase of candidate subtree generation. However, none of these algorithms have verified the correctness of this property in tree structured data. In this research it is shown that anti-monotonicity does not generally hold, when using weighed support in tree pattern discovery. As a result, tree mining algorithms that are based on this property would probably miss some of the valid frequent subtree patterns in a collection of trees. In this paper, we investigate the correctness of anti-monotone property for the problem of weighted frequent subtree mining. In addition we propose W3-Miner, a new algorithm for full extraction of frequent subtrees. The experimental results confirm that W3-Miner finds some frequent subtrees that the previously proposed algorithms are not able to discover.Keywords: Semi-Structured Data Mining, Anti-Monotone Property, Trees.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 138111426 An Application of the Data Mining Methods with Decision Rule
Authors: Xun Ge, Jianhua Gong
Abstract:
ankings for output of Chinese main agricultural commodity in the world for 1978, 1980, 1990, 2000, 2006, 2007 and 2008 have been released in United Nations FAO Database. Unfortunately, where the ranking of output of Chinese cotton lint in the world for 2008 was missed. This paper uses sequential data mining methods with decision rules filling this gap. This new data mining method will be help to give a further improvement for United Nations FAO Database.
Keywords: Ranking, output of the main agricultural commodity, gross domestic product, decision table, information system, data mining, decision rule
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 171011425 Web Log Mining by an Improved AprioriAll Algorithm
Authors: Wang Tong, He Pi-lian
Abstract:
This paper sets forth the possibility and importance about applying Data Mining in Web logs mining and shows some problems in the conventional searching engines. Then it offers an improved algorithm based on the original AprioriAll algorithm which has been used in Web logs mining widely. The new algorithm adds the property of the User ID during the every step of producing the candidate set and every step of scanning the database by which to decide whether an item in the candidate set should be put into the large set which will be used to produce next candidate set. At the meantime, in order to reduce the number of the database scanning, the new algorithm, by using the property of the Apriori algorithm, limits the size of the candidate set in time whenever it is produced. Test results show the improved algorithm has a more lower complexity of time and space, better restrain noise and fit the capacity of memory.
Keywords: Candidate Sets Pruning, Data Mining, ImprovedAlgorithm, Noise Restrain, Web Log
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 228011424 Risk-Management by Numerical Pattern Analysis in Data-Mining
Authors: M. Kargar, R. Mirmiran, F. Fartash, T. Saderi
Abstract:
In this paper a new method is suggested for risk management by the numerical patterns in data-mining. These patterns are designed using probability rules in decision trees and are cared to be valid, novel, useful and understandable. Considering a set of functions, the system reaches to a good pattern or better objectives. The patterns are analyzed through the produced matrices and some results are pointed out. By using the suggested method the direction of the functionality route in the systems can be controlled and best planning for special objectives be done.Keywords: Analysis, Data-mining, Pattern, Risk Management.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 127111423 Object-Centric Process Mining Using Process Cubes
Authors: Anahita Farhang Ghahfarokhi, Alessandro Berti, Wil M.P. van der Aalst
Abstract:
Process mining provides ways to analyze business processes. Common process mining techniques consider the process as a whole. However, in real-life business processes, different behaviors exist that make the overall process too complex to interpret. Process comparison is a branch of process mining that isolates different behaviors of the process from each other by using process cubes. Process cubes organize event data using different dimensions. Each cell contains a set of events that can be used as an input to apply process mining techniques. Existing work on process cubes assume single case notions. However, in real processes, several case notions (e.g., order, item, package, etc.) are intertwined. Object-centric process mining is a new branch of process mining addressing multiple case notions in a process. To make a bridge between object-centric process mining and process comparison, we propose a process cube framework, which supports process cube operations such as slice and dice on object-centric event logs. To facilitate the comparison, the framework is integrated with several object-centric process discovery approaches.Keywords: Process mining, multidimensional process mining, multi-perspective business processes, OLAP, process cubes, process discovery.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 111911422 An Improved k Nearest Neighbor Classifier Using Interestingness Measures for Medical Image Mining
Authors: J. Alamelu Mangai, Satej Wagle, V. Santhosh Kumar
Abstract:
The exponential increase in the volume of medical image database has imposed new challenges to clinical routine in maintaining patient history, diagnosis, treatment and monitoring. With the advent of data mining and machine learning techniques it is possible to automate and/or assist physicians in clinical diagnosis. In this research a medical image classification framework using data mining techniques is proposed. It involves feature extraction, feature selection, feature discretization and classification. In the classification phase, the performance of the traditional kNN k nearest neighbor classifier is improved using a feature weighting scheme and a distance weighted voting instead of simple majority voting. Feature weights are calculated using the interestingness measures used in association rule mining. Experiments on the retinal fundus images show that the proposed framework improves the classification accuracy of traditional kNN from 78.57 % to 92.85 %.
Keywords: Medical Image Mining, Data Mining, Feature Weighting, Association Rule Mining, k nearest neighbor classifier.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 330811421 Exploration and Exploitation within Operations
Authors: D. Gåsvaer, L. Stålberg, A. Fundin, M. Jackson, P. Johansson
Abstract:
Exploration and exploitation capabilities are both important within Operations as means for improvement when managed separately, and for establishing dynamic improvement capabilities when combined in balance. However, it is unclear what exploration and exploitation capabilities imply in improvement and development work within an Operations context. So, in order to better understand how to develop exploration and exploitation capabilities within Operations, the main characteristics of these constructs needs to be identified and further understood. Thus, the objective of this research is to increase the understanding about exploitation and exploration characteristics, to concretize what they translates to within the context of improvement and development work in an Operations unit, and to identify practical challenges. A literature review and a case study are presented. In the literature review, different interpretations of exploration and exploitation are portrayed, key characteristics have been identified, and a deepened understanding of exploration and exploitation characteristics is described. The case in the study is an Operations unit, and the aim is to explore to what extent and in what ways exploration and exploitation activities are part of the improvement structures and processes. The contribution includes an identification of key characteristics of exploitation and exploration, as well as an interpretation of the constructs. Further, some practical challenges are identified. For instance, exploration activities tend to be given low priority, both in daily work as in the manufacturing strategy. Also, the overall understanding about the concepts of exploitation and exploration (or any similar aspect of dynamic improvement capabilities) is very low.Keywords: Exploitation, Exploration, Improvement, Lean production, Manufacturing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 261511420 Signed Approach for Mining Web Content Outliers
Authors: G. Poonkuzhali, K.Thiagarajan, K.Sarukesi, G.V.Uma
Abstract:
The emergence of the Internet has brewed the revolution of information storage and retrieval. As most of the data in the web is unstructured, and contains a mix of text, video, audio etc, there is a need to mine information to cater to the specific needs of the users without loss of important hidden information. Thus developing user friendly and automated tools for providing relevant information quickly becomes a major challenge in web mining research. Most of the existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent ones that are likely to contain outlying data such as noise, irrelevant and redundant data. This paper mainly focuses on Signed approach and full word matching on the organized domain dictionary for mining web content outliers. This Signed approach gives the relevant web documents as well as outlying web documents. As the dictionary is organized based on the number of characters in a word, searching and retrieval of documents takes less time and less space.Keywords: Outliers, Relevant document, , Signed Approach, Web content mining, Web documents..
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 234911419 An Efficient Approach to Mining Frequent Itemsets on Data Streams
Authors: Sara Ansari, Mohammad Hadi Sadreddini
Abstract:
The increasing importance of data stream arising in a wide range of advanced applications has led to the extensive study of mining frequent patterns. Mining data streams poses many new challenges amongst which are the one-scan nature, the unbounded memory requirement and the high arrival rate of data streams. In this paper, we propose a new approach for mining itemsets on data stream. Our approach SFIDS has been developed based on FIDS algorithm. The main attempts were to keep some advantages of the previous approach and resolve some of its drawbacks, and consequently to improve run time and memory consumption. Our approach has the following advantages: using a data structure similar to lattice for keeping frequent itemsets, separating regions from each other with deleting common nodes that results in a decrease in search space, memory consumption and run time; and Finally, considering CPU constraint, with increasing arrival rate of data that result in overloading system, SFIDS automatically detect this situation and discard some of unprocessing data. We guarantee that error of results is bounded to user pre-specified threshold, based on a probability technique. Final results show that SFIDS algorithm could attain about 50% run time improvement than FIDS approach.Keywords: Data stream, frequent itemset, stream mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 141911418 Efficient Web Usage Mining Based on K-Medoids Clustering Technique
Authors: P. Sengottuvelan, T. Gopalakrishnan
Abstract:
Web Usage Mining is the application of data mining techniques to find usage patterns from web log data, so as to grasp required patterns and serve the requirements of Web-based applications. User’s expertise on the internet may be improved by minimizing user’s web access latency. This may be done by predicting the future search page earlier and the same may be prefetched and cached. Therefore, to enhance the standard of web services, it is needed topic to research the user web navigation behavior. Analysis of user’s web navigation behavior is achieved through modeling web navigation history. We propose this technique which cluster’s the user sessions, based on the K-medoids technique.Keywords: Clustering, K-medoids, Recommendation, User Session, Web Usage Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 139611417 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems
Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan
Abstract:
Parallel hybrid storage systems consist of a hierarchy of different storage devices that vary in terms of data reading speed performance. As we ascend in the hierarchy, data reading speed becomes faster. Thus, migrating the application’ important data that will be accessed in the near future to the uppermost level will reduce the application I/O waiting time; hence, reducing its execution elapsed time. In this research, we implement trace-driven two-levels parallel hybrid storage system prototype that consists of HDDs and SSDs. The prototype uses data mining techniques to classify application’ data in order to determine its near future data accesses in parallel with the its on-demand request. The important data (i.e. the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach integrated with data mining techniques reduces the application execution elapsed time when using variety of traces in at least to 22%.Keywords: Data mining, hybrid storage system, recurrent neural network, support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 173611416 Large-Dimensional Shells under Mining Tremors from Various Mining Regions in Poland
Authors: Joanna M. Dulińska, Maria Fabijańska
Abstract:
In the paper a detailed analysis of the dynamic response of a cooling tower shell to mining tremors originated from two main regions of mining activity in Poland (Upper Silesian Coal Basin and Legnica-Glogow Copper District) was presented. The representative time histories registered in the both regions were used as ground motion data in calculations of the dynamic response of the structure. It was proved that the dynamic response of the shell is strongly dependent not only on the level of vibration amplitudes but on the dominant frequency range of the mining shock typical for the mining region as well. Also a vertical component of vibrations occurred to have considerable influence on the total dynamic response of the shell. Finally, it turned out that non-uniformity of kinematic excitation resulting from spatial variety of ground motion plays a significant role in dynamic analysis of large-dimensional shells under mining shocks.Keywords: Cooling towers, dynamic response, mining tremors, non-uniform kinematic excitation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 142111415 K-Means for Spherical Clusters with Large Variance in Sizes
Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan
Abstract:
Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster, and recomputing the membership of small cluster points, the experimental results reveal that the proposed algorithm produces satisfactory results.Keywords: K-Means, Data Clustering, Cluster Analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 328111414 Spatial Data Mining by Decision Trees
Authors: S. Oujdi, H. Belbachir
Abstract:
Existing methods of data mining cannot be applied on spatial data because they require spatial specificity consideration, as spatial relationships. This paper focuses on the classification with decision trees, which are one of the data mining techniques. We propose an extension of the C4.5 algorithm for spatial data, based on two different approaches Join materialization and Querying on the fly the different tables. Similar works have been done on these two main approaches, the first - Join materialization - favors the processing time in spite of memory space, whereas the second - Querying on the fly different tables- promotes memory space despite of the processing time. The modified C4.5 algorithm requires three entries tables: a target table, a neighbor table, and a spatial index join that contains the possible spatial relationship among the objects in the target table and those in the neighbor table. Thus, the proposed algorithms are applied to a spatial data pattern in the accidentology domain. A comparative study of our approach with other works of classification by spatial decision trees will be detailed.
Keywords: C4.5 Algorithm, Decision trees, S-CART, Spatial data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 298611413 Mining Multicity Urban Data for Sustainable Population Relocation
Authors: Xu Du, Aparna S. Varde
Abstract:
In this research, we propose to conduct diagnostic and predictive analysis about the key factors and consequences of urban population relocation. To achieve this goal, urban simulation models extract the urban development trends as land use change patterns from a variety of data sources. The results are treated as part of urban big data with other information such as population change and economic conditions. Multiple data mining methods are deployed on this data to analyze nonlinear relationships between parameters. The result determines the driving force of population relocation with respect to urban sprawl and urban sustainability and their related parameters. This work sets the stage for developing a comprehensive urban simulation model for catering to specific questions by targeted users. It contributes towards achieving sustainability as a whole.Keywords: Data Mining, Environmental Modeling, Sustainability, Urban Planning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 178311412 A Location Routing Model for the Logistic System in the Mining Collection Centers of the Northern Region of Boyacá-Colombia
Authors: Erika Ruíz, Luis Amaya, Diego Carreño
Abstract:
The main objective of this study is to design a mathematical model for the logistics of mining collection centers in the northern region of the department of Boyacá (Colombia), determining the structure that facilitates the flow of products along the supply chain. In order to achieve this, it is necessary to define a suitable design of the distribution network, taking into account the products, customer’s characteristics and the availability of information. Likewise, some other aspects must be defined, such as number and capacity of collection centers to establish, routes that must be taken to deliver products to the customers, among others. This research will use one of the operation research problems, which is used in the design of distribution networks known as Location Routing Problem (LRP).
Keywords: Location routing problem, logistic, mining collection, model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 78911411 Parallel and Distributed Mining of Association Rule on Knowledge Grid
Authors: U. Sakthi, R. Hemalatha, R. S. Bhuvaneswaran
Abstract:
In Virtual organization, Knowledge Discovery (KD) service contains distributed data resources and computing grid nodes. Computational grid is integrated with data grid to form Knowledge Grid, which implements Apriori algorithm for mining association rule on grid network. This paper describes development of parallel and distributed version of Apriori algorithm on Globus Toolkit using Message Passing Interface extended with Grid Services (MPICHG2). The creation of Knowledge Grid on top of data and computational grid is to support decision making in real time applications. In this paper, the case study describes design and implementation of local and global mining of frequent item sets. The experiments were conducted on different configurations of grid network and computation time was recorded for each operation. We analyzed our result with various grid configurations and it shows speedup of computation time is almost superlinear.Keywords: Association rule, Grid computing, Knowledge grid, Mobility prediction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 218211410 Building an Integrated Relational Database from Swiss Nutrition National Survey and Swiss Health Datasets for Data Mining Purposes
Authors: Ilona Mewes, Helena Jenzer, Farshideh Einsele
Abstract:
Objective: The objective of the study was to integrate two big databases from Swiss nutrition national survey (menuCH) and Swiss health national survey 2012 for data mining purposes. Each database has a demographic base data. An integrated Swiss database is built to later discover critical food consumption patterns linked with lifestyle diseases known to be strongly tied with food consumption. Design: Swiss nutrition national survey (menuCH) with approx. 2000 respondents from two different surveys, one by Phone and the other by questionnaire along with Swiss health national survey 2012 with 21500 respondents were pre-processed, cleaned and finally integrated to a unique relational database. Results: The result of this study is an integrated relational database from the Swiss nutritional and health databases.
Keywords: Health informatics, data mining, nutritional and health databases, nutritional and chronical databases.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 172511409 Spatio-Temporal Data Mining with Association Rules for Lake Van
Authors: T. Aydin, M. F. Alaeddinoglu
Abstract:
People, throughout the history, have made estimates and inferences about the future by using their past experiences. Developing information technologies and the improvements in the database management systems make it possible to extract useful information from knowledge in hand for the strategic decisions. Therefore, different methods have been developed. Data mining by association rules learning is one of such methods. Apriori algorithm, one of the well-known association rules learning algorithms, is not commonly used in spatio-temporal data sets. However, it is possible to embed time and space features into the data sets and make Apriori algorithm a suitable data mining technique for learning spatiotemporal association rules. Lake Van, the largest lake of Turkey, is a closed basin. This feature causes the volume of the lake to increase or decrease as a result of change in water amount it holds. In this study, evaporation, humidity, lake altitude, amount of rainfall and temperature parameters recorded in Lake Van region throughout the years are used by the Apriori algorithm and a spatio-temporal data mining application is developed to identify overflows and newlyformed soil regions (underflows) occurring in the coastal parts of Lake Van. Identifying possible reasons of overflows and underflows may be used to alert the experts to take precautions and make the necessary investments.Keywords: Apriori algorithm, association rules, data mining, spatio-temporal data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 140511408 Visual-Graphical Methods for Exploring Longitudinal Data
Authors: H. W. Ker
Abstract:
Longitudinal data typically have the characteristics of changes over time, nonlinear growth patterns, between-subjects variability, and the within errors exhibiting heteroscedasticity and dependence. The data exploration is more complicated than that of cross-sectional data. The purpose of this paper is to organize/integrate of various visual-graphical techniques to explore longitudinal data. From the application of the proposed methods, investigators can answer the research questions include characterizing or describing the growth patterns at both group and individual level, identifying the time points where important changes occur and unusual subjects, selecting suitable statistical models, and suggesting possible within-error variance.Keywords: Data exploration, exploratory analysis, HLMs/LMEs, longitudinal data, visual-graphical methods.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 209511407 A Modified Fuzzy C-Means Algorithm for Natural Data Exploration
Authors: Binu Thomas, Raju G., Sonam Wangmo
Abstract:
In Data mining, Fuzzy clustering algorithms have demonstrated advantage over crisp clustering algorithms in dealing with the challenges posed by large collections of vague and uncertain natural data. This paper reviews concept of fuzzy logic and fuzzy clustering. The classical fuzzy c-means algorithm is presented and its limitations are highlighted. Based on the study of the fuzzy c-means algorithm and its extensions, we propose a modification to the cmeans algorithm to overcome the limitations of it in calculating the new cluster centers and in finding the membership values with natural data. The efficiency of the new modified method is demonstrated on real data collected for Bhutan-s Gross National Happiness (GNH) program.Keywords: Adaptive fuzzy clustering, clustering, fuzzy logic, fuzzy clustering, c-means.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 199011406 Probabilistic Approach as a Method Used in the Solution of Engineering Design for Biomechanics and Mining
Authors: Karel Frydrýšek
Abstract:
This paper focuses on the probabilistic numerical solution of the problems in biomechanics and mining. Applications of Simulation-Based Reliability Assessment (SBRA) Method are presented in the solution of designing of the external fixators applied in traumatology and orthopaedics (these fixators can be applied for the treatment of open and unstable fractures etc.) and in the solution of a hard rock (ore) disintegration process (i.e. the bit moves into the ore and subsequently disintegrates it, the results are compared with experiments, new design of excavation tool is proposed.Keywords: probabilistic approach, engineering design, traumatology, rock mechanics
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 147911405 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.
Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1527