Search results for: WEKA data mining tool
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27636

Search results for: WEKA data mining tool

27606 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma leads to physical and financial complications. This study aimed to use data mining techniques and creating a neural network intelligent system for diagnosis of asthma. Methods: The study population is the patients who had visited one of the Lung Clinics in Tehran. Data were analyzed using the SPSS statistical tool and the chi-square Pearson's coefficient was the basis of decision making for data ranking. The considered neural network is trained using back propagation learning technique. Results: According to the analysis performed by means of SPSS to select the top factors, 13 effective factors were selected, in different performances, data was mixed in various forms, so the different models were made for training the data and testing networks and in all different modes, the network was able to predict correctly 100% of all cases. Conclusion: Using data mining methods before the design structure of system, aimed to reduce the data dimension and the optimum choice of the data, will lead to a more accurate system. Therefore, considering the data mining approaches due to the nature of medical data is necessary.

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 246
27605 Neural Network Models for Actual Cost and Actual Duration Estimation in Construction Projects: Findings from Greece

Authors: Panagiotis Karadimos, Leonidas Anthopoulos

Abstract:

Predicting the actual cost and duration in construction projects concern a continuous and existing problem for the construction sector. This paper addresses this problem with modern methods and data available from past public construction projects. 39 bridge projects, constructed in Greece, with a similar type of available data were examined. Considering each project’s attributes with the actual cost and the actual duration, correlation analysis is performed and the most appropriate predictive project variables are defined. Additionally, the most efficient subgroup of variables is selected with the use of the WEKA application, through its attribute selection function. The selected variables are used as input neurons for neural network models through correlation analysis. For constructing neural network models, the application FANN Tool is used. The optimum neural network model, for predicting the actual cost, produced a mean squared error with a value of 3.84886e-05 and it was based on the budgeted cost and the quantity of deck concrete. The optimum neural network model, for predicting the actual duration, produced a mean squared error with a value of 5.89463e-05 and it also was based on the budgeted cost and the amount of deck concrete.

Keywords: actual cost and duration, attribute selection, bridge construction, neural networks, predicting models, FANN TOOL, WEKA

Procedia PDF Downloads 113
27604 Comparative Study Using WEKA for Red Blood Cells Classification

Authors: Jameela Ali, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy

Abstract:

Red blood cells (RBC) are the most common types of blood cells and are the most intensively studied in cell biology. The lack of RBCs is a condition in which the amount of hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs will affect the exchange of oxygen. This paper presents a comparative study for various techniques for classifying the RBCs as normal, or abnormal (anemic) using WEKA. WEKA is an open source consists of different machine learning algorithms for data mining applications. The algorithm tested are Radial Basis Function neural network, Support vector machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for classification of blood cells images. The first set, exclusively consist of geometrical features, was used to identify whether the tested blood cell has a spherical shape or non-spherical cells. While the second set, consist mainly of textural features was used to recognize the types of the spherical cells. We have provided an evaluation based on applying these classification methods to our RBCs image dataset which were obtained from Serdang Hospital-alaysia, and measuring the accuracy of test results. The best achieved classification rates are 97%, 98%, and 79% for Support vector machines, Radial Basis Function neural network, and K-Nearest Neighbors algorithm respectively.

Keywords: K-nearest neighbors algorithm, radial basis function neural network, red blood cells, support vector machine

Procedia PDF Downloads 382
27603 A Comparative Study for Various Techniques Using WEKA for Red Blood Cells Classification

Authors: Jameela Ali, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy

Abstract:

Red blood cells (RBC) are the most common types of blood cells and are the most intensively studied in cell biology. The lack of RBCs is a condition in which the amount of hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs will affect the exchange of oxygen. This paper presents a comparative study for various techniques for classifyig the red blood cells as normal, or abnormal (anemic) using WEKA. WEKA is an open source consists of different machine learning algorithms for data mining applications. The algorithm tested are Radial Basis Function neural network, Support vector machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for classification of blood cells images. The first set, exclusively consist of geometrical features, was used to identify whether the tested blood cell has a spherical shape or non-spherical cells. While the second set, consist mainly of textural features was used to recognize the types of the spherical cells. We have provided an evaluation based on applying these classification methods to our RBCs image dataset which were obtained from Serdang Hospital-Malaysia, and measuring the accuracy of test results. The best achieved classification rates are 97%, 98%, and 79% for Support vector machines, Radial Basis Function neural network, and K-Nearest Neighbors algorithm respectively

Keywords: red blood cells, classification, radial basis function neural networks, suport vector machine, k-nearest neighbors algorithm

Procedia PDF Downloads 449
27602 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills

Authors: Kyle De Freitas, Margaret Bernard

Abstract:

Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have little technical data mining skills as well as for more advanced users with some data mining expertise. This framework architecture was developed through the review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture for future researchers to focus on the development of specific areas within the EDM process. Finally, the paper also highlights a strategy of enabling analysis through either the use of predefined questions or a guided data mining process and highlights how the developed questions and analysis conducted can be reused and extended over time.

Keywords: educational data mining, learning management system, learning analytics, EDM framework

Procedia PDF Downloads 298
27601 Association Rules Mining Task Using Metaheuristics: Review

Authors: Abir Derouiche, Abdesslem Layeb

Abstract:

Association Rule Mining (ARM) is one of the most popular data mining tasks and it is widely used in various areas. The search for association rules is an NP-complete problem that is why metaheuristics have been widely used to solve it. The present paper presents the ARM as an optimization problem and surveys the proposed approaches in the literature based on metaheuristics.

Keywords: Optimization, Metaheuristics, Data Mining, Association rules Mining

Procedia PDF Downloads 135
27600 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub

Abstract:

Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.

Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL

Procedia PDF Downloads 138
27599 Business Intelligence for Profiling of Telecommunication Customer

Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro

Abstract:

Business Intelligence is a methodology that exploits the data to produce information and knowledge systematically, business intelligence can support the decision-making process. Some methods in business intelligence are data warehouse and data mining. A data warehouse can store historical data from transactional data. For data modelling in data warehouse, we apply dimensional modelling by Kimball. While data mining is used to extracting patterns from the data and get insight from the data. Data mining has many techniques, one of which is segmentation. For profiling of telecommunication customer, we use customer segmentation according to customer’s usage of services, customer invoice and customer payment. Customers can be grouped according to their characteristics and can be identified the profitable customers. We apply K-Means Clustering Algorithm for segmentation. The input variable for that algorithm we use RFM (Recency, Frequency and Monetary) model. All process in data mining, we use tools IBM SPSS modeller.

Keywords: business intelligence, customer segmentation, data warehouse, data mining

Procedia PDF Downloads 450
27598 Identify Users Behavior from Mobile Web Access Logs Using Automated Log Analyzer

Authors: Bharat P. Modi, Jayesh M. Patel

Abstract:

Mobile Internet is acting as a major source of data. As the number of web pages continues to grow the Mobile web provides the data miners with just the right ingredients for extracting information. In order to cater to this growing need, a special term called Mobile Web mining was coined. Mobile Web mining makes use of data mining techniques and deciphers potentially useful information from web data. Web Usage mining deals with understanding the behavior of users by making use of Mobile Web Access Logs that are generated on the server while the user is accessing the website. A Web access log comprises of various entries like the name of the user, his IP address, a number of bytes transferred time-stamp etc. A variety of Log Analyzer tools exists which help in analyzing various things like users navigational pattern, the part of the website the users are mostly interested in etc. The present paper makes use of such log analyzer tool called Mobile Web Log Expert for ascertaining the behavior of users who access an astrology website. It also provides a comparative study between a few log analyzer tools available.

Keywords: mobile web access logs, web usage mining, web server, log analyzer

Procedia PDF Downloads 340
27597 A Review on Existing Challenges of Data Mining and Future Research Perspectives

Authors: Hema Bhardwaj, D. Srinivasa Rao

Abstract:

Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed as "big data." The technique of big data mining and big data analysis is extremely helpful for business movements such as making decisions, building organisational plans, researching the market efficiently, improving sales, etc., because typical management tools cannot handle such complicated datasets. Special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations, are brought on by big data. These unique problems call for new computational and statistical paradigms. This research paper offers an overview of the literature on big data mining, its process, along with problems and difficulties, with a focus on the unique characteristics of big data. Organizations have several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is really analyzed. The idea of the mining and analysis of data and knowledge discovery techniques that have recently been created with practical application systems is presented in this study. This article's conclusion also includes a list of issues and difficulties for further research in the area. The report discusses the management's main big data and data mining challenges.

Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges

Procedia PDF Downloads 85
27596 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data

Authors: Adarsh Shroff

Abstract:

Big data is a collection of dataset so large and complex that it becomes difficult to process using data base management tools. To perform operations like search, analysis, visualization on big data by using data mining; which is the process of extraction of patterns or knowledge from large data set. In recent years, the data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to Map Reduce, the most widely used framework for mining big data. I2MapReduce performs key-value pair level incremental processing rather than task level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. To optimize the mining results, evaluate i2MapReduce using a one-step algorithm and three iterative algorithms with diverse computation characteristics for efficient mining.

Keywords: big data, map reduce, incremental processing, iterative computation

Procedia PDF Downloads 321
27595 Review of Different Machine Learning Algorithms

Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui

Abstract:

Classification is a data mining technique, which is recognizedon Machine Learning (ML) algorithm. It is used to classifythe individual articlein a knownofinformation into a set of predefinemodules or group. Web mining is also a portion of that sympathetic of data mining methods. The main purpose of this paper to analysis and compare the performance of Naïve Bayse Algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN)and Support Vector Machine (SVM). This paper consists of different ML algorithm and their advantages and disadvantages and also define research issues.

Keywords: Data Mining, Web Mining, classification, ML Algorithms

Procedia PDF Downloads 262
27594 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Data mining is one of the main phases in the Knowledge Discovery Database (KDD) which is responsible of finding hidden and useful knowledge from databases. There are many different tasks for data mining including regression, pattern recognition, clustering, classification, and association rule. In recent years a promising data mining approach called associative classification (AC) has been proposed, AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference of the different procedures are used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.

Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction

Procedia PDF Downloads 509
27593 Data Mining Techniques for Anti-Money Laundering

Authors: M. Sai Veerendra

Abstract:

Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism and surely not forgetting personal gain. Most of the financial institutions internationally have been implementing anti-money laundering solutions (AML) to fight investment fraud activities. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered as well-suited techniques for detecting ML activities. Within the scope of a collaboration project on developing a new data mining solution for AML Units in an international investment bank in Ireland, we survey recent data mining approaches for AML. In this paper, we present not only these approaches but also give an overview on the important factors in building data mining solutions for AML activities.

Keywords: data mining, clustering, money laundering, anti-money laundering solutions

Procedia PDF Downloads 514
27592 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 825
27591 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that as the data increase the amount of time and resources required to mining the data increases at an exponential rate. In this investigation a new algorithm is proposed which can be uses as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough-sets to carry out record reduction and feature (attribute) selection respectively. FASTER for frequent itemset mining can produce a speed up of 3.1 times when compared to original algorithm while maintaining an accuracy of 71%.

Keywords: rough-sets, classification, feature selection, entropy, outliers, frequent itemset mining

Procedia PDF Downloads 410
27590 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 222
27589 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler

Abstract:

This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.

Keywords: data mining, textile production, decision trees, classification

Procedia PDF Downloads 327
27588 An Analysis of Sequential Pattern Mining on Databases Using Approximate Sequential Patterns

Authors: J. Suneetha, Vijayalaxmi

Abstract:

Sequential Pattern Mining involves applying data mining methods to large data repositories to extract usage patterns. Sequential pattern mining methodologies used to analyze the data and identify patterns. The patterns have been used to implement efficient systems can recommend on previously observed patterns, in making predictions, improve usability of systems, detecting events, and in general help in making strategic product decisions. In this paper, identified performance of approximate sequential pattern mining defines as identifying patterns approximately shared with many sequences. Approximate sequential patterns can effectively summarize and represent the databases by identifying the underlying trends in the data. Conducting an extensive and systematic performance over synthetic and real data. The results demonstrate that ApproxMAP effective and scalable in mining large sequences databases with long patterns.

Keywords: multiple data, performance analysis, sequential pattern, sequence database scalability

Procedia PDF Downloads 307
27587 Association of Social Data as a Tool to Support Government Decision Making

Authors: Diego Rodrigues, Marcelo Lisboa, Elismar Batista, Marcos Dias

Abstract:

Based on data on child labor, this work arises questions about how to understand and locate the factors that make up the child labor rates, and which properties are important to analyze these cases. Using data mining techniques to discover valid patterns on Brazilian social databases were evaluated data of child labor in the State of Tocantins (located north of Brazil with a territory of 277000 km2 and comprises 139 counties). This work aims to detect factors that are deterministic for the practice of child labor and their relationships with financial indicators, educational, regional and social, generating information that is not explicit in the government database, thus enabling better monitoring and updating policies for this purpose.

Keywords: social data, government decision making, association of social data, data mining

Procedia PDF Downloads 342
27586 Customer Churn Analysis in Telecommunication Industry Using Data Mining Approach

Authors: Burcu Oralhan, Zeki Oralhan, Nilsun Sariyer, Kumru Uyar

Abstract:

Data mining has been becoming more and more important and a wide range of applications in recent years. Data mining is the process of find hidden and unknown patterns in big data. One of the applied fields of data mining is Customer Relationship Management. Understanding the relationships between products and customers is crucial for every business. Customer Relationship Management is an approach to focus on customer relationship development, retention and increase on customer satisfaction. In this study, we made an application of a data mining methods in telecommunication customer relationship management side. This study aims to determine the customers profile who likely to leave the system, develop marketing strategies, and customized campaigns for customers. Data are clustered by applying classification techniques for used to determine the churners. As a result of this study, we will obtain knowledge from international telecommunication industry. We will contribute to the understanding and development of this subject in Customer Relationship Management.

Keywords: customer churn analysis, customer relationship management, data mining, telecommunication industry

Procedia PDF Downloads 288
27585 An Observation of the Information Technology Research and Development Based on Article Data Mining: A Survey Study on Science Direct

Authors: Muhammet Dursun Kaya, Hasan Asil

Abstract:

One of the most important factors of research and development is the deep insight into the evolutions of scientific development. The state-of-the-art tools and instruments can considerably assist the researchers, and many of the world organizations have become aware of the advantages of data mining for the acquisition of the knowledge required for the unstructured data. This paper was an attempt to review the articles on the information technology published in the past five years with the aid of data mining. A clustering approach was used to study these articles, and the research results revealed that three topics, namely health, innovation, and information systems, have captured the special attention of the researchers.

Keywords: information technology, data mining, scientific development, clustering

Procedia PDF Downloads 250
27584 Data Mining and Knowledge Management Application to Enhance Business Operations: An Exploratory Study

Authors: Zeba Mahmood

Abstract:

The modern business organizations are adopting technological advancement to achieve competitive edge and satisfy their consumer. The development in the field of Information technology systems has changed the way of conducting business today. Business operations today rely more on the data they obtained and this data is continuously increasing in volume. The data stored in different locations is difficult to find and use without the effective implementation of Data mining and Knowledge management techniques. Organizations who smartly identify, obtain and then convert data in useful formats for their decision making and operational improvements create additional value for their customers and enhance their operational capabilities. Marketers and Customer relationship departments of firm use Data mining techniques to make relevant decisions, this paper emphasizes on the identification of different data mining and Knowledge management techniques that are applied to different business industries. The challenges and issues of execution of these techniques are also discussed and critically analyzed in this paper.

Keywords: knowledge, knowledge management, knowledge discovery in databases, business, operational, information, data mining

Procedia PDF Downloads 509
27583 Towards a Distributed Computation Platform Tailored for Educational Process Discovery and Analysis

Authors: Awatef Hicheur Cairns, Billel Gueni, Hind Hafdi, Christian Joubert, Nasser Khelifa

Abstract:

Given the ever changing needs of the job markets, education and training centers are increasingly held accountable for student success. Therefore, education and training centers have to focus on ways to streamline their offers and educational processes in order to achieve the highest level of quality in curriculum contents and managerial decisions. Educational process mining is an emerging field in the educational data mining (EDM) discipline, concerned with developing methods to discover, analyze and provide a visual representation of complete educational processes. In this paper, we present our distributed computation platform which allows different education centers and institutions to load their data and access to advanced data mining and process mining services. To achieve this, we present also a comparative study of the different clustering techniques developed in the context of process mining to partition efficiently educational traces. Our goal is to find the best strategy for distributing heavy analysis computations on many processing nodes of our platform.

Keywords: educational process mining, distributed process mining, clustering, distributed platform, educational data mining, ProM

Procedia PDF Downloads 429
27582 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 520
27581 HPPDFIM-HD: Transaction Distortion and Connected Perturbation Approach for Hierarchical Privacy Preserving Distributed Frequent Itemset Mining over Horizontally-Partitioned Dataset

Authors: Fuad Ali Mohammed Al-Yarimi

Abstract:

Many algorithms have been proposed to provide privacy preserving in data mining. These protocols are based on two main approaches named as: the perturbation approach and the Cryptographic approach. The first one is based on perturbation of the valuable information while the second one uses cryptographic techniques. The perturbation approach is much more efficient with reduced accuracy while the cryptographic approach can provide solutions with perfect accuracy. However, the cryptographic approach is a much slower method and requires considerable computation and communication overhead. In this paper, a new scalable protocol is proposed which combines the advantages of the perturbation and distortion along with cryptographic approach to perform privacy preserving in distributed frequent itemset mining on horizontally distributed data. Both the privacy and performance characteristics of the proposed protocol are studied empirically.

Keywords: anonymity data, data mining, distributed frequent itemset mining, gaussian perturbation, perturbation approach, privacy preserving data mining

Procedia PDF Downloads 482
27580 Mining Multicity Urban Data for Sustainable Population Relocation

Authors: Xu Du, Aparna S. Varde

Abstract:

In this research, we propose to conduct diagnostic and predictive analysis about the key factors and consequences of urban population relocation. To achieve this goal, urban simulation models extract the urban development trends as land use change patterns from a variety of data sources. The results are treated as part of urban big data with other information such as population change and economic conditions. Multiple data mining methods are deployed on this data to analyze nonlinear relationships between parameters. The result determines the driving force of population relocation with respect to urban sprawl and urban sustainability and their related parameters. Experiments so far reveal that data mining methods discover useful knowledge from the multicity urban data. This work sets the stage for developing a comprehensive urban simulation model for catering to specific questions by targeted users. It contributes towards achieving sustainability as a whole.

Keywords: data mining, environmental modeling, sustainability, urban planning

Procedia PDF Downloads 272
27579 Develop a Conceptual Data Model of Geotechnical Risk Assessment in Underground Coal Mining Using a Cloud-Based Machine Learning Platform

Authors: Reza Mohammadzadeh

Abstract:

The major challenges in geotechnical engineering in underground spaces arise from uncertainties and different probabilities. The collection, collation, and collaboration of existing data to incorporate them in analysis and design for given prospect evaluation would be a reliable, practical problem solving method under uncertainty. Machine learning (ML) is a subfield of artificial intelligence in statistical science which applies different techniques (e.g., Regression, neural networks, support vector machines, decision trees, random forests, genetic programming, etc.) on data to automatically learn and improve from them without being explicitly programmed and make decisions and predictions. In this paper, a conceptual database schema of geotechnical risks in underground coal mining based on a cloud system architecture has been designed. A new approach of risk assessment using a three-dimensional risk matrix supported by the level of knowledge (LoK) has been proposed in this model. Subsequently, the model workflow methodology stages have been described. In order to train data and LoK models deployment, an ML platform has been implemented. IBM Watson Studio, as a leading data science tool and data-driven cloud integration ML platform, is employed in this study. As a Use case, a data set of geotechnical hazards and risk assessment in underground coal mining were prepared to demonstrate the performance of the model, and accordingly, the results have been outlined.

Keywords: data model, geotechnical risks, machine learning, underground coal mining

Procedia PDF Downloads 247
27578 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: clustering algorithms, coastal engineering, data mining, data summarization, statistical methods

Procedia PDF Downloads 340
27577 An Assessment of Floodplain Vegetation Response to Groundwater Changes Using the Soil & Water Assessment Tool Hydrological Model, Geographic Information System, and Machine Learning in the Southeast Australian River Basin

Authors: Newton Muhury, Armando A. Apan, Tek N. Marasani, Gebiaw T. Ayele

Abstract:

The changing climate has degraded freshwater availability in Australia that influencing vegetation growth to a great extent. This study assessed the vegetation responses to groundwater using Terra’s moderate resolution imaging spectroradiometer (MODIS), Normalised Difference Vegetation Index (NDVI), and soil water content (SWC). A hydrological model, SWAT, has been set up in a southeast Australian river catchment for groundwater analysis. The model was calibrated and validated against monthly streamflow from 2001 to 2006 and 2007 to 2010, respectively. The SWAT simulated soil water content for 43 sub-basins and monthly MODIS NDVI data for three different types of vegetation (forest, shrub, and grass) were applied in the machine learning tool, Waikato Environment for Knowledge Analysis (WEKA), using two supervised machine learning algorithms, i.e., support vector machine (SVM) and random forest (RF). The assessment shows that different types of vegetation response and soil water content vary in the dry and wet seasons. The WEKA model generated high positive relationships (r = 0.76, 0.73, and 0.81) between NDVI values of all vegetation in the sub-basins against soil water content (SWC), the groundwater flow (GW), and the combination of these two variables, respectively, during the dry season. However, these responses were reduced by 36.8% (r = 0.48) and 13.6% (r = 0.63) against GW and SWC, respectively, in the wet season. Although the rainfall pattern is highly variable in the study area, the summer rainfall is very effective for the growth of the grass vegetation type. This study has enriched our knowledge of vegetation responses to groundwater in each season, which will facilitate better floodplain vegetation management.

Keywords: ArcSWAT, machine learning, floodplain vegetation, MODIS NDVI, groundwater

Procedia PDF Downloads 70