Search results for: subgraph mining
348 Decision Support System Based on Data Warehouse
Authors: Yang Bao, LuJing Zhang
Abstract:
Typical Intelligent Decision Support System is 4-based, its design composes of Data Warehouse, Online Analytical Processing, Data Mining and Decision Supporting based on models, which is called Decision Support System Based on Data Warehouse (DSSBDW). This way takes ETL,OLAP and DM as its implementing means, and integrates traditional model-driving DSS and data-driving DSS into a whole. For this kind of problem, this paper analyzes the DSSBDW architecture and DW model, and discusses the following key issues: ETL designing and Realization; metadata managing technology using XML; SQL implementing, optimizing performance, data mapping in OLAP; lastly, it illustrates the designing principle and method of DW in DSSBDW.
Keywords: Decision Support System, Data Warehouse, Data Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3863347 Hydrogeological Risk and Mining Tunnels: the Fontane-Rodoretto Mine Turin (Italy)
Authors: Paola Gattinoni, Laura Scesi, Elena Cerino Adbin, Daniele Cremonesi
Abstract:
The interaction of tunneling or mining with groundwater has become a very relevant problem not only due to the need to guarantee the safety of workers and to assure the efficiency of the tunnel drainage systems, but also to safeguard water resources from impoverishment and pollution risk. Therefore it is very important to forecast the drainage processes (i.e., the evaluation of drained discharge and drawdown caused by the excavation). The aim of this study was to know better the system and to quantify the flow drained from the Fontane mines, located in Val Germanasca (Turin, Italy). This allowed to understand the hydrogeological local changes in time. The work has therefore been structured as follows: the reconstruction of the conceptual model with the geological, hydrogeological and geological-structural study; the calculation of the tunnel inflows (through the use of structural methods) and the comparison with the measured flow rates; the water balance at the basin scale. In this way it was possible to understand what are the relationships between rainfall, groundwater level variations and the effect of the presence of tunnels as a means of draining water. Subsequently, it the effects produced by the excavation of the mining tunnels was quantified, through numerical modeling. In particular, the modeling made it possible to observe the drawdown variation as a function of number, excavation depth and different mines linings.Keywords: Groundwater, Italy, numerical model, tunneling.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1928346 Utilization of Process Mapping Tool to Enhance Production Drilling in Underground Metal Mining Operations
Authors: Sidharth Talan, Sanjay Kumar Sharma, Eoin Joseph Wallace, Nikita Agrawal
Abstract:
Underground mining is at the core of rapidly evolving metals and minerals sector due to the increasing mineral consumption globally. Even though the surface mines are still more abundant on earth, the scales of industry are slowly tipping towards underground mining due to rising depth and complexities of orebodies. Thus, the efficient and productive functioning of underground operations depends significantly on the synchronized performance of key elements such as operating site, mining equipment, manpower and mine services. Production drilling is the process of conducting long hole drilling for the purpose of charging and blasting these holes for the production of ore in underground metal mines. Thus, production drilling is the crucial segment in the underground metal mining value chain. This paper presents the process mapping tool to evaluate the production drilling process in the underground metal mining operation by dividing the given process into three segments namely Input, Process and Output. The three segments are further segregated into factors and sub-factors. As per the study, the major input factors crucial for the efficient functioning of production drilling process are power, drilling water, geotechnical support of the drilling site, skilled drilling operators, services installation crew, oils and drill accessories for drilling machine, survey markings at drill site, proper housekeeping, regular maintenance of drill machine, suitable transportation for reaching the drilling site and finally proper ventilation. The major outputs for the production drilling process are ore, waste as a result of dilution, timely reporting and investigation of unsafe practices, optimized process time and finally well fragmented blasted material within specifications set by the mining company. The paper also exhibits the drilling loss matrix, which is utilized to appraise the loss in planned production meters per day in a mine on account of availability loss in the machine due to breakdowns, underutilization of the machine and productivity loss in the machine measured in drilling meters per unit of percussion hour with respect to its planned productivity for the day. The given three losses would be essential to detect the bottlenecks in the process map of production drilling operation so as to instigate the action plan to suppress or prevent the causes leading to the operational performance deficiency. The given tool is beneficial to mine management to focus on the critical factors negatively impacting the production drilling operation and design necessary operational and maintenance strategies to mitigate them.
Keywords: Process map, drilling loss matrix, availability, utilization, productivity, percussion rate.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1089345 Discovering Complex Regularities: from Tree to Semi-Lattice Classifications
Authors: A. Faro, D. Giordano, F. Maiorana
Abstract:
Data mining uses a variety of techniques each of which is useful for some particular task. It is important to have a deep understanding of each technique and be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process up to results visualization. A graphical representation helps the user to find out a strategy to optimize classification by adding, moving or delete a neuron in order to change the number of classes. The tool is able to automatically suggest a strategy to optimize the number of classes optimization, but also support both tree classifications and semi-lattice organizations of the classes to give to the users the possibility of passing from one class to the ones with which it has some aspects in common. Examples of using tree and semi-lattice classifications are given to illustrate advantages and problems. The tool is applied to classify macroeconomic data that report the most developed countries- import and export. It is possible to classify the countries based on their economic behaviour and use the tool to characterize the commercial behaviour of a country in a selected class from the analysis of positive and negative features that contribute to classes formation. Possible interrelationships between the classes and their meaning are also discussed.Keywords: Unsupervised classification, Kohonen networks, macroeconomics, Visual data mining, Cluster interpretation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1542344 Influenza Pattern Analysis System through Mining Weblogs
Authors: Pei Lin Khoo, Yunli Lee
Abstract:
Weblogs are resource of social structure to discover and track the various type of information written by blogger. In this paper, we proposed to use mining weblogs technique for identifying the trends of influenza where blogger had disseminated their opinion for the anomaly disease. In order to identify the trends, web crawler is applied to perform a search and generated a list of visited links based on a set of influenza keywords. This information is used to implement the analytics report system for monitoring and analyzing the pattern and trends of influenza (H1N1). Statistical and graphical analysis reports are generated. Both types of the report have shown satisfactory reports that reflect the awareness of Malaysian on the issue of influenza outbreak through blogs.
Keywords: H1N1, Weblogs, Web Crawler, Analytics Report System.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2466343 Analysis of Road Repairs in Undermined Areas
Authors: Tomáš Seidler, Marek Mihola, Denisa Cihlarova
Abstract:
The article presents analysis results of maps of expected subsidence in undermined areas for road repair management. The analysis was done in the area of Karvina district in the Czech Republic, including undermined areas with ongoing deep mining activities or finished deep mining in years 2003 - 2009. The article discusses the possibilities of local road maintenance authorities to determine areas that will need most repairs in the future with limited data available. Using the expected subsidence maps new map of surface curvature was calculated. Combined with road maps and historical data about repairs the result came for five main categories of undermined areas, proving very simple tool for management.Keywords: GIS, Map of Subsidence, Road, Undermined Area
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1327342 Impact of Coal Mining on River Sediment Quality in the Sydney Basin, Australia
Authors: A. Ali, V. Strezov, P. Davies, I. Wright, T. Kan
Abstract:
The environmental impacts arising from mining activities affect the air, water, and soil quality. Impacts may result in unexpected and adverse environmental outcomes. This study reports on the impact of coal production on sediment in Sydney region of Australia. The sediment samples upstream and downstream from the discharge points from three mines were taken, and 80 parameters were tested. The results were assessed against sediment quality based on presence of metals. The study revealed the increment of metal content in the sediment downstream of the reference locations. In many cases, the sediment was above the Australia and New Zealand Environment Conservation Council and international sediment quality guidelines value (SQGV). The major outliers to the guidelines were nickel (Ni) and zinc (Zn).Keywords: Coal mine, environmental impact, produced water, sediment quality guidelines value.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1403341 Using Data Mining Techniques for Finding Cardiac Outlier Patients
Authors: Farhan Ismaeel Dakheel, Raoof Smko, K. Negrat, Abdelsalam Almarimi
Abstract:
In this paper we used data mining techniques to identify outlier patients who are using large amount of drugs over a long period of time. Any healthcare or health insurance system should deal with the quantities of drugs utilized by chronic diseases patients. In Kingdom of Bahrain, about 20% of health budget is spent on medications. For the managers of healthcare systems, there is no enough information about the ways of drug utilization by chronic diseases patients, is there any misuse or is there outliers patients. In this work, which has been done in cooperation with information department in the Bahrain Defence Force hospital; we select the data for Cardiac patients in the period starting from 1/1/2008 to December 31/12/2008 to be the data for the model in this paper. We used three techniques for finding the drug utilization for cardiac patients. First we applied a clustering technique, followed by measuring of clustering validity, and finally we applied a decision tree as classification algorithm. The clustering results is divided into three clusters according to the drug utilization, for 1603 patients, who received 15,806 prescriptions during this period can be partitioned into three groups, where 23 patients (2.59%) who received 1316 prescriptions (8.32%) are classified to be outliers. The classification algorithm shows that the use of average drug utilization and the age, and the gender of the patient can be considered to be the main predictive factors in the induced model.Keywords: Data Mining, Clustering, Classification, Drug Utilization..
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1899340 Implementation of an IoT Sensor Data Collection and Analysis Library
Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee
Abstract:
Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.
Keywords: Clustering, data mining, DBSCAN, k-means, k-medoids, sensor data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2010339 A Novel Approach to Optimal Cutting Tool Replacement
Authors: Cem Karacal, Sohyung Cho, William Yu
Abstract:
In metal cutting industries, mathematical/statistical models are typically used to predict tool replacement time. These off-line methods usually result in less than optimum replacement time thereby either wasting resources or causing quality problems. The few online real-time methods proposed use indirect measurement techniques and are prone to similar errors. Our idea is based on identifying the optimal replacement time using an electronic nose to detect the airborne compounds released when the tool wear reaches to a chemical substrate doped into tool material during the fabrication. The study investigates the feasibility of the idea, possible doping materials and methods along with data stream mining techniques for detection and monitoring different phases of tool wear.Keywords: Tool condition monitoring, cutting tool replacement, data stream mining, e-Nose.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1882338 Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques
Authors: Gabriela V. Angeles Perez, Jose Castillejos Lopez, Araceli L. Reyes Cabello, Emilio Bravo Grajales, Adriana Perez Espinosa, Jose L. Quiroz Fabian
Abstract:
Road traffic accidents are among the principal causes of traffic congestion, causing human losses, damages to health and the environment, economic losses and material damages. Studies about traditional road traffic accidents in urban zones represents very high inversion of time and money, additionally, the result are not current. However, nowadays in many countries, the crowdsourced GPS based traffic and navigation apps have emerged as an important source of information to low cost to studies of road traffic accidents and urban congestion caused by them. In this article we identified the zones, roads and specific time in the CDMX in which the largest number of road traffic accidents are concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Discovery of knowledge in the database (KDD) for the discovery of patterns in the accidents reports. Furthermore, using data mining techniques with the help of Weka. The selected algorithms was the Maximization of Expectations (EM) to obtain the number ideal of clusters for the data and k-means as a grouping method. Finally, the results were visualized with the Geographic Information System QGIS.Keywords: Data mining, K-means, road traffic accidents, Waze, Weka.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1216337 Design of Personal Job Recommendation Framework on Smartphone Platform
Authors: Chayaporn Kaensar
Abstract:
Recently, Job Recommender Systems have gained much attention in industries since they solve the problem of information overload on the recruiting website. Therefore, we proposed Extended Personalized Job System that has the capability of providing the appropriate jobs for job seeker and recommending some suitable information for them using Data Mining Techniques and Dynamic User Profile. On the other hands, company can also interact to the system for publishing and updating job information. This system have emerged and supported various platforms such as web application and android mobile application. In this paper, User profiles, Implicit User Action, User Feedback, and Clustering Techniques in WEKA libraries were applied and implemented. In additions, open source tools like Yii Web Application Framework, Bootstrap Front End Framework and Android Mobile Technology were also applied.Keywords: Recommendation, user profile, data mining, web technology, mobile technology.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2151336 Finding Fuzzy Association Rules Using FWFP-Growth with Linguistic Supports and Confidences
Authors: Chien-Hua Wang, Chin-Tzong Pang
Abstract:
In data mining, the association rules are used to search for the relations of items of the transactions database. Following the data is collected and stored, it can find rules of value through association rules, and assist manager to proceed marketing strategy and plan market framework. In this paper, we attempt fuzzy partition methods and decide membership function of quantitative values of each transaction item. Also, by managers we can reflect the importance of items as linguistic terms, which are transformed as fuzzy sets of weights. Next, fuzzy weighted frequent pattern growth (FWFP-Growth) is used to complete the process of data mining. The method above is expected to improve Apriori algorithm for its better efficiency of the whole association rules. An example is given to clearly illustrate the proposed approach.Keywords: Association Rule, Fuzzy Partition Methods, FWFP-Growth, Apiroir algorithm
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1652335 AniMoveMineR: Animal Behavior Exploratory Analysis Using Association Rules Mining
Authors: Suelane Garcia Fontes, Silvio Luiz Stanzani, Pedro L. Pizzigatti Corrła Ronaldo G. Morato
Abstract:
Environmental changes and major natural disasters are most prevalent in the world due to the damage that humanity has caused to nature and these damages directly affect the lives of animals. Thus, the study of animal behavior and their interactions with the environment can provide knowledge that guides researchers and public agencies in preservation and conservation actions. Exploratory analysis of animal movement can determine the patterns of animal behavior and with technological advances the ability of animals to be tracked and, consequently, behavioral studies have been expanded. There is a lot of research on animal movement and behavior, but we note that a proposal that combines resources and allows for exploratory analysis of animal movement and provide statistical measures on individual animal behavior and its interaction with the environment is missing. The contribution of this paper is to present the framework AniMoveMineR, a unified solution that aggregates trajectory analysis and data mining techniques to explore animal movement data and provide a first step in responding questions about the animal individual behavior and their interactions with other animals over time and space. We evaluated the framework through the use of monitored jaguar data in the city of Miranda Pantanal, Brazil, in order to verify if the use of AniMoveMineR allows to identify the interaction level between these jaguars. The results were positive and provided indications about the individual behavior of jaguars and about which jaguars have the highest or lowest correlation.Keywords: Data mining, data science, trajectory, animal behavior.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 919334 Feature Selection Approaches with Missing Values Handling for Data Mining - A Case Study of Heart Failure Dataset
Authors: N.Poolsawad, C.Kambhampati, J. G. F. Cleland
Abstract:
In this paper, we investigated the characteristic of a clinical dataseton the feature selection and classification measurements which deal with missing values problem.And also posed the appropriated techniques to achieve the aim of the activity; in this research aims to find features that have high effect to mortality and mortality time frame. We quantify the complexity of a clinical dataset. According to the complexity of the dataset, we proposed the data mining processto cope their complexity; missing values, high dimensionality, and the prediction problem by using the methods of missing value replacement, feature selection, and classification.The experimental results will extend to develop the prediction model for cardiology.Keywords: feature selection, missing values, classification, clinical dataset, heart failure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3212333 Seasonal Variation of the Impact of Mining Activities on Ga-Selati River in Limpopo Province, South Africa
Authors: Joshua N. Edokpayi, John O. Odiyo, Patience P. Shikwambana
Abstract:
Water is a very rare natural resource in South Africa. Ga-Selati River is used for both domestic and industrial purposes. This study was carried out in order to assess the quality of Ga-Selati River in a mining area of Limpopo Province-Phalaborwa. The pH, Electrical Conductivity (EC) and Total Dissolved Solids (TDS) were determined using a Crinson multimeter while turbidity was measured using a Labcon Turbidimeter. The concentrations of Al, Ca, Cd, Cr, Fe, K, Mg, Mn, Na and Pb were analysed in triplicate using a Varian 520 flame atomic absorption spectrometer (AAS) supplied by PerkinElmer, after acid digestion with nitric acid in a fume cupboard. The average pH of the river from eight different sampling sites was 8.00 and 9.38 in wet and dry season respectively. Higher EC values were determined in the dry season (138.7 mS/m) than in the wet season (96.93 mS/m). Similarly, TDS values were higher in dry (929.29 mg/L) than in the wet season (640.72 mg/L) season. These values exceeded the recommended guideline of South Africa Department of Water Affairs and Forestry (DWAF) for domestic water use (70 mS/m) and that of the World Health Organization (WHO) (600 mS/m), respectively. Turbidity varied between 1.78-5.20 and 0.95-2.37 NTU in both wet and dry seasons. Total hardness of 312.50 mg/L and 297.75 mg/L as the concentration of CaCO3 was computed for the river in both the wet and the dry seasons and the river water was categorised as very hard. Mean concentration of the metals studied in both the wet and the dry seasons are: Na (94.06 mg/L and 196.3 mg/L), K (11.79 mg/L and 13.62 mg/L), Ca (45.60 mg/L and 41.30 mg/L), Mg (48.41 mg/L and 44.71 mg/L), Al (0.31 mg/L and 0.38 mg/L), Cd (0.01 mg/L and 0.01 mg/L), Cr (0.02 mg/L and 0.09 mg/L), Pb (0.05 mg/L and 0.06 mg/L), Mn (0.31 mg/L and 0.11 mg/L) and Fe (0.76 mg/L and 0.69 mg/L). Results from this study reveal that most of the metals were present in concentrations higher than the recommended guidelines of DWAF and WHO for domestic use and the protection of aquatic life.Keywords: Contamination, mining activities, surface water, trace metals.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1988332 Efficient Implementation of Serial and Parallel Support Vector Machine Training with a Multi-Parameter Kernel for Large-Scale Data Mining
Authors: Tatjana Eitrich, Bruno Lang
Abstract:
This work deals with aspects of support vector learning for large-scale data mining tasks. Based on a decomposition algorithm that can be run in serial and parallel mode we introduce a data transformation that allows for the usage of an expensive generalized kernel without additional costs. In order to speed up the decomposition algorithm we analyze the problem of working set selection for large data sets and analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our modifications and settings lead to improvement of support vector learning performance and thus allow using extensive parameter search methods to optimize classification accuracy.
Keywords: Support Vector Machines, Shared Memory Parallel Computing, Large Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1577331 Integration of Educational Data Mining Models to a Web-Based Support System for Predicting High School Student Performance
Authors: Sokkhey Phauk, Takeo Okazaki
Abstract:
The challenging task in educational institutions is to maximize the high performance of students and minimize the failure rate of poor-performing students. An effective method to leverage this task is to know student learning patterns with highly influencing factors and get an early prediction of student learning outcomes at the timely stage for setting up policies for improvement. Educational data mining (EDM) is an emerging disciplinary field of data mining, statistics, and machine learning concerned with extracting useful knowledge and information for the sake of improvement and development in the education environment. The study is of this work is to propose techniques in EDM and integrate it into a web-based system for predicting poor-performing students. A comparative study of prediction models is conducted. Subsequently, high performing models are developed to get higher performance. The hybrid random forest (Hybrid RF) produces the most successful classification. For the context of intervention and improving the learning outcomes, a feature selection method MICHI, which is the combination of mutual information (MI) and chi-square (CHI) algorithms based on the ranked feature scores, is introduced to select a dominant feature set that improves the performance of prediction and uses the obtained dominant set as information for intervention. By using the proposed techniques of EDM, an academic performance prediction system (APPS) is subsequently developed for educational stockholders to get an early prediction of student learning outcomes for timely intervention. Experimental outcomes and evaluation surveys report the effectiveness and usefulness of the developed system. The system is used to help educational stakeholders and related individuals for intervening and improving student performance.
Keywords: Academic performance prediction system, prediction model, educational data mining, dominant factors, feature selection methods, student performance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 980330 Mining User-Generated Contents to Detect Service Failures with Topic Model
Authors: Kyung Bae Park, Sung Ho Ha
Abstract:
Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.Keywords: Latent Dirichlet allocation, R program, text mining, topic model, user generated contents, visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1216329 Feature Based Unsupervised Intrusion Detection
Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein
Abstract:
The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.
Keywords: Information Gain (IG), Intrusion Detection System (IDS), K-means Clustering, Weka.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2776328 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad
Abstract:
Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.Keywords: Remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2055327 Post Mining- Discovering Valid Rules from Different Sized Data Sources
Authors: R. Nedunchezhian, K. Anbumani
Abstract:
A big organization may have multiple branches spread across different locations. Processing of data from these branches becomes a huge task when innumerable transactions take place. Also, branches may be reluctant to forward their data for centralized processing but are ready to pass their association rules. Local mining may also generate a large amount of rules. Further, it is not practically possible for all local data sources to be of the same size. A model is proposed for discovering valid rules from different sized data sources where the valid rules are high weighted rules. These rules can be obtained from the high frequency rules generated from each of the data sources. A data source selection procedure is considered in order to efficiently synthesize rules. Support Equalization is another method proposed which focuses on eliminating low frequency rules at the local sites itself thus reducing the rules by a significant amount.
Keywords: Association rules, multiple data stores, synthesizing, valid rules.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1404326 Content Based Sampling over Transactional Data Streams
Authors: Mansour Tarafdar, Mohammad Saniee Abade
Abstract:
This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS as a content based sampling algorithm that works on a landmark window model of data streams and preserve more informed sample in sample space. This algorithm that work based on closed frequent itemset mining tasks, first initiate a concept lattice using initial data, then update lattice structure using an incremental mechanism.Incremental mechanism insert, update and delete nodes in/from concept lattice in batch manner. Presented algorithm extracts the final samples on demand of user. Experimental results show the accuracy of CFISDS on synthetic and real datasets, despite on CFISDS algorithm is not faster than exist sampling algorithms such as Z and DSS.
Keywords: Sampling, data streams, closed frequent item set mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1709325 Mining Implicit Knowledge to Predict Political Risk by Providing Novel Framework with Using Bayesian Network
Authors: Siavash Asadi Ghajarloo
Abstract:
Nowadays predicting political risk level of country has become a critical issue for investors who intend to achieve accurate information concerning stability of the business environments. Since, most of the times investors are layman and nonprofessional IT personnel; this paper aims to propose a framework named GECR in order to help nonexpert persons to discover political risk stability across time based on the political news and events. To achieve this goal, the Bayesian Networks approach was utilized for 186 political news of Pakistan as sample dataset. Bayesian Networks as an artificial intelligence approach has been employed in presented framework, since this is a powerful technique that can be applied to model uncertain domains. The results showed that our framework along with Bayesian Networks as decision support tool, predicted the political risk level with a high degree of accuracy.Keywords: Bayesian Networks, Data mining, GECRframework, Predicting political risk.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2174324 Appraisal of Methods for Identifying, Mapping, and Modelling of Fluvial Erosion in a Mining Environment
Authors: F. F. Howard, I. Yakubu, C. B. Boye, J. S. Y. Kuma
Abstract:
Natural and human activities, such as mining operations, expose the natural soil to adverse environmental conditions, leading to contamination of soil, groundwater, and surface water, which has negative effects on humans, flora, and fauna. Bare or partly exposed soil is most liable to fluvial erosion. This paper enumerates various methods used to identify, map, and model fluvial erosion in a mining environment. Classical, Artificial Intelligence (AI), and GIS methods have been reviewed. One of the many classical methods used to estimate river erosion is the Revised Universal Soil Loss Equation (RUSLE) model. The RUSLE model is easy to use. Its reliance on empirical relationships that may not always be applicable to specific circumstances or locations is a flaw. Other classical models for estimating fluvial erosion are the Soil and Water Assessment Tool (SWAT) and the Universal Soil Loss Equation (USLE). These models offer a more complete understanding of the underlying physical processes and encompass a wider range of situations. Although more difficult to utilise, they depend on the availability and dependability of input data for correctness. AI can help deal with multivariate and complex difficulties and predict soil loss with higher accuracy than traditional methods, and also be used to build unique models for identifying degraded areas. AI techniques have become popular as an alternative predictor for degraded environments. However, this research proposed a hybrid of classical, AI, and GIS methods for efficient and effective modelling of fluvial erosion.
Keywords: Fluvial erosion, classical methods, Artificial Intelligence, Geographic Information System.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 185323 A Hybrid Approach for Thread Recommendation in MOOC Forums
Authors: Ahmad. A. Kardan, Amir Narimani, Foozhan Ataiefard
Abstract:
Recommender Systems have been developed to provide contents and services compatible to users based on their behaviors and interests. Due to information overload in online discussion forums and users diverse interests, recommending relative topics and threads is considered to be helpful for improving the ease of forum usage. In order to lead learners to find relevant information in educational forums, recommendations are even more needed. We present a hybrid thread recommender system for MOOC forums by applying social network analysis and association rule mining techniques. Initial results indicate that the proposed recommender system performs comparatively well with regard to limited available data from users' previous posts in the forum.Keywords: Association rule mining, hybrid recommender system, massive open online courses, MOOCs, social network analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1263322 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events
Authors: Jaqueline M. R. Vieira
Abstract:
Different strategies and tools are available at the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be possible and convenient with small images and few data, it may become difficult and suitable to errors when big databases of images must be treated. While the patterns may differ among the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). In this work we propose the inclusion of data-mining classification strategies in order to create a knowledge database of the segmented curves. These classifiers allow that, after some time using and manually pointing parts of borehole images that correspond to tension regions and breakout areas, the system will indicate and suggest automatically new candidate regions, with higher accuracy. We suggest the use of different classifiers methods, in order to achieve different knowledge dataset configurations.
Keywords: Brazil, classifiers, data-mining, Image Segmentation, oil well visualization, classifiers.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2544321 Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.Keywords: Information retrieval (IR), unified medical language system (UMLS), Syntax Based Analysis, natural language processing (NLP), medical informatics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 779320 Genetic-based Anomaly Detection in Logs of Process Aware Systems
Authors: Hanieh Jalali, Ahmad Baraani
Abstract:
Nowaday-s, many organizations use systems that support business process as a whole or partially. However, in some application domains, like software development and health care processes, a normative Process Aware System (PAS) is not suitable, because a flexible support is needed to respond rapidly to new process models. On the other hand, a flexible Process Aware System may be vulnerable to undesirable and fraudulent executions, which imposes a tradeoff between flexibility and security. In order to make this tradeoff available, a genetic-based anomaly detection model for logs of Process Aware Systems is presented in this paper. The detection of an anomalous trace is based on discovering an appropriate process model by using genetic process mining and detecting traces that do not fit the appropriate model as anomalous trace; therefore, when used in PAS, this model is an automated solution that can support coexistence of flexibility and security.Keywords: Anomaly Detection, Genetic Algorithm, ProcessAware Systems, Process Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1925319 Clustering Unstructured Text Documents Using Fading Function
Authors: Pallav Roxy, Durga Toshniwal
Abstract:
Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1985