Search results for: Multi-Agent Data Mining (MADM)
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7528

Search results for: Multi-Agent Data Mining (MADM)

7198 Lead and Cadmium Spatial Pattern and Risk Assessment around Coal Mine in Hyrcanian Forest, North Iran

Authors: Mahsa Tavakoli, Seyed Mohammad Hojjati, Yahya Kooch

Abstract:

In this study, the effect of coal mining activities on lead and cadmium concentrations and distribution in soil was investigated in Hyrcanian forest, North Iran. 16 plots (20×20 m2) were established by systematic-randomly (60×60 m2) in an area of 4 ha (200×200 m2-mine entrance placed at center). An area adjacent to the mine was not affected by the mining activity; considered as the controlled area. In order to investigate soil lead and cadmium concentration, one sample was taken from the 0-10 cm in each plot. To study the spatial pattern of soil properties and lead and cadmium concentrations in the mining area, an area of 80×80m2 (the mine as the center) was considered and 80 soil samples were systematic-randomly taken (10 m intervals). Geostatistical analysis was performed via Kriging method and GS+ software (version 5.1). In order to estimate the impact of coal mining activities on soil quality, pollution index was measured. Lead and cadmium concentrations were significantly higher in mine area (Pb: 10.97±0.30, Cd: 184.47±6.26 mg.kg-1) in comparison to control area (Pb: 9.42±0.17, Cd: 131.71±15.77 mg.kg-1). The mean values of the PI index indicate that Pb (1.16) and Cd (1.77) presented slightly polluted. Results of the NIPI index showed that Pb (1.44) and Cd (2.52) presented slight pollution and moderate pollution respectively. Results of variography and kriging method showed that it is possible to prepare interpolation maps of lead and cadmium around the mining areas in Hyrcanian forest. According to results of pollution and risk assessments, forest soil was contaminated by heavy metals (lead and cadmium); therefore, using reclamation and remediation techniques in these areas is necessary.

Keywords: Traditional coal mining, heavy metals, pollution indicators, geostatistics, caspian forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 996
7197 Growing Self Organising Map Based Exploratory Analysis of Text Data

Authors: Sumith Matharage, Damminda Alahakoon

Abstract:

Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.

Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1905
7196 Environmental Inspection using WSANs Based on Multi-agent Coordination Method

Authors: Mohammad J. Heydari, Farnaz Derakhshan

Abstract:

In this paper, we focus on the problem of driving and herding a collection of autonomous actors to a given area. Then, a new method based on multi-agent coordination is proposed for solving the problem. In our proposed method, we assume that the environment is covered by sensors. When an event is occurred, sensors forward information to a sink node. Based on received information, the sink node will estimate the direction and the speed of movement of actors and announce the obtained value to the actors. The actors coordinate to reach the target location.

Keywords: Coordination, Environmental Inspection, Multiagent systems, Wireless Sensor and Actor Networks (WSANs)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1399
7195 Application of a New Hybrid Optimization Algorithm on Cluster Analysis

Authors: T. Niknam, M. Nayeripour, B.Bahmani Firouzi

Abstract:

Clustering techniques have received attention in many areas including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points, which are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima and global solutions of large problems cannot found with reasonable amount of computation effort. In order to overcome local optima problem lots of studies done in clustering. This paper is presented an efficient hybrid evolutionary optimization algorithm based on combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N object into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handing data clustering.

Keywords: Ant Colony Optimization (ACO), Data clustering, Hybrid evolutionary optimization algorithm, K-means clustering, Particle Swarm Optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2165
7194 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain subgroups of time series data with normal distribution from the inflow into wastewater treatment plant data, composed of several groups differing by mean value. Two simple algorithms, K-mean and EM, were chosen as a clustering method. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was performed for each subgroups. The final model was a sum of the subgroups models. The quality of the obtained model was compared with the regression model made using the same explanatory variables, but with no clustering of data. Results were compared using determination coefficient (R2), measure of prediction accuracy- mean absolute percentage error (MAPE) and comparison on a linear chart. Preliminary results allow us to foresee the potential of the presented technique.

Keywords: Clustering, Data analysis, Data mining, Predictive models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1907
7193 Appraisal of Methods for Identifying, Mapping, and Modelling of Fluvial Erosion in a Mining Environment

Authors: F. F. Howard, I. Yakubu, C. B. Boye, J. S. Y. Kuma

Abstract:

Natural and human activities, such as mining operations, expose the natural soil to adverse environmental conditions, leading to contamination of soil, groundwater, and surface water, which has negative effects on humans, flora, and fauna. Bare or partly exposed soil is most liable to fluvial erosion. This paper enumerates various methods used to identify, map, and model fluvial erosion in a mining environment. Classical, Artificial Intelligence (AI), and GIS methods have been reviewed. One of the many classical methods used to estimate river erosion is the Revised Universal Soil Loss Equation (RUSLE) model. The RUSLE model is easy to use. Its reliance on empirical relationships that may not always be applicable to specific circumstances or locations is a flaw. Other classical models for estimating fluvial erosion are the Soil and Water Assessment Tool (SWAT) and the Universal Soil Loss Equation (USLE). These models offer a more complete understanding of the underlying physical processes and encompass a wider range of situations. Although more difficult to utilise, they depend on the availability and dependability of input data for correctness. AI can help deal with multivariate and complex difficulties and predict soil loss with higher accuracy than traditional methods, and also be used to build unique models for identifying degraded areas. AI techniques have become popular as an alternative predictor for degraded environments. However, this research proposed a hybrid of classical, AI, and GIS methods for efficient and effective modelling of fluvial erosion.

Keywords: Fluvial erosion, classical methods, Artificial Intelligence, Geographic Information System.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 121
7192 An Attribute-Centre Based Decision Tree Classification Algorithm

Authors: Gökhan Silahtaroğlu

Abstract:

Decision tree algorithms have very important place at classification model of data mining. In literature, algorithms use entropy concept or gini index to form the tree. The shape of the classes and their closeness to each other some of the factors that affect the performance of the algorithm. In this paper we introduce a new decision tree algorithm which employs data (attribute) folding method and variation of the class variables over the branches to be created. A comparative performance analysis has been held between the proposed algorithm and C4.5.

Keywords: Classification, decision tree, split, pruning, entropy, gini.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1333
7191 Combining Bagging and Boosting

Authors: S. B. Kotsiantis, P. E. Pintelas

Abstract:

Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base-classifiers. Boosting algorithms are considered stronger than bagging on noisefree data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using a voting methodology of bagging and boosting ensembles with 10 subclassifiers in each one. We performed a comparison with simple bagging and boosting ensembles with 25 sub-classifiers, as well as other well known combining methods, on standard benchmark datasets and the proposed technique was the most accurate.

Keywords: data mining, machine learning, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2507
7190 Plant Varieties Selection System

Authors: Kitti Koonsanit, Chuleerat Jaruskulchai, Poonsak Miphokasap, Apisit Eiumnoh

Abstract:

In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.

Keywords: Plant varieties selection system, decision tree, expert recommendation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1754
7189 On Speeding Up Support Vector Machines: Proximity Graphs Versus Random Sampling for Pre-Selection Condensation

Authors: Xiaohua Liu, Juan F. Beltran, Nishant Mohanchandra, Godfried T. Toussaint

Abstract:

Support vector machines (SVMs) are considered to be the best machine learning algorithms for minimizing the predictive probability of misclassification. However, their drawback is that for large data sets the computation of the optimal decision boundary is a time consuming function of the size of the training set. Hence several methods have been proposed to speed up the SVM algorithm. Here three methods used to speed up the computation of the SVM classifiers are compared experimentally using a musical genre classification problem. The simplest method pre-selects a random sample of the data before the application of the SVM algorithm. Two additional methods use proximity graphs to pre-select data that are near the decision boundary. One uses k-Nearest Neighbor graphs and the other Relative Neighborhood Graphs to accomplish the task.

Keywords: Machine learning, data mining, support vector machines, proximity graphs, relative-neighborhood graphs, k-nearestneighbor graphs, random sampling, training data condensation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1879
7188 Application Methodology for the Generation of 3D Thermal Models Using UAV Photogrammety and Dual Sensors for Mining/Industrial Facilities Inspection

Authors: Javier Sedano-Cibrián, Julio Manuel de Luis-Ruiz, Rubén Pérez-Álvarez, Raúl Pereda-García, Beatriz Malagón-Picón

Abstract:

Structural inspection activities are necessary to ensure the correct functioning of infrastructures. UAV techniques have become more popular than traditional techniques. Specifically, UAV Photogrammetry allows time and cost savings. The development of this technology has permitted the use of low-cost thermal sensors in UAVs. The representation of 3D thermal models with this type of equipment is in continuous evolution. The direct processing of thermal images usually leads to errors and inaccurate results. In this paper, a methodology is proposed for the generation of 3D thermal models using dual sensors, which involves the application of RGB and thermal images in parallel. Hence, the RGB images are used as the basis for the generation of the model geometry, and the thermal images are the source of the surface temperature information that is projected onto the model. Mining/industrial facilities representations that are obtained can be used for inspection activities.

Keywords: Aerial thermography, data processing, drone, low-cost, point cloud.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 277
7187 Evaluation of the Contribution of Starting Pitchers in a Professional Baseball Team by Grey Relational Analysis

Authors: Chih-Cheng Chen, Yung-Tan Lee, Shih-Yang Lee, Shih-Kuei Huang, Tien-Tze Chen, Qiu-Jun Chen

Abstract:

The evaluation of the contribution of professional baseball starting pitchers is a complex decision-making problem that includes several quantitative attributes. It is considered a type of multi-attribute or multi-criteria decision making (MADM/MCDM) problem. This study proposes a model using the Grey Relational Analysis (GRA) to evaluate the starting pitcher contribution for teams of the Chinese Professional Baseball League. The GRA calculates the individual grey relational degree of each alternative to the positive ideal alternative. An empirical analysis was conducted to show the use of the model for the starting pitcher contribution problem. The results demonstrate the effectiveness and feasibility of the proposed model.

Keywords: Starting pitchers, Grey Relational Analysis, Chinese Professional Baseball

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1587
7186 Finding an Optimized Discriminate Function for Internet Application Recognition

Authors: E. Khorram, S.M. Mirzababaei

Abstract:

Everyday the usages of the Internet increase and simply a world of the data become accessible. Network providers do not want to let the provided services to be used in harmful or terrorist affairs, so they used a variety of methods to protect the special regions from the harmful data. One of the most important methods is supposed to be the firewall. Firewall stops the transfer of such packets through several ways, but in some cases they do not use firewall because of its blind packet stopping, high process power needed and expensive prices. Here we have proposed a method to find a discriminate function to distinguish between usual packets and harmful ones by the statistical processing on the network router logs. So an administrator can alarm to the user. This method is very fast and can be used simply in adjacent with the Internet routers.

Keywords: Data Mining, Firewall, Optimization, Packetclassification, Statistical Pattern Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1370
7185 Probabilistic Approach as a Method Used in the Solution of Engineering Design for Biomechanics and Mining

Authors: Karel Frydrýšek

Abstract:

This paper focuses on the probabilistic numerical solution of the problems in biomechanics and mining. Applications of Simulation-Based Reliability Assessment (SBRA) Method are presented in the solution of designing of the external fixators applied in traumatology and orthopaedics (these fixators can be applied for the treatment of open and unstable fractures etc.) and in the solution of a hard rock (ore) disintegration process (i.e. the bit moves into the ore and subsequently disintegrates it, the results are compared with experiments, new design of excavation tool is proposed.

Keywords: probabilistic approach, engineering design, traumatology, rock mechanics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1429
7184 Dual Pyramid of Agents for Image Segmentation

Authors: K. Idir, H. Merouani, Y. Tlili.

Abstract:

An effective method for the early detection of breast cancer is the mammographic screening. One of the most important signs of early breast cancer is the presence of microcalcifications. For the detection of microcalcification in a mammography image, we propose to conceive a multiagent system based on a dual irregular pyramid. An initial segmentation is obtained by an incremental approach; the result represents level zero of the pyramid. The edge information obtained by application of the Canny filter is taken into account to affine the segmentation. The edge-agents and region-agents cooper level by level of the pyramid by exploiting its various characteristics to provide the segmentation process convergence.

Keywords: Dual Pyramid, Image Segmentation, Multi-agent System, Region/Edge Cooperation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1881
7183 Network Anomaly Detection using Soft Computing

Authors: Surat Srinoy, Werasak Kurutach, Witcha Chimphlee, Siriporn Chimphlee

Abstract:

One main drawback of intrusion detection system is the inability of detecting new attacks which do not have known signatures. In this paper we discuss an intrusion detection method that proposes independent component analysis (ICA) based feature selection heuristics and using rough fuzzy for clustering data. ICA is to separate these independent components (ICs) from the monitored variables. Rough set has to decrease the amount of data and get rid of redundancy and Fuzzy methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also to detect activity that may be the result of a new, unknown attack. The experimental results on Knowledge Discovery and Data Mining- (KDDCup 1999) dataset.

Keywords: Network security, intrusion detection, rough set, ICA, anomaly detection, independent component analysis, rough fuzzy .

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1908
7182 The Benefits of End-To-End Integrated Planning from the Mine to Client Supply for Minimizing Penalties

Authors: G. Martino, F. Silva, E. Marchal

Abstract:

The control over delivered iron ore blend characteristics is one of the most important aspects of the mining business. The iron ore price is a function of its composition, which is the outcome of the beneficiation process. So, end-to-end integrated planning of mine operations can reduce risks of penalties on the iron ore price. In a standard iron mining company, the production chain is composed of mining, ore beneficiation, and client supply. When mine planning and client supply decisions are made uncoordinated, the beneficiation plant struggles to deliver the best blend possible. Technological improvements in several fields allowed bridging the gap between departments and boosting integrated decision-making processes. Clusterization and classification algorithms over historical production data generate reasonable previsions for quality and volume of iron ore produced for each pile of run-of-mine (ROM) processed. Mathematical modeling can use those deterministic relations to propose iron ore blends that better-fit specifications within a delivery schedule. Additionally, a model capable of representing the whole production chain can clearly compare the overall impact of different decisions in the process. This study shows how flexibilization combined with a planning optimization model between the mine and the ore beneficiation processes can reduce risks of out of specification deliveries. The model capabilities are illustrated on a hypothetical iron ore mine with magnetic separation process. Finally, this study shows ways of cost reduction or profit increase by optimizing process indicators across the production chain and integrating the different plannings with the sales decisions.

Keywords: Clusterization and classification algorithms, integrated planning, optimization, mathematical modeling, penalty minimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 594
7181 An MADM Framework toward Hierarchical Production Planning in Hybrid MTS/MTO Environments

Authors: H. Rafiei, M. Rabbani

Abstract:

This paper proposes a new decision making structure to determine the appropriate product delivery strategy for different products in a manufacturing system among make-to-stock, make-toorder, and hybrid strategy. Given product delivery strategies for all products in the manufacturing system, the position of the Order Penetrating Point (OPP) can be located regarding the delivery strategies among which location of OPP in hybrid strategy is a cumbersome task. In this regard, we employ analytic network process, because there are varieties of interrelated driving factors involved in choosing the right location. Moreover, the proposed structure is augmented with fuzzy sets theory in order to cope with the uncertainty of judgments. Finally, applicability of the proposed structure is proven in practice through a real industrial case company. The numerical results demonstrate the efficiency of the proposed decision making structure in order partitioning and OPP location.

Keywords: Hybrid make-to-stock/make-to-order, Multi-attribute decision making, Order partitioning, Order penetration point.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2178
7180 Topic Modeling Using Latent Dirichlet Allocation and Latent Semantic Indexing on South African Telco Twitter Data

Authors: Phumelele P. Kubheka, Pius A. Owolawi, Gbolahan Aiyetoro

Abstract:

Twitter is one of the most popular social media platforms where users share their opinions on different subjects. Twitter can be considered a great source for mining text due to the high volumes of data generated through the platform daily. Many industries such as telecommunication companies can leverage the availability of Twitter data to better understand their markets and make an appropriate business decision. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked with another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics on a Twitter dataset containing user tweets on South African Telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results with higher topic coherence by 8% for the best-performing model in this experiment. A higher topic coherence score indicates better performance of the model.

Keywords: Big data, latent Dirichlet allocation, latent semantic indexing, Telco, topic modeling, Twitter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 386
7179 Customer Churn Prediction: A Cognitive Approach

Authors: Damith Senanayake, Lakmal Muthugama, Laksheen Mendis, Tiroshan Madushanka

Abstract:

Customer churn prediction is one of the most useful areas of study in customer analytics. Due to the enormous amount of data available for such predictions, machine learning and data mining have been heavily used in this domain. There exist many machine learning algorithms directly applicable for the problem of customer churn prediction, and here, we attempt to experiment on a novel approach by using a cognitive learning based technique in an attempt to improve the results obtained by using a combination of supervised learning methods, with cognitive unsupervised learning methods.

Keywords: Growing Self Organizing Maps, Kernel Methods, Churn Prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2518
7178 An Automatic Bayesian Classification System for File Format Selection

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for the classification of an unstructured format description for identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format indentification method employs file format classifier and associated configurations to support digital preservation experts with an estimation of required file format. Our goal is to make use of a format specification knowledge base aggregated from a different Web sources in order to select file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert, the file format for his institution. The proposed methods facilitate the selection of file format and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision-making, the aggregated information about the file formats is presented as a file format vocabulary that comprises most common terms that are characteristic for all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. The sample file format calculation and the calculation results including probabilities are presented in the evaluation section.

Keywords: Data mining, digital libraries, digital preservation, file format.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1620
7177 Operational risks Classification for Information Systems with Service-Oriented Architecture (Including Loss Calculation Example)

Authors: Irina Pyrlina

Abstract:

This article presents the results of a study conducted to identify operational risks for information systems (IS) with service-oriented architecture (SOA). Analysis of current approaches to risk and system error classifications revealed that the system error classes were never used for SOA risk estimation. Additionally system error classes are not normallyexperimentally supported with realenterprise error data. Through the study several categories of various existing error classifications systems are applied and three new error categories with sub-categories are identified. As a part of operational risks a new error classification scheme is proposed for SOA applications. It is based on errors of real information systems which are service providers for application with service-oriented architecture. The proposed classification approach has been used to classify SOA system errors for two different enterprises (oil and gas industry, metal and mining industry). In addition we have conducted a research to identify possible losses from operational risks.

Keywords: Enterprise architecture, Error classification, Oil&Gas and Metal&Mining industries, Operational risks, Serviceoriented architecture

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1574
7176 Join and Meet Block Based Default Definite Decision Rule Mining from IDT and an Incremental Algorithm

Authors: Chen Wu, Jingyu Yang

Abstract:

Using maximal consistent blocks of tolerance relation on the universe in incomplete decision table, the concepts of join block and meet block are introduced and studied. Including tolerance class, other blocks such as tolerant kernel and compatible kernel of an object are also discussed at the same time. Upper and lower approximations based on those blocks are also defined. Default definite decision rules acquired from incomplete decision table are proposed in the paper. An incremental algorithm to update default definite decision rules is suggested for effective mining tasks from incomplete decision table into which data is appended. Through an example, we demonstrate how default definite decision rules based on maximal consistent blocks, join blocks and meet blocks are acquired and how optimization is done in support of discernibility matrix and discernibility function in the incomplete decision table.

Keywords: rough set, incomplete decision table, maximalconsistent block, default definite decision rule, join and meet block.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1257
7175 Total and Leachable Concentration of Trace Elements in Soil towards Human Health Risk, Related with Coal Mine in Jorong, South Kalimantan, Indonesia

Authors: Arie Pujiwati, Kengo Nakamura, Noriaki Watanabe, Takeshi Komai

Abstract:

Coal mining is well known to cause considerable environmental impacts, including trace element contamination of soil. This study aimed to assess the trace element (As, Cd, Co, Cu, Ni, Pb, Sb, and Zn) contamination of soil in the vicinity of coal mining activities, using the case study of Asam-asam River basin, South Kalimantan, Indonesia, and to assess the human health risk, incorporating total and bioavailable (water-leachable and acid-leachable) concentrations. The results show the enrichment of As and Co in soil, surpassing the background soil value. Contamination was evaluated based on the index of geo-accumulation, Igeo and the pollution index, PI. Igeo values showed that the soil was generally uncontaminated (Igeo ≤ 0), except for elevated As and Co. Mean PI for Ni and Cu indicated slight contamination. Regarding the assessment of health risks, the Hazard Index, HI showed adverse risks (HI > 1) for Ni, Co, and As. Further, Ni and As were found to pose unacceptable carcinogenic risk (risk > 1.10-5). Farming, settlement, and plantation were found to present greater risk than coal mines. These results show that coal mining activity in the study area contaminates the soils by particular elements and may pose potential human health risk in its surrounding area. This study is important for setting appropriate countermeasure actions and improving basic coal mining management in Indonesia.

Keywords: Coal mine, risk, soil, trace elements.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1129
7174 Improving Classification in Bayesian Networks using Structural Learning

Authors: Hong Choon Ong

Abstract:

Naïve Bayes classifiers are simple probabilistic classifiers. Classification extracts patterns by using data file with a set of labeled training examples and is currently one of the most significant areas in data mining. However, Naïve Bayes assumes the independence among the features. Structural learning among the features thus helps in the classification problem. In this study, the use of structural learning in Bayesian Network is proposed to be applied where there are relationships between the features when using the Naïve Bayes. The improvement in the classification using structural learning is shown if there exist relationship between the features or when they are not independent.

Keywords: Bayesian Network, Classification, Naïve Bayes, Structural Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2561
7173 The Power of Indigenous Peoples in Decision-Making Processes of Mining Projects: The Pilbara Region

Authors: K. N. Penna, J. P. English

Abstract:

The destruction of the Juukan Gorge rock shelters in 2020 has catalysed impetus within Australian society for a significant change in engagement with Indigenous Peoples, and the approach to Indigenous cultural heritage, both within the Pilbara region and more broadly across Australia. Culture-based and people-centred approaches are inherent to inclusive sustainable development and Free, Prior, Informed Consent, outcomes encouraged by international and local recommendations on the human rights and cultural heritage preservation of Indigenous peoples. In this paper, we present an interpretive model of an evolved process for mining project development, incorporating culture-based and people-centred approaches, based on the Theory U system change method. The evolved process advocates a change in organisational mindset and culture, and a comprehensive understanding of Indigenous Peoples’ culture and values, as the foundations for increasing their influence and achieving mutually beneficial developments.

Keywords: Indigenous Engagement, mining industry, culture-based approach, people-centred approach, Theory U.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 357
7172 A Bayesian Classification System for Facilitating an Institutional Risk Profile Definition

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for easy creation and classification of institutional risk profiles supporting endangerment analysis of file formats. The main contribution of this work is the employment of data mining techniques to support set up of the most important risk factors. Subsequently, risk profiles employ risk factors classifier and associated configurations to support digital preservation experts with a semi-automatic estimation of endangerment group for file format risk profiles. Our goal is to make use of an expert knowledge base, accuired through a digital preservation survey in order to detect preservation risks for a particular institution. Another contribution is support for visualisation of risk factors for a requried dimension for analysis. Using the naive Bayes method, the decision support system recommends to an expert the matching risk profile group for the previously selected institutional risk profile. The proposed methods improve the visibility of risk factor values and the quality of a digital preservation process. The presented approach is designed to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and values of file format risk profiles. To facilitate decision-making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert and to define its profile group. The sample risk profile calculation and the visualisation of some risk factor dimensions is presented in the evaluation section.

Keywords: linked open data, information integration, digital libraries, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 677
7171 An Efficient and Generic Hybrid Framework for High Dimensional Data Clustering

Authors: Dharmveer Singh Rajput , P. K. Singh, Mahua Bhattacharya

Abstract:

Clustering in high dimensional space is a difficult problem which is recurrent in many fields of science and engineering, e.g., bioinformatics, image processing, pattern reorganization and data mining. In high dimensional space some of the dimensions are likely to be irrelevant, thus hiding the possible clustering. In very high dimensions it is common for all the objects in a dataset to be nearly equidistant from each other, completely masking the clusters. Hence, performance of the clustering algorithm decreases. In this paper, we propose an algorithmic framework which combines the (reduct) concept of rough set theory with the k-means algorithm to remove the irrelevant dimensions in a high dimensional space and obtain appropriate clusters. Our experiment on test data shows that this framework increases efficiency of the clustering process and accuracy of the results.

Keywords: High dimensional clustering, sub-space, k-means, rough set, discernibility matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1893
7170 Visual Text Analytics Technologies for Real-Time Big Data: Chronological Evolution and Issues

Authors: Siti Azrina B. A. Aziz, Siti Hafizah A. Hamid

Abstract:

New approaches to analyze and visualize data stream in real-time basis is important in making a prompt decision by the decision maker. Financial market trading and surveillance, large-scale emergency response and crowd control are some example scenarios that require real-time analytic and data visualization. This situation has led to the development of techniques and tools that support humans in analyzing the source data. With the emergence of Big Data and social media, new techniques and tools are required in order to process the streaming data. Today, ranges of tools which implement some of these functionalities are available. In this paper, we present chronological evolution evaluation of technologies for supporting of real-time analytic and visualization of the data stream. Based on the past research papers published from 2002 to 2014, we gathered the general information, main techniques, challenges and open issues. The techniques for streaming text visualization are identified based on Text Visualization Browser in chronological order. This paper aims to review the evolution of streaming text visualization techniques and tools, as well as to discuss the problems and challenges for each of identified tools.

Keywords: Information visualization, visual analytics, text mining, visual text analytics tools, big data visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 967
7169 DCBOR: A Density Clustering Based on Outlier Removal

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well known single link clustering algorithm. We will refer to this algorithm as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset. So this algorithm provides outlier detection and data clustering simultaneously. This algorithm does not need to update the distance matrix, since the algorithm depends on merging the most k-nearest objects in one step and the cluster continues grow as long as possible under specified condition. So the algorithm consists of two phases; at the first phase, it removes the outliers from the input dataset. At the second phase, it performs the clustering process. This algorithm discovers clusters of different shapes, sizes, densities and requires only one input parameter; this parameter represents a threshold for outlier points. The value of the input parameter is ranging from 0 to 1. The algorithm supports the user in determining an appropriate value for it. We have tested this algorithm on different datasets contain outlier and connecting clusters by chain of density points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.

Keywords: Data Clustering, Clustering Algorithms, Handling Noise, Arbitrary Shape of Clusters.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1890