Search results for: Data mining and Information Extraction
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8133

7863 Acute Coronary Syndrome Prediction Using Data Mining Techniques- An Application

Authors: Tahseen A. Jilani, Huda Yasin, Madiha Yasin, C. Ardil

Abstract:

In this paper, we use data mining techniques to investigate factors that contribute significantly to the risk of acute coronary syndrome. We assume that the dependent variable is diagnosis, with dichotomous values indicating the presence or absence of disease. We apply binary logistic regression to the factors affecting the dependent variable. The data set was taken from two different cardiac hospitals in Karachi, Pakistan. There are sixteen variables in total, of which one is treated as dependent and the other fifteen as independent. To improve the performance of the regression model in predicting acute coronary syndrome, data reduction techniques such as principal component analysis are applied. Based on the results of the data reduction, we consider only 14 of the sixteen factors.

Keywords: Acute coronary syndrome (ACS), binary logistic regression analyses, myocardial ischemia (MI), principal component analysis, unstable angina (U.A.).

Downloads: 2115
7862 Data Mining Determination of Sunlight Average Input for Solar Power Plant

Authors: Fl. Loury, P. Sablonière, C. Lamoureux, G. Magnier, Th. Gutierrez

Abstract:

A method is proposed to extract faithful representative patterns from a data set of observations that suffer from non-negligible fluctuations. Assuming that the time interval between measurements is extremely small compared to the observation time, the method first defines a subset of intermediate time intervals characterizing coherent behavior. Projecting the data onto these intervals gives a set of curves, from which an ideally “perfect” curve is constructed by taking their supremum. Comparison with the average real curve in the corresponding interval then gives an efficiency parameter expressing the degradation caused by fluctuations. The method is applied to sunlight data collected at a specific site, where the ideal sunlight is that resulting from direct exposure at the site's latitude over the year, and the efficiency results from the action of meteorological parameters, mainly cloudiness, at different periods of the year. The extracted information already provides useful input for decisions, before being used for analysis of plant control.
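
The envelope-and-efficiency idea in this abstract can be illustrated with a minimal Python sketch. It assumes the curves are equal-length lists sampled at the same times, and it assumes the efficiency parameter is the ratio of the summed average curve to the summed envelope; the abstract does not give the exact formula, and the sample values are hypothetical.

```python
from statistics import mean


def ideal_curve(curves):
    """Pointwise upper envelope (supremum) of equal-length curves."""
    return [max(values) for values in zip(*curves)]


def efficiency(curves):
    """Summed average curve divided by summed envelope (assumed definition)."""
    ideal = ideal_curve(curves)
    average = [mean(values) for values in zip(*curves)]
    return sum(average) / sum(ideal)


# Hypothetical daily irradiance profiles in arbitrary units, one list per day.
days = [
    [0.0, 2.0, 4.0, 2.0, 0.0],
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 2.0, 3.0, 2.0, 0.0],
]

print(ideal_curve(days))
print(round(efficiency(days), 3))
```

An efficiency close to 1 would indicate that the observed curves rarely fall below the envelope, that is, little degradation from fluctuations.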

Keywords: Base Input Reconstruction, Data Mining, Efficiency Factor, Information Pattern Operator.

Downloads: 1528
7861 Comparison of Different Solvents and Extraction Methods for Isolation of Phenolic Compounds from Horseradish Roots (Armoracia rusticana)

Authors: Lolita Tomsone, Zanda Kruma, Ruta Galoburda

Abstract:

Horseradish (Armoracia rusticana) is a perennial herb belonging to the Brassicaceae family and contains biologically active substances. The aim of the current research was to determine the best method for extraction of phenolic compounds from horseradish roots showing high antiradical activity. Three genotypes (No. 105, No. 106, and variety ‘Turku’) of horseradish roots were extracted with eight different solvents: n-hexane, ethyl acetate, diethyl ether, 2-propanol, acetone, ethanol (95%), ethanol / water / acetic acid (80/20/1 v/v/v), and ethanol / water (80/20 by volume), using two extraction methods (conventional and Soxhlet). Ethanol and ethanol / water solutions can be chosen as the best solvents. Although the total phenolic content (TPC) was higher in Soxhlet extracts, the scavenging activity of DPPH˙ radicals did not increase. It can be concluded that the Soxhlet extraction method extracts more compounds that are not effective antioxidants.

Keywords: DPPH˙, extraction, solvent, Soxhlet, TPC

Downloads: 14497
7860 Using the Combined Model of PROMETHEE and Fuzzy Analytic Network Process for Determining Question Weights in Scientific Exams through Data Mining Approach

Authors: Hassan Haleh, Amin Ghaffari, Parisa Farahpour

Abstract:

The need for an appropriate system for evaluating students' educational development is a key problem in achieving predefined educational goals. The volume of related papers in recent years that try to prove or disprove the necessity and adequacy of student assessment corroborates this. Some of these studies tried to increase the precision of determining question weights in scientific examinations. However, all of them attempt to adjust the initial question weights, while the accuracy and precision of those initial weights remain in question. Thus, in order to increase the precision of the assessment of students' educational development, the present study proposes a new method for determining the initial question weights by considering question factors such as difficulty, importance, and complexity, and by implementing a combined method of PROMETHEE and fuzzy analytic network process, using a data mining approach to improve the model's inputs. The results of the implemented case study demonstrate the improved performance and precision of the proposed model.

Keywords: Assessing students, Analytic network process, Clustering, Data mining, Fuzzy sets, Multi-criteria decision making, and Preference function.

Downloads: 1582
7859 BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis

Authors: Mohamed A. Mahfouz, M. A. Ismail

Abstract:

Biclustering is a very useful data mining technique for identifying patterns in which different genes are co-related under a subset of conditions in gene expression analysis. Association rule mining is an efficient approach to biclustering, as in the BIMODULE algorithm, but it is sensitive to the values of its input parameters and to the discretization procedure used in the preprocessing step. Moreover, when noise is present, classical association rule miners discover multiple small fragments of the true bicluster but miss the true bicluster itself. This paper formally presents a generalized noise-tolerant bicluster model, termed μBicluster. An iterative algorithm based on the proposed model, termed BIDENS, is introduced that can discover a set of k possibly overlapping biclusters simultaneously. Our model uses a more flexible method to partition the dimensions in order to preserve meaningful and significant biclusters. The proposed algorithm can discover biclusters that are hard for BIMODULE to discover. An experimental study on yeast data, human gene expression data, and several artificial datasets shows that our algorithm offers substantial improvements over several previously proposed biclustering algorithms.

Keywords: Machine learning, biclustering, bi-dimensional clustering, gene expression analysis, data mining.

Downloads: 1964
7858 Edge-end Pixel Extraction for Edge-based Image Segmentation

Authors: Mahinda P. Pathegama, Özdemir Göl

Abstract:

Extraction of edge-end-pixels is an important step for the edge linking process to achieve edge-based image segmentation. This paper presents an algorithm to extract edge-end pixels together with their directional sensitivities as an augmentation to the currently available mathematical models. The algorithm is implemented in the Java environment because of its inherent compatibility with web interfaces since its main use is envisaged to be for remote image analysis on a virtual instrumentation platform.

Keywords: edge-end pixels, image processing, image segmentation, pixel extraction

Downloads: 2154
7857 Study on Extraction of Ceric Oxide from Monazite Concentrate

Authors: Lwin Thuzar Shwe, Nwe Nwe Soe, Kay Thi Lwin

Abstract:

Cerium oxide is to be recovered from monazite, which contains about 27.35% CeO2. The principal objective of this study is to be able to extract cerium oxide from monazite of Moemeik Myitsone Area. The treatment of monazite in this study involves three main steps; extraction of cerium hydroxide from monazite, solvent extraction of cerium hydroxide, and precipitation with oxalic acid and calcination of cerium oxalate.

Keywords: Calcination, Digestion, Precipitation, Solvent Extraction

Downloads: 2589
7856 Q-Map: Clinical Concept Mining from Clinical Documents

Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala

Abstract:

Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis, and image analytics. Most of the data in the field are well structured and available in numerical or categorical formats that can be used for experiments directly. At the opposite end of the spectrum, however, there is a wide expanse of data that is intractable for direct analysis owing to its unstructured nature. Such data is found in discharge summaries, clinical notes, and procedural notes, which are written in human narrative form and have neither a relational model nor a standard grammatical structure. An important step in using these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map, a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique based on a string matching algorithm indexed on curated knowledge sources, which is both fast and configurable. The authors also briefly compare its performance with that of MetaMap, one of the most reputed tools for medical concept retrieval, and present the advantages Q-Map displays over MetaMap.
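
As a rough illustration of string matching against an indexed vocabulary, the following Python sketch performs greedy longest-match lookup of phrases in a dictionary. It is not Q-Map's actual algorithm, which the abstract does not specify; the function names, the tiny vocabulary, and the concept identifiers are all hypothetical.

```python
def build_index(terms):
    """Index each phrase by its lowercase token tuple (hypothetical vocabulary)."""
    return {tuple(phrase.lower().split()): concept for phrase, concept in terms.items()}


def find_concepts(text, index):
    """Greedy longest-match scan of the text against the phrase index."""
    if not index:
        return []
    # Crude tokenization: strip commas and periods, split on whitespace.
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    max_len = max(len(key) for key in index)
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + size])
            if key in index:
                matches.append((" ".join(key), index[key]))
                i += size
                break
        else:
            i += 1  # no phrase starts at this token
    return matches


# Hypothetical vocabulary with placeholder concept identifiers.
vocabulary = build_index({
    "chest pain": "CONCEPT_1",
    "pain": "CONCEPT_2",
    "shortness of breath": "CONCEPT_3",
})
note = "Patient reports chest pain and shortness of breath."
print(find_concepts(note, vocabulary))
```

Preferring the longest phrase means "chest pain" is reported as one concept rather than as the shorter entry "pain".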

Keywords: Information retrieval (IR), unified medical language system (UMLS), Syntax Based Analysis, natural language processing (NLP), medical informatics.

Downloads: 779
7855 Deep iCrawl: An Intelligent Vision-Based Deep Web Crawler

Authors: R. Anita, V. Ganga Bharani, N. Nityanandam, Pradeep Kumar Sahoo

Abstract:

The explosive growth of the World Wide Web has made extracting relevant data a challenging problem. Traditional web crawlers focus only on the surface web, while the deep web keeps expanding behind the scenes. Deep web pages are created dynamically as a result of queries posed to specific web databases. The structure of deep web pages makes it impossible for traditional web crawlers to access deep web content. This paper presents Deep iCrawl, a novel vision-based approach for extracting data from the deep web. Deep iCrawl splits the process into two phases. The first phase includes query analysis and query translation, and the second covers vision-based extraction of data from the dynamically created deep web pages. There are several established approaches for the extraction of deep web pages, but the proposed method aims to overcome their inherent limitations. This paper also aims to compare the data items and present them in the required order.

Keywords: Crawler, Deep web, Web Database

Downloads: 2156
7854 GA Based Optimal Feature Extraction Method for Functional Data Classification

Authors: Jun Wan, Zehua Chen, Yingwu Chen, Zhidong Bai

Abstract:

Classification is an interesting problem in functional data analysis (FDA), because many scientific and applied problems, such as recognition, prediction, control, decision making, and management, end up as classification problems. Because functional data (FD) are high-dimensional and highly correlated, a key problem is to extract features from FD while preserving their global characteristics, which strongly affects classification efficiency and precision. In this paper, a novel automatic method that combines a Genetic Algorithm (GA) with a classification algorithm to extract classification features is proposed. In this method, the optimal features and classification model are approached step by step through evolutionary learning. Theoretical analysis and experimental tests show that this method improves classification efficiency, precision, and robustness while using fewer features, and that the dimension of the extracted classification features can be controlled.

Keywords: Classification, functional data, feature extraction, genetic algorithm, wavelet.

Downloads: 1555
7853 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches

Authors: Aya Salama

Abstract:

Digital Twin has emerged as a compelling research area, capturing the attention of scholars over the past decade. It finds applications across diverse fields, including smart manufacturing and healthcare, offering significant time and cost savings. Notably, it often intersects with other cutting-edge technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the concept of a Human Digital Twin (HDT) is still in its infancy and requires further demonstration of its practicality. HDT takes the notion of Digital Twin a step further by extending it to living entities, notably humans, who are vastly different from inanimate physical objects. The primary objective of this research was to create an HDT capable of automating real-time human responses by simulating human behavior. To achieve this, the study delved into various areas, including clustering, supervised classification, topic extraction, and sentiment analysis. The paper successfully demonstrated the feasibility of HDT for generating personalized responses in social messaging applications. Notably, the proposed approach achieved an overall accuracy of 63%, a highly promising result that could pave the way for further exploration of the HDT concept. The methodology employed Random Forest for clustering the question database and matching new questions, while K-nearest neighbor was utilized for sentiment analysis.

Keywords: Human Digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification and clustering.

Downloads: 188
7852 A Recommender System Fusing Collaborative Filtering and User’s Review Mining

Authors: Seulbi Choi, Hyunchul Ahn

Abstract:

The collaborative filtering (CF) algorithm has been widely used for recommender systems in both academic and practical applications. It generates recommendations from users' numeric ratings. However, using information beyond user ratings may improve the accuracy of CF. Since the advent of Web 2.0, many people share their honest opinions on items they have recently purchased, so users' reviews can be regarded as a new informative source for identifying users' preferences accurately. Against this background, this study presents a hybrid recommender system that fuses CF and user review mining. Our system adopts conventional memory-based CF, but it is designed to use both a user's numeric ratings and his/her text reviews of the items when calculating similarities between users.
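
One way to picture a user similarity that draws on both ratings and review text is the Python sketch below. It assumes cosine similarity for both signals, a bag-of-words view of reviews, and a linear blend controlled by a hypothetical weight `alpha`; the abstract does not say how the authors actually combine the two sources.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_similarity(ratings_u, ratings_v, review_u, review_v, alpha=0.5):
    """Blend rating similarity and review-text similarity (assumed linear mix)."""
    rating_sim = cosine(ratings_u, ratings_v)
    text_sim = cosine(Counter(review_u.lower().split()),
                      Counter(review_v.lower().split()))
    return alpha * rating_sim + (1 - alpha) * text_sim


# Hypothetical users: item -> rating, plus one concatenated review string each.
sim = hybrid_similarity(
    {"item_a": 5, "item_b": 3}, {"item_a": 4, "item_b": 2},
    "great battery and screen", "great screen but weak battery",
)
print(round(sim, 3))
```

With `alpha=1.0` the function reduces to rating-only memory-based similarity, which makes it easy to compare against the conventional baseline.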

Keywords: Recommender system, collaborative filtering, text mining, review mining.

Downloads: 1587
7851 n-Butanol as an Extractant for Lactic Acid Recovery

Authors: Kanungnit Chawong, Panarat Rattanaphanee

Abstract:

Extraction of lactic acid from aqueous solution using n-butanol as an extractant was studied. The effects of mixing time, pH of the aqueous solution, initial lactic acid concentration, and volume ratio between the organic and aqueous phases were investigated. The distribution coefficient and the degree of lactic acid extraction were found to increase when the pH of the aqueous solution was decreased. The pH effect was substantially more pronounced when the pH of the aqueous solution was less than 1. Initial lactic acid concentration and organic-to-aqueous volume ratio appeared to have a positive effect on the distribution coefficient and the degree of extraction. Because n-butanol is partially miscible with water, incorporation of aqueous solution into the organic phase was observed in extractions with a large organic-to-aqueous volume ratio.
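
The two quantities named in this abstract follow from a simple mass balance, sketched here in Python. The sketch assumes the standard definitions (distribution coefficient as the ratio of equilibrium concentrations, organic over aqueous) and constant phase volumes, which the abstract itself notes may not hold at large organic-to-aqueous ratios; the concentrations are hypothetical.

```python
def distribution_coefficient(c_org, c_aq):
    """Equilibrium concentration in the organic phase over the aqueous phase."""
    return c_org / c_aq


def degree_of_extraction(d, volume_ratio):
    """Percent of solute extracted; volume_ratio is V_org / V_aq."""
    return 100.0 * d * volume_ratio / (1.0 + d * volume_ratio)


# Hypothetical equilibrium concentrations in mol/L.
d = distribution_coefficient(c_org=0.5, c_aq=0.5)
for ratio in (1.0, 3.0):
    print(ratio, degree_of_extraction(d, ratio))
```

For a fixed distribution coefficient, the formula shows directly why a larger organic-to-aqueous volume ratio raises the degree of extraction.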

Keywords: Lactic acid, liquid-liquid extraction, n-Butanol, Solvating extractant.

Downloads: 3173
7850 Optimization of Process Parameters using Response Surface Methodology for the Removal of Zinc(II) by Solvent Extraction

Authors: B. Guezzen, M.A. Didi, B. Medjahed

Abstract:

A factorial design of experiments and a response surface methodology were implemented to investigate the liquid-liquid extraction of zinc (II) from acetate medium using the ionic liquid 1-butyl-imidazolium di(2-ethylhexyl) phosphate [BIm+][D2EHP-]. The extraction parameters, namely initial pH (2.5, 4.5, and 6.6), ionic liquid concentration (1, 5.5, and 10 mM), and salt effect (0.01, 5, and 10 mM), were optimized using a three-level full factorial design (3^3). The results of the factorial design demonstrate that all these factors are statistically significant, including the square effects of pH and ionic liquid concentration. The order of significance was: IL concentration > salt effect > initial pH. Analysis of variance (ANOVA), showing a high coefficient of determination (R^2 = 0.91) and low probability values (P < 0.05), supports the validity of the predicted second-order quadratic model for Zn (II) extraction. The optimum conditions for the extraction of zinc (II) at constant temperature (20 °C), initial Zn (II) concentration (1 mM), and an A/O ratio of unity were: initial pH (4.8), extractant concentration (9.9 mM), and NaCl concentration (8.2 mM). Under the optimized conditions, the metal ion could be quantitatively extracted.
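
A three-level full factorial design over three factors enumerates every combination of levels, giving 3 × 3 × 3 = 27 runs. The Python sketch below generates that run list from the levels quoted in the abstract; the dictionary keys are illustrative names, not the authors' notation.

```python
from itertools import product

# Factor levels as quoted in the abstract; key names are illustrative.
levels = {
    "initial_pH": (2.5, 4.5, 6.6),
    "ionic_liquid_mM": (1, 5.5, 10),
    "salt_mM": (0.01, 5, 10),
}

# Cartesian product of all levels: one dict per experimental run.
runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]

print(len(runs))
print(runs[0])
```

Each entry of `runs` is one experimental condition; a response such as extraction yield would be recorded per run before fitting the quadratic model.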

Keywords: Ionic liquid, response surface methodology, solvent extraction, zinc acetate.

Downloads: 1152
7849 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Its broad applicability to all kinds of data has led to its use in nearly every field of modern life. Classification helps us to group different items according to features judged to be interesting and useful. In this paper, we compare two classification methods, Naive Bayes and ADTree, used to detect spam e-mail. This choice is motivated by the fact that the Naive Bayes algorithm is based on probability calculus, while the ADTree algorithm is based on decision trees. The parameter settings of these classifiers are chosen to maximize the true positive rate and minimize the false positive rate. The experimental results present classification accuracy and cost analysis with a view to choosing the optimal classifier for spam detection. We also point out the number of attributes needed to obtain a tradeoff between the number of attributes and classification accuracy.

Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.

Downloads: 1500
7848 Analysis of Road Repairs in Undermined Areas

Authors: Tomáš Seidler, Marek Mihola, Denisa Cihlarova

Abstract:

The article presents the results of an analysis of maps of expected subsidence in undermined areas for road repair management. The analysis was done in the Karvina district of the Czech Republic, including undermined areas with ongoing deep mining activities or deep mining finished in the years 2003-2009. The article discusses the possibilities for local road maintenance authorities to determine, with limited data available, the areas that will need the most repairs in the future. Using the expected subsidence maps, a new map of surface curvature was calculated. Combined with road maps and historical data about repairs, the result was five main categories of undermined areas, providing a very simple tool for management.

Keywords: GIS, Map of Subsidence, Road, Undermined Area

Downloads: 1327
7847 Optimization of Air Pollution Control Model for Mining

Authors: Zunaira Asif, Zhi Chen

Abstract:

Sustainable air quality management is recognized as one of the most serious environmental concerns in mining regions. Mining operations emit various types of pollutants that have significant impacts on the environment. This study presents a stochastic control strategy by developing an air pollution control model to achieve a cost-effective solution. The optimization method is formulated to predict the cost of treatment using linear programming with an objective function and multiple constraints. The constraints focus mainly on two factors: production of metal should not exceed the available resources, and air quality should meet the standard criteria for each pollutant. The applicability of this model is explored through a case study of an open pit metal mine in Utah, USA. The method simultaneously uses meteorological data as a dispersion transfer function to reflect practical local conditions. The probabilistic analysis and the uncertainties in the meteorological conditions are handled by Monte Carlo simulation. Reasonable results have been obtained for selecting the optimized treatment technology for PM2.5, PM10, NOx, and SO2. Additional comparative analysis shows that a baghouse is the least-cost option compared to an electrostatic precipitator and wet scrubbers for particulate matter, whereas non-selective catalytic reduction and dry flue gas desulfurization are suitable for NOx and SO2 reduction, respectively. Thus, this model can help planners reduce these pollutants at marginal cost by suggesting pollution control devices, while accounting for dynamic meteorological conditions and mining activities.

Keywords: Air pollution, linear programming, mining, optimization, treatment technologies.

Downloads: 1607
7846 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events

Authors: Jaqueline M. R. Vieira

Abstract:

Different strategies and tools are available in the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be feasible and convenient with small images and little data, it may become difficult and prone to errors when large databases of images must be processed. The patterns may also differ across the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). In this work, we propose the inclusion of data mining classification strategies in order to create a knowledge database of the segmented curves. After some time spent using the system and manually marking the parts of borehole images that correspond to tension regions and breakout areas, these classifiers allow the system to indicate and suggest new candidate regions automatically, with higher accuracy. We suggest the use of different classifier methods in order to achieve different knowledge dataset configurations.

Keywords: Brazil, classifiers, data-mining, Image Segmentation, oil well visualization.

Downloads: 2544
7845 Synthesis and Use of Thiourea Derivative (1-Phenyl-3-Benzoyl-2-Thiourea) for Extraction of Cadmium Ion

Authors: Abdulfattah M. Alkherraz, Zaineb I. Lusta, Ahmed E. Zubi

Abstract:

Environmental pollution by heavy metals has become more problematic in recent years. To address the problem of cadmium accumulation in human organs, which leads to dangerous effects on human health, and to determine its concentration, the organic ligand 1-phenyl-3-benzoyl-2-thiourea was used to extract cadmium ions from solution. This ligand, one of the thiourea derivatives, was successfully synthesized. The ligand was characterized by NMR and CHN elemental analysis, and used to extract cadmium from its solutions by formation of a stable complex at neutral pH. The complex was characterized by elemental analysis and melting point. The concentrations of cadmium ions before and after the extraction were determined by Atomic Absorption Spectrophotometer (AAS). The data show that the percentage extracted was more than 98.7% of the concentration of cadmium used in the study.

Keywords: Thiourea derivatives, cadmium extraction.

Downloads: 7171
7844 Comparative Study of Decision Trees and Rough Sets Theory as Knowledge Extraction Tools for Design and Control of Industrial Processes

Authors: Marcin Perzyk, Artur Soroczynski

Abstract:

General requirements for knowledge representation in the form of logic rules, applicable to design and control of industrial processes, are formulated. Characteristic behavior of decision trees (DTs) and rough sets theory (RST) in rule extraction from recorded data is discussed and illustrated with simple examples. The significance of the models' drawbacks was evaluated using simulated and industrial data sets. It is concluded that the performance of DTs may be considerably poorer than that of RST in several important aspects, particularly when not only a characterization of a problem is required, but also detailed and precise rules are needed for the actual, specific problems to be solved.

Keywords: Knowledge extraction, decision trees, rough sets theory, industrial processes.

Downloads: 1633
7843 Response Surface Modeling of Lactic Acid Extraction by Emulsion Liquid Membrane: Box-Behnken Experimental Design

Authors: A. Thakur, P. S. Panesar, M. S. Saini

Abstract:

Extraction of lactic acid by emulsion liquid membrane (ELM) technology, using n-trioctyl amine (TOA) in n-heptane as the carrier within the organic membrane and sodium carbonate as the acceptor phase, was optimized using response surface methodology (RSM). A three-level Box-Behnken design was employed for experimental design and analysis of the results, and to depict the combined effect of five independent variables, viz. lactic acid concentration in the aqueous phase (cl), sodium carbonate concentration in the stripping phase (cs), carrier concentration in the membrane phase (ψ), treat ratio, and batch extraction time (τ), with equal volumes of organic and external aqueous phase, on lactic acid extraction efficiency. The maximum lactic acid extraction efficiency (ηext) of 98.21% from the aqueous phase in a batch reactor using ELM was found at the optimized values of the test variables cl, cs, ψ, treat ratio, and τ of 0.06 [M], 0.18 [M], 4.72 (%, v/v), 1.98 (v/v), and 13.36 min, respectively.

Keywords: Emulsion liquid membrane, extraction, lactic acid, n-trioctylamine, response surface methodology.

Downloads: 2323
7842 PIELG: A Protein Interaction Extraction System using a Link Grammar Parser from Biomedical Abstracts

Authors: Rania A. Abul Seoud, Nahed H. Solouma, Abou-Baker M. Youssef, Yasser M. Kadah

Abstract:

Due to the ever-growing number of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses the linkage given by the Link Grammar Parser to start a case-based analysis of the contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verb patterns to overcome problems with preposition combinations. The recall and precision are 74.4% and 62.65%, respectively. Experimental comparisons with two other state-of-the-art extraction systems indicate that the PIELG system achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The result shows that the performance is remarkably promising.
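
Recall and precision are often summarized as a single F1 score, their harmonic mean. The abstract does not report an F1 value, so the Python sketch below simply derives one from the two quoted figures using the standard formula F1 = 2PR / (P + R).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract, expressed as fractions.
f1 = f1_score(precision=0.6265, recall=0.744)
print(round(f1, 4))
```

Because the harmonic mean is pulled toward the smaller value, the resulting score of about 0.68 sits nearer to the precision than to the recall.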

Keywords: Link Grammar Parser, Interaction extraction, protein-protein interaction, Natural language processing.

Downloads: 2254
7841 A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

An extensive amount of work has been done on data clustering, an unsupervised learning technique in data mining, during the past two decades. Several approaches and methods have emerged that focus on clustering diverse data types, features of cluster models, and similarity rates of clusters. However, no single clustering algorithm performs best in extracting efficient clusters. To address this issue, a new and challenging technique called the Cluster Ensemble method has emerged. This approach serves as an alternative method for the cluster analysis problem. The main objective of the Cluster Ensemble is to aggregate diverse clustering solutions so as to attain accuracy and to improve on the quality of the individual clustering algorithms. Given the massive and rapid development of new methods in the world of data mining, it is essential to critically analyze existing techniques and future directions. This paper presents a comparative analysis of different cluster ensemble methods along with their methodologies and salient features. This analysis will be useful for the community of clustering experts and will help in deciding on the most appropriate method for the problem at hand.

Keywords: Clustering, Cluster Ensemble Methods, Co-association matrix, Consensus Function, Median Partition.
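
The co-association matrix listed among the keywords is one common way to aggregate several clusterings: entry (i, j) is the fraction of base clusterings that place points i and j in the same cluster. The Python sketch below builds it from hypothetical label vectors; it illustrates the general technique and is not tied to any specific method surveyed in the paper.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(partitions)
    return [[count / total for count in row] for row in counts]


# Hypothetical base clusterings of four points (one label list per clustering).
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first, with label names swapped
    [0, 0, 0, 1],
]
matrix = co_association(partitions)
for row in matrix:
    print([round(value, 2) for value in row])
```

Because only co-membership is counted, the matrix is unaffected by how each base clustering happens to name its clusters; a consensus function can then cluster the matrix itself.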

Downloads: 2105
7840 Anomaly Based On Frequent-Outlier for Outbreak Detection in Public Health Surveillance

Authors: Zalizah Awang Long, Abdul Razak Hamdan, Azuraliza Abu Bakar

Abstract:

Public health surveillance systems focus on outbreak detection and the data sources used. Variation or aberration in the frequency distribution of health data, compared to historical data, is often used to detect outbreaks. It is important that new techniques be developed to improve the detection rate, thereby reducing wastage of resources in public health. Thus, the objective is to develop a technique that applies frequent mining and outlier mining techniques to outbreak detection. Fourteen datasets from the UCI were tested with the proposed technique. The effectiveness of each technique was measured by t-test. The overall performance shows that DTK can be used to detect outliers within a frequent dataset. In conclusion, the outbreak detection technique based on frequent-outlier anomalies can be used to identify outliers within a frequent dataset.

Keywords: Outlier detection, frequent-outlier, outbreak, anomaly, surveillance, public health
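
The general idea of comparing current counts against historical data can be sketched as follows. This is a simple mean-plus-k-standard-deviations rule in Python, shown only to make the notion of an aberration concrete. It is not the frequent-outlier technique of the paper, and the function name, the sample counts, and the default k = 2 are assumptions.

```python
from statistics import mean, stdev


def flag_aberrations(history, current, k=2.0):
    """Return indices of current counts that exceed the historical
    mean by more than k sample standard deviations."""
    baseline = mean(history)
    spread = stdev(history)
    threshold = baseline + k * spread
    return [i for i, count in enumerate(current) if count > threshold]


# Hypothetical weekly case counts: a historical baseline and a new period.
history = [10, 12, 11, 9, 13, 10, 11, 12]
current = [11, 13, 25, 12]
flagged = flag_aberrations(history, current)
```

In this sketch only the third week of the new period stands out from the baseline. A real surveillance system would also need to handle seasonality and reporting delays.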

Downloads 2275
7839 Automatic Extraction of Roads from High Resolution Aerial and Satellite Images with Heavy Noise

Authors: Yan Li, Ronald Briggs

Abstract:

Aerial and satellite images are information rich. They are also complex to analyze. For GIS systems, many features require fast and reliable extraction of roads and intersections. In this paper, we study efficient and reliable automatic extraction algorithms to address some difficult issues that are commonly seen in high resolution aerial and satellite images but are not well addressed in existing solutions, such as blurring, broken or missing road boundaries, lack of road profiles, heavy shadows, and interfering surrounding objects. The new scheme is based on a new method, namely the reference circle, to properly identify the pixels that belong to the same road and to use this information to recover the whole road network. This feature is invariant to the shape and direction of roads and tolerates heavy noise and disturbances. Road extraction based on reference circles is much more noise tolerant and flexible than previous edge-detection based algorithms. The scheme is able to extract roads reliably from images with complex contents and heavy obstructions, such as the high resolution aerial/satellite images available from Google Maps.

Keywords: Automatic road extraction, Image processing, Feature extraction, GIS update, Remote sensing, Geo-referencing

Downloads 1701
7838 Exploring the Correlation between Population Distribution and Urban Heat Island under Urban Data: Taking Shenzhen Urban Heat Island as an Example

Authors: Wang Yang

Abstract:

Shenzhen is a modern city shaped by China's reform and opening-up policy, and the development of its urban morphology has been established under the administration of the Chinese government. The city's planning paradigm is primarily affected by spatial structure and human behavior. The subjective urban agglomeration center is divided into several groups and centers. Compared with this effect, the city development law is better neglected. With the continuous development of the internet, extensive data technology has been introduced in China. Data mining and data analysis have become important tools in municipal research. Data mining has been used to improve data cleaning, for example when receiving business, traffic and population data. Prior to data mining, government data were collected by traditional means and then analyzed using city-relationship research, delaying the timeliness of urban development, especially for the contemporary city. Data are now updated very quickly and are based on the Internet. The city's points of interest (POI), obtained through mining, serve as a data source affecting city design, while satellite remote sensing is used as a reference object. City analysis is conducted in both directions, the administrative paradigm of government is broken, and urban research is restored. Therefore, the use of data mining in urban analysis is very important. The satellite remote sensing data of Shenzhen in July 2018 were measured by the satellite MODIS sensor and can be used to perform land surface temperature inversion and to analyze the distribution of the city's heat island. This article acquired and classified data from Shenzhen using data crawler technology. Data on the Shenzhen heat island and points of interest were simulated and analyzed on the GIS platform to discover the main features of functional equivalent distribution influence. Shenzhen extends in the east-west direction. The city's main streets are also determined according to the direction of city development. Therefore, the functional areas of the city are also distributed in the east-west direction. The urban heat island can be expressed as a heat map according to the functional urban areas, and regional POIs show a corresponding pattern. The research results clearly show that the distribution of the urban heat island and the distribution of urban POIs are in one-to-one correspondence. The urban heat island is primarily influenced by the properties of the underlying surface, avoiding the impact of urban climate. Using urban POIs as the object of analysis, the distribution of municipal POIs and population aggregation are closely connected, so the distribution of the population corresponds with the distribution of the urban heat island.

Keywords: POI, satellite remote sensing, the population distribution, urban heat island thermal map.

Downloads 929
7837 A Novel Approach to Optimal Cutting Tool Replacement

Authors: Cem Karacal, Sohyung Cho, William Yu

Abstract:

In metal cutting industries, mathematical/statistical models are typically used to predict tool replacement time. These off-line methods usually result in a less than optimal replacement time, thereby either wasting resources or causing quality problems. The few online real-time methods proposed use indirect measurement techniques and are prone to similar errors. Our idea is based on identifying the optimal replacement time by using an electronic nose to detect the airborne compounds released when tool wear reaches a chemical substrate doped into the tool material during fabrication. The study investigates the feasibility of the idea, possible doping materials and methods, and data stream mining techniques for detecting and monitoring different phases of tool wear.

Keywords: Tool condition monitoring, cutting tool replacement, data stream mining, e-Nose.

Downloads 1882
7836 Improving University Operations with Data Mining: Predicting Student Performance

Authors: Mladen Dragičević, Mirjana Pejić Bach, Vanja Šimičević

Abstract:

The purpose of this paper is to develop models that would enable predicting student success. These models could improve the allocation of students among colleges and optimize the newly introduced model of government subsidies for higher education. To collect data, an anonymous survey was carried out among final-year undergraduate students using random sampling. Several decision trees were created, and the two that were most successful in predicting student success were chosen based on two criteria: Grade Point Average (GPA) and the time that a student needs to finish the undergraduate program (time-to-degree). Decision trees proved to be a good method for classifying student success, and they could be improved further by increasing the survey sample and by developing specialized decision trees for each type of college. These types of methods have great potential for use in decision support systems.

Keywords: Data mining, knowledge discovery in databases, prediction models, student success.

Downloads 2540
7835 A Simplified and Effective Algorithm Used to Mine Similar Processes: An Illustrated Example

Authors: Min-Hsun Kuo, Yun-Shiow Chen

Abstract:

The running logs of a process hold valuable information about its executed activity behavior and generated activity logic structure. These informative logs can be extracted, analyzed and utilized to improve the efficiency of the process's execution and conduct. One technique used to accomplish such process improvement is called process mining, and mining similar processes is one such improvement task. Rather than directly mining similar processes using a single comparison coefficient or a complicated fitness function, this paper presents a simplified heuristic process mining algorithm with two similarity comparisons that are able to relatively conform the activity logic sequences (traces) of the mined processes with those of a normalized (regularized) one. The relative process conformance finds which of the mined processes match the required activity sequences and relationships, for subsequent necessary and sufficient application of the mined processes to process improvement. One similarity is defined by the relationships in terms of the number of similar activity sequences existing in different processes; the other expresses the degree of the similar (identical) activity sequences among the conforming processes. Since these two similarities are defined with respect to certain typical behavior (activity sequences) occurring in an entire process, common problems that often appear in other process conformance techniques, such as the inappropriateness of an absolute comparison and the inability to elicit intrinsic information, can be solved by the relative process comparison presented in this paper. To demonstrate the potential of the proposed algorithm, a numerical example is illustrated.

Keywords: process mining, process similarity, artificial intelligence, process conformance.
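
To make the idea of comparing mined processes against a normalized reference more concrete, here is a rough Python sketch. It is not the algorithm from the paper, whose exact similarity definitions are not given in the abstract. It assumes each process is a collection of traces, each trace a tuple of activity names. The two functions and the sample activity names are hypothetical stand-ins for a count-based and a degree-based similarity.

```python
def shared_trace_count(process_traces, reference_traces):
    """Number of distinct activity sequences that the process shares with the reference."""
    return len(set(process_traces) & set(reference_traces))


def trace_overlap_ratio(process_traces, reference_traces):
    """Share of the reference's distinct activity sequences that also occur in the process."""
    reference = set(reference_traces)
    if not reference:
        return 0.0
    return len(set(process_traces) & reference) / len(reference)


# Hypothetical reference (normalized) process and two mined processes.
reference = [("receive", "check", "approve"), ("receive", "check", "reject")]
process_a = [("receive", "check", "approve"), ("receive", "approve")]
process_b = [("receive", "check", "approve"), ("receive", "check", "reject")]

count_a = shared_trace_count(process_a, reference)
ratio_a = trace_overlap_ratio(process_a, reference)
ratio_b = trace_overlap_ratio(process_b, reference)
```

Both measures are relative to the reference process rather than absolute, which follows the spirit of the relative comparison described in the abstract.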

Downloads 1443
7834 Mine Production Index (MPI): New Method to Evaluate Effectiveness of Mining Machinery

Authors: Amol Lanke, Hadi Hoseinie, Behzad Ghodrati

Abstract:

OEE has been used in many industries as a measure of performance. However, due to the limitations of the original OEE, it has been modified by various researchers. OEE for mining applications is a special version of the classic equation and carries these limitations over. This paper modifies the OEE for mining applications by introducing weights for its elements; the result is termed the Mine Production index (MPi). As a special application of the new index, MPishovel has been developed by the authors for evaluating shovel effectiveness. Based on the analysis, utilization ranked first, followed by performance and availability. To check the applicability of this index, a case study was done on four electrical shovels and one hydraulic shovel in a Swedish mine. The results show that MPishovel can evaluate the production effectiveness of shovels and gives more optimistic effectiveness values than OEE. The MPi calculation not only gives the effectiveness but can also indicate which elements should be the focus for improving productivity.

Keywords: Mining, Overall equipment efficiency (OEE), Mine Production index, Shovels.
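
For orientation, the classic OEE is commonly computed as the product of availability, performance and quality. The Python sketch below shows that product together with one possible way of weighting the elements. The weighted form, in which each element is raised to a weight and the weights sum to 1, is an assumption made for illustration and is not taken from the abstract. The function names, factor values and weights are likewise hypothetical.

```python
def classic_oee(availability, performance, quality):
    """Classic OEE: plain product of three ratios, each between 0 and 1."""
    return availability * performance * quality


def weighted_index(factors, weights):
    """Hypothetical weighted index: product of factor ** weight.
    Assumes the weights are non-negative and sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    result = 1.0
    for name, value in factors.items():
        result *= value ** weights[name]
    return result


# Hypothetical shovel figures. The weights follow the ranking in the abstract
# (utilization first, then performance, then availability), but their
# numeric values are invented for this example.
factors = {"availability": 0.90, "performance": 0.80, "utilization": 0.70}
weights = {"availability": 0.2, "performance": 0.3, "utilization": 0.5}

plain_product = factors["availability"] * factors["performance"] * factors["utilization"]
index = weighted_index(factors, weights)
```

With factors between 0 and 1 and weights no larger than 1, the weighted value is never lower than the plain product. This is consistent with the abstract's remark that MPi gives a more optimistic view than OEE.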

Downloads 4744