Search results for: Data Mining
7401 A General Framework for Knowledge Discovery Using High Performance Machine Learning Algorithms
Authors: S. Nandagopalan, N. Pradeep
Abstract:
The aim of this paper is to propose a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers namely physical storage, object identification, knowledge discovery, user level. Techniques such as active contour model to identify the cardiac chambers, pixel classification to segment the color Doppler echo image, universal model for image retrieval, Bayesian method for classification, parallel algorithms for image segmentation, etc., were employed. Using the feature vector database that have been efficiently constructed, one can perform various data mining tasks like clustering, classification, etc. with efficient algorithms along with image mining given a query image. All these facilities are included in the framework that is supported by state-of-the-art user interface (UI). The algorithms were tested with actual patient data and Coral image database and the results show that their performance is better than the results reported already.Keywords: Active Contour, Bayesian, Echocardiographic image, Feature vector.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17137400 Linguistic Summarization of Structured Patent Data
Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay
Abstract:
Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.Keywords: Data mining, fuzzy sets, linguistic summarization, patent data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12177399 Performance Comparison of Particle Swarm Optimization with Traditional Clustering Algorithms used in Self-Organizing Map
Authors: Anurag Sharma, Christian W. Omlin
Abstract:
Self-organizing map (SOM) is a well known data reduction technique used in data mining. It can reveal structure in data sets through data visualization that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOM, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of an adaptive heuristic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. The application of our method to several standard data sets demonstrates its feasibility. PSO algorithm utilizes a so-called U-matrix of SOM to determine cluster boundaries; the results of this novel automatic method compare very favorably to boundary detection through traditional algorithms namely k-means and hierarchical based approach which are normally used to interpret the output of SOM.Keywords: cluster boundaries, clustering, code vectors, data mining, particle swarm optimization, self-organizing maps, U-matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19097398 Discovery of Time Series Event Patterns based on Time Constraints from Textual Data
Authors: Shigeaki Sakurai, Ken Ueno, Ryohei Orihara
Abstract:
This paper proposes a method that discovers time series event patterns from textual data with time information. The patterns are composed of sequences of events and each event is extracted from the textual data, where an event is characteristic content included in the textual data such as a company name, an action, and an impression of a customer. The method introduces 7 types of time constraints based on the analysis of the textual data. The method also evaluates these constraints when the frequency of a time series event pattern is calculated. We can flexibly define the time constraints for interesting combinations of events and can discover valid time series event patterns which satisfy these conditions. The paper applies the method to daily business reports collected by a sales force automation system and verifies its effectiveness through numerical experiments.
Keywords: Text mining, sequential mining, time constraints, daily business reports.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14887397 Mining Image Features in an Automatic Two-Dimensional Shape Recognition System
Authors: R. A. Salam, M.A. Rodrigues
Abstract:
The number of features required to represent an image can be very huge. Using all available features to recognize objects can suffer from curse dimensionality. Feature selection and extraction is the pre-processing step of image mining. Main issues in analyzing images is the effective identification of features and another one is extracting them. The mining problem that has been focused is the grouping of features for different shapes. Experiments have been conducted by using shape outline as the features. Shape outline readings are put through normalization and dimensionality reduction process using an eigenvector based method to produce a new set of readings. After this pre-processing step data will be grouped through their shapes. Through statistical analysis, these readings together with peak measures a robust classification and recognition process is achieved. Tests showed that the suggested methods are able to automatically recognize objects through their shapes. Finally, experiments also demonstrate the system invariance to rotation, translation, scale, reflection and to a small degree of distortion.Keywords: Image mining, feature selection, shape recognition, peak measures.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14587396 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad
Abstract:
Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.Keywords: Remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20537395 A Study of Soil Heavy Metal Pollution in the Manganese Mining in Drama, Greece
Authors: A. Argiri, A. Molla, Tzouvalekas, E. Skoufogianni, N. Danalatos
Abstract:
The release of heavy metals into the environment has increased over the last years. In this study, 25 soil samples (0-15 cm) from the fields near the mining area in Drama region were selected. The samples were analyzed in the laboratory for their physicochemical properties and for seven “pseudo-total’’ heavy metals content, namely Pb, Zn, Cd, Cr, Cu, Ni, and Mn. The total metal concentrations (Pb, Zn, Cd, Cr, Cu, Ni and Mn) in digests were determined by using the atomic absorption spectrophotometer. According to the results, the mean concentration of the listed heavy metals in 25 soil samples are Cd 1.1 mg/kg, Cr 15 mg/kg, Cu 21.7 mg/kg, Ni 30.1 mg/kg, Pd 50.8 mg/kg, Zn 99.5 mg/kg and Mn 815.3 mg/kg. The results show that the heavy metals remain in the soil even if the mining closed many years ago.
Keywords: Greece, heavy metals, mining, pollution
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5827394 Association of Smoking with Chest Radiographic and Lung Function Findings in Retired Bauxite Mining Workers
Authors: L. R. Ferreira, R. C. G. Bianchi, L. C.R. Ferreira, C. M. Galhardi, E. P. Baciuk, L. H. Oliveira
Abstract:
Inhalation hazards are associated with potentially injurious exposure and increased risk for lung diseases, within the bauxite mining industry, especially for the smelter workers. Smoking is related to decreased lung function and leads to chronic lung diseases. This study had the objective to evaluate whether smoking is related to functional and radiographic respiratory changes in retired bauxite mining workers. Methods: This was a retrospective and cross-sectional study involving the analysis of database information of 140 retired bauxite mining workers from Poços de Caldas-MG evaluated at Worker’s Health Reference Center and at the Social Security Brazilian National Institute, from July 1st, 2015 until June 30th, 2016. The workers were divided into three groups: non-smokers (n = 47), ex-smokers (n = 46), and smokers (n = 47). The data included: age, gender, spirometry results, and the presence or not of pulmonary pleural and/or parenchymal changes in chest radiographs. Chi-Squared test was used (p < 0,05). Results: In the smokers’ group, 83% of spirometry tests and 64% of chest x-rays were altered. In the non-smokers’ group, 19% of spirometry tests and 13% of chest x-rays were altered. In the ex-smokers’ group, 35% of spirometry tests and 30% of chest x-rays were altered. Most of the results were statistically significant. Results demonstrated a significant difference between smokers’ and non-smokers’ groups in regard to spirometric and radiographic pulmonary alterations. Ex-smokers’ and non-smokers’ group demonstrated better results when compared to the smokers’ group in relation to altered spirometry and radiograph findings. These data may contribute to planning strategies to enhance smoking cessation programs within the bauxite mining industry.
Keywords: Bauxite mining, spirometry, chest radiography, smoking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7017393 Towards Clustering of Web-based Document Structures
Authors: Matthias Dehmer, Frank Emmert Streib, Jürgen Kilian, Andreas Zulauf
Abstract:
Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important in terms of the number of documents available online. Thereby, the task of clustering web-based document structures has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving web users requests servicing, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts will be represented as so called generalized trees which are more general than usual directed rooted trees, e.g., DOM-Trees. As a important preprocessing step we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we will run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore we outline the application of our approach in Web Usage Mining as future work.Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15067392 Improved FP-growth Algorithm with Multiple Minimum Supports Using Maximum Constraints
Authors: Elsayeda M. Elgaml, Dina M. Ibrahim, Elsayed A. Sallam
Abstract:
Association rule mining is one of the most important fields of data mining and knowledge discovery. In this paper, we propose an efficient multiple support frequent pattern growth algorithm which we called “MSFP-growth” that enhancing the FPgrowth algorithm by making infrequent child node pruning step with multiple minimum support using maximum constrains. The algorithm is implemented, and it is compared with other common algorithms: Apriori-multiple minimum supports using maximum constraints and FP-growth. The experimental results show that the rule mining from the proposed algorithm are interesting and our algorithm achieved better performance than other algorithms without scarifying the accuracy.
Keywords: Association Rules, FP-growth, Multiple minimum supports, Weka Tool
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 33187391 Mining Association Rules from Unstructured Documents
Authors: Hany Mahgoub
Abstract:
This paper presents a system for discovering association rules from collections of unstructured documents called EART (Extract Association Rules from Text). The EART system treats texts only not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transform unstructured documents into structured documents) with Information Retrieval scheme (TF-IDF) and Data Mining technique for association rules extraction. EART depends on word feature to extract association rules. It consists of four phases: structure phase, index phase, text mining phase and visualization phase. Our work depends on the analysis of the keywords in the extracted association rules through the co-occurrence of the keywords in one sentence in the original text and the existing of the keywords in one sentence without co-occurrence. Experiments applied on a collection of scientific documents selected from MEDLINE that are related to the outbreak of H5N1 avian influenza virus.Keywords: Association rules, information retrieval, knowledgediscovery in text, text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24427390 Quantification of GHGs Emissions from Electricity and Diesel Fuel Consumption in Basalt Mining Industry in Thailand
Authors: S. Kittipongvises, A. Dubsok
Abstract:
The mineral and mining industry is necessary for countries to have an adequate and reliable supply of materials to meet their socio-economic development. Despite its importance, the environmental impacts from mineral exploration are hugely significant. This study aimed to investigate and quantify the amount of GHGs emissions emitted from both electricity and diesel vehicle fuel consumption in basalt mining in Thailand. Plant A, located in the northeastern region of Thailand, was selected as a case study. Results indicated that total GHGs emissions from basalt mining and operation (Plant A) were approximately 2,501,086 kgCO2e and 1,997,412 kgCO2e in 2014 and 2015, respectively. The estimated carbon intensity ranged between 1.824 kgCO2e to 2.284 kgCO2e per ton of rock product. Scope 1 (direct emissions) was the dominant driver of its total GHGs compared to scope 2 (indirect emissions). As such, transport related combustion of diesel fuels generated the highest GHGs emission (65%) compared to emissions from purchased electricity (35%). Some of the potential implications for mining entities were also presented.
Keywords: Basalt mining, diesel fuel, electricity, GHGs emissions, Thailand.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10557389 Improving Classification Accuracy with Discretization on Datasets Including Continuous Valued Features
Authors: Mehmet Hacibeyoglu, Ahmet Arslan, Sirzat Kahramanli
Abstract:
This study analyzes the effect of discretization on classification of datasets including continuous valued features. Six datasets from UCI which containing continuous valued features are discretized with entropy-based discretization method. The performance improvement between the dataset with original features and the dataset with discretized features is compared with k-nearest neighbors, Naive Bayes, C4.5 and CN2 data mining classification algorithms. As the result the classification accuracies of the six datasets are improved averagely by 1.71% to 12.31%.Keywords: Data mining classification algorithms, entropy-baseddiscretization method
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24617388 Cirrhosis Mortality Prediction as Classification Using Frequent Subgraph Mining
Authors: Abdolghani Ebrahimi, Diego Klabjan, Chenxi Ge, Daniela Ladner, Parker Stride
Abstract:
In this work, we use machine learning and data analysis techniques to predict the one-year mortality of cirrhotic patients. Data from 2,322 patients with liver cirrhosis are collected at a single medical center. Different machine learning models are applied to predict one-year mortality. A comprehensive feature space including demographic information, comorbidity, clinical procedure and laboratory tests is being analyzed. A temporal pattern mining technic called Frequent Subgraph Mining (FSM) is being used. Model for End-stage liver disease (MELD) prediction of mortality is used as a comparator. All of our models statistically significantly outperform the MELD-score model and show an average 10% improvement of the area under the curve (AUC). The FSM technic itself does not improve the model significantly, but FSM, together with a machine learning technique called an ensemble, further improves the model performance. With the abundance of data available in healthcare through electronic health records (EHR), existing predictive models can be refined to identify and treat patients at risk for higher mortality. However, due to the sparsity of the temporal information needed by FSM, the FSM model does not yield significant improvements. Our work applies modern machine learning algorithms and data analysis methods on predicting one-year mortality of cirrhotic patients and builds a model that predicts one-year mortality significantly more accurate than the MELD score. We have also tested the potential of FSM and provided a new perspective of the importance of clinical features.
Keywords: machine learning, liver cirrhosis, subgraph mining, supervised learning
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4497387 Using Data Mining Techniques for Finding Cardiac Outlier Patients
Authors: Farhan Ismaeel Dakheel, Raoof Smko, K. Negrat, Abdelsalam Almarimi
Abstract:
In this paper we used data mining techniques to identify outlier patients who are using large amount of drugs over a long period of time. Any healthcare or health insurance system should deal with the quantities of drugs utilized by chronic diseases patients. In Kingdom of Bahrain, about 20% of health budget is spent on medications. For the managers of healthcare systems, there is no enough information about the ways of drug utilization by chronic diseases patients, is there any misuse or is there outliers patients. In this work, which has been done in cooperation with information department in the Bahrain Defence Force hospital; we select the data for Cardiac patients in the period starting from 1/1/2008 to December 31/12/2008 to be the data for the model in this paper. We used three techniques for finding the drug utilization for cardiac patients. First we applied a clustering technique, followed by measuring of clustering validity, and finally we applied a decision tree as classification algorithm. The clustering results is divided into three clusters according to the drug utilization, for 1603 patients, who received 15,806 prescriptions during this period can be partitioned into three groups, where 23 patients (2.59%) who received 1316 prescriptions (8.32%) are classified to be outliers. The classification algorithm shows that the use of average drug utilization and the age, and the gender of the patient can be considered to be the main predictive factors in the induced model.Keywords: Data Mining, Clustering, Classification, Drug Utilization..
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18987386 Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques
Authors: Gabriela V. Angeles Perez, Jose Castillejos Lopez, Araceli L. Reyes Cabello, Emilio Bravo Grajales, Adriana Perez Espinosa, Jose L. Quiroz Fabian
Abstract:
Road traffic accidents are among the principal causes of traffic congestion, causing human losses, damages to health and the environment, economic losses and material damages. Studies about traditional road traffic accidents in urban zones represents very high inversion of time and money, additionally, the result are not current. However, nowadays in many countries, the crowdsourced GPS based traffic and navigation apps have emerged as an important source of information to low cost to studies of road traffic accidents and urban congestion caused by them. In this article we identified the zones, roads and specific time in the CDMX in which the largest number of road traffic accidents are concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Discovery of knowledge in the database (KDD) for the discovery of patterns in the accidents reports. Furthermore, using data mining techniques with the help of Weka. The selected algorithms was the Maximization of Expectations (EM) to obtain the number ideal of clusters for the data and k-means as a grouping method. Finally, the results were visualized with the Geographic Information System QGIS.Keywords: Data mining, K-means, road traffic accidents, Waze, Weka.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12157385 Discovering Complex Regularities: from Tree to Semi-Lattice Classifications
Authors: A. Faro, D. Giordano, F. Maiorana
Abstract:
Data mining uses a variety of techniques each of which is useful for some particular task. It is important to have a deep understanding of each technique and be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process up to results visualization. A graphical representation helps the user to find out a strategy to optimize classification by adding, moving or delete a neuron in order to change the number of classes. The tool is able to automatically suggest a strategy to optimize the number of classes optimization, but also support both tree classifications and semi-lattice organizations of the classes to give to the users the possibility of passing from one class to the ones with which it has some aspects in common. Examples of using tree and semi-lattice classifications are given to illustrate advantages and problems. The tool is applied to classify macroeconomic data that report the most developed countries- import and export. It is possible to classify the countries based on their economic behaviour and use the tool to characterize the commercial behaviour of a country in a selected class from the analysis of positive and negative features that contribute to classes formation. Possible interrelationships between the classes and their meaning are also discussed.Keywords: Unsupervised classification, Kohonen networks, macroeconomics, Visual data mining, Cluster interpretation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15427384 Application of Granular Computing Paradigm in Knowledge Induction
Authors: Iftikhar U. Sikder
Abstract:
This paper illustrates an application of granular computing approach, namely rough set theory in data mining. The paper outlines the formalism of granular computing and elucidates the mathematical underpinning of rough set theory, which has been widely used by the data mining and the machine learning community. A real-world application is illustrated, and the classification performance is compared with other contending machine learning algorithms. The predictive performance of the rough set rule induction model shows comparative success with respect to other contending algorithms.
Keywords: Concept approximation, granular computing, reducts, rough set theory, rule induction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8347383 Benefits and Issues of Open-Cut Coal Mining on the Socio-Economic Environment - The Iban Community in Mukah, Sarawak, Malaysia
Authors: Edward Lim
Abstract:
This paper deals principally with the socio-economic impact on the local Iban community in Mukah Division, Sarawak; with the commencement of the open-cut coal mining industry since 2003. To-date there are no actual studies being carried out by either the public or private sector to truly analyze how the Iban community is coping with the advent of a large influx of cash into their society. The Iban community has traditionally been practicing shifting cultivation and farming of domesticated animals; with a portion of the younger generation working as laborers and professional. This paper represents the views and observations of the author supported by some statistical facts extracted from published articles and non-published reports. The paper deals primarily in the following areas: • Background of the coal mining industry in Mukah Division, Sarawak; • Benefits of the coal mining industry towards the Iban community; • Issues / Problems arise in the Iban community because of the presence of the coal mining industry; and • Possible actions that need to be taken to overcome these issues/ problems.
Keywords: Coal Mining, Iban Community, Malaysia, Sub-Bituminous Coal.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24427382 AniMoveMineR: Animal Behavior Exploratory Analysis Using Association Rules Mining
Authors: Suelane Garcia Fontes, Silvio Luiz Stanzani, Pedro L. Pizzigatti Corrła Ronaldo G. Morato
Abstract:
Environmental changes and major natural disasters are most prevalent in the world due to the damage that humanity has caused to nature and these damages directly affect the lives of animals. Thus, the study of animal behavior and their interactions with the environment can provide knowledge that guides researchers and public agencies in preservation and conservation actions. Exploratory analysis of animal movement can determine the patterns of animal behavior and with technological advances the ability of animals to be tracked and, consequently, behavioral studies have been expanded. There is a lot of research on animal movement and behavior, but we note that a proposal that combines resources and allows for exploratory analysis of animal movement and provide statistical measures on individual animal behavior and its interaction with the environment is missing. The contribution of this paper is to present the framework AniMoveMineR, a unified solution that aggregates trajectory analysis and data mining techniques to explore animal movement data and provide a first step in responding questions about the animal individual behavior and their interactions with other animals over time and space. We evaluated the framework through the use of monitored jaguar data in the city of Miranda Pantanal, Brazil, in order to verify if the use of AniMoveMineR allows to identify the interaction level between these jaguars. The results were positive and provided indications about the individual behavior of jaguars and about which jaguars have the highest or lowest correlation.Keywords: Data mining, data science, trajectory, animal behavior.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9187381 Finding Fuzzy Association Rules Using FWFP-Growth with Linguistic Supports and Confidences
Authors: Chien-Hua Wang, Chin-Tzong Pang
Abstract:
In data mining, the association rules are used to search for the relations of items of the transactions database. Following the data is collected and stored, it can find rules of value through association rules, and assist manager to proceed marketing strategy and plan market framework. In this paper, we attempt fuzzy partition methods and decide membership function of quantitative values of each transaction item. Also, by managers we can reflect the importance of items as linguistic terms, which are transformed as fuzzy sets of weights. Next, fuzzy weighted frequent pattern growth (FWFP-Growth) is used to complete the process of data mining. The method above is expected to improve Apriori algorithm for its better efficiency of the whole association rules. An example is given to clearly illustrate the proposed approach.Keywords: Association Rule, Fuzzy Partition Methods, FWFP-Growth, Apiroir algorithm
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16527380 An Application for Web Mining Systems with Services Oriented Architecture
Authors: Thiago M. R. Dias, Gray F. Moita, Paulo E. M. Almeida
Abstract:
Although the World Wide Web is considered the largest source of information there exists nowadays, due to its inherent dynamic characteristics, the task of finding useful and qualified information can become a very frustrating experience. This study presents a research on the information mining systems in the Web; and proposes an implementation of these systems by means of components that can be built using the technology of Web services. This implies that they can encompass features offered by a services oriented architecture (SOA) and specific components may be used by other tools, independent of platforms or programming languages. Hence, the main objective of this work is to provide an architecture to Web mining systems, divided into stages, where each step is a component that will incorporate the characteristics of SOA. The separation of these steps was designed based upon the existing literature. Interesting results were obtained and are shown here.Keywords: Web Mining, Service Oriented Architecture, WebServices.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14727379 An Innovation of Travel Information Gathering Framework
Authors: Pairaya J., Buddhagarn R., Sukree S., Punthumadee K.
Abstract:
Application of Information Technology (IT) has revolutionized the functioning of business all over the world. Its impact has been felt mostly among the information of dependent industries. Tourism is one of such industry. The conceptual framework in this study represents an innovation of travel information searching system on mobile devices which is used as tools to deliver travel information (such as hotels, restaurants, tourist attractions and souvenir shops) for each user by travelers segmentation based on data mining technique to segment the tourists- behavior patterns then match them with tourism products and services. This system innovation is designed to be a knowledge incremental learning. It is a marketing strategy to support business to respond traveler-s demand effectively.Keywords: Tourism, Innovation, Information Searching, Data Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18697378 Online Forums Hotspot Detection and Analysis Using Aging Theory
Authors: K. Nirmala Devi, V. Murali Bhaskaran
Abstract:
The exponential growth of social media arouses much attention on public opinion information. The online forums, blogs, micro blogs are proving to be extremely valuable resources and are having bulk volume of information. However, most of the social media data is unstructured and semi structured form. So that it is more difficult to decipher automatically. Therefore, it is very much essential to understand and analyze those data for making a right decision. The online forums hotspot detection is a promising research field in the web mining and it guides to motivate the user to take right decision in right time. The proposed system consist of a novel approach to detect a hotspot forum for any given time period. It uses aging theory to find the hot terms and E-K-means for detecting the hotspot forum. Experimental results demonstrate that the proposed approach outperforms k-means for detecting the hotspot forums with the improved accuracy.
Keywords: Hotspot forums, Micro blog, Blog, Sentiment Analysis, Opinion Mining, Social media, Twitter, Web mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21837377 Granularity Analysis for Spatio-Temporal Web Sensors
Authors: Shun Hattori
Abstract:
In recent years, many researches to mine the exploding Web world, especially User Generated Content (UGC) such as weblogs, for knowledge about various phenomena and events in the physical world have been done actively, and also Web services with the Web-mined knowledge have begun to be developed for the public. However, there are few detailed investigations on how accurately Web-mined data reflect physical-world data. It must be problematic to idolatrously utilize the Web-mined data in public Web services without ensuring their accuracy sufficiently. Therefore, this paper introduces the simplest Web Sensor and spatiotemporallynormalized Web Sensor to extract spatiotemporal data about a target phenomenon from weblogs searched by keyword(s) representing the target phenomenon, and tries to validate the potential and reliability of the Web-sensed spatiotemporal data by four kinds of granularity analyses of coefficient correlation with temperature, rainfall, snowfall, and earthquake statistics per day by region of Japan Meteorological Agency as physical-world data: spatial granularity (region-s population density), temporal granularity (time period, e.g., per day vs. per week), representation granularity (e.g., “rain" vs. “heavy rain"), and media granularity (weblogs vs. microblogs such as Tweets).Keywords: Granularity analysis, knowledge extraction, spatiotemporal data mining, Web credibility, Web mining, Web sensor.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18827376 A Novel Approach to Improve Users Search Goal in Web Usage Mining
Authors: R. Lokeshkumar, P. Sengottuvelan
Abstract:
Web mining is to discover and extract useful Information. Different users may have different search goals when they search by giving queries and submitting it to a search engine. The inference and analysis of user search goals can be very useful for providing an experience result for a user search query. In this project, we propose a novel approach to infer user search goals by analyzing search web logs. First, we propose a novel approach to infer user search goals by analyzing search engine query logs, the feedback sessions are constructed from user click-through logs and it efficiently reflect the information needed for users. Second we propose a preprocessing technique to clean the unnecessary data’s from web log file (feedback session). Third we propose a technique to generate pseudo-documents to representation of feedback sessions for clustering. Finally we implement k-medoids clustering algorithm to discover different user search goals and to provide a more optimal result for a search query based on feedback sessions for the user.Keywords: Data Preprocessing, Session Identification, Web log mining, Web Personalization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20227375 Acute Coronary Syndrome Prediction Using Data Mining Techniques- An Application
Authors: Tahseen A. Jilani, Huda Yasin, Madiha Yasin, C. Ardil
Abstract:
In this paper we use data mining techniques to investigate factors that contribute significantly to enhancing the risk of acute coronary syndrome. We assume that the dependent variable is diagnosis – with dichotomous values showing presence or absence of disease. We have applied binary regression to the factors affecting the dependent variable. The data set has been taken from two different cardiac hospitals of Karachi, Pakistan. We have total sixteen variables out of which one is assumed dependent and other 15 are independent variables. For better performance of the regression model in predicting acute coronary syndrome, data reduction techniques like principle component analysis is applied. Based on results of data reduction, we have considered only 14 out of sixteen factors.
Keywords: Acute coronary syndrome (ACS), binary logistic regression analyses, myocardial ischemia (MI), principle component analysis, unstable angina (U.A.).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21147374 Data Mining Determination of Sunlight Average Input for Solar Power Plant
Authors: Fl. Loury, P. Sablonière, C. Lamoureux, G. Magnier, Th. Gutierrez
Abstract:
A method is proposed to extract faithful representative patterns from data set of observations when they are suffering from non-negligible fluctuations. Supposing time interval between measurements to be extremely small compared to observation time, it consists in defining first a subset of intermediate time intervals characterizing coherent behavior. Data projection on these intervals gives a set of curves out of which an ideally “perfect” one is constructed by taking the sup limit of them. Then comparison with average real curve in corresponding interval gives an efficiency parameter expressing the degradation consecutive to fluctuation effect. The method is applied to sunlight data collected in a specific place, where ideal sunlight is the one resulting from direct exposure at location latitude over the year, and efficiency is resulting from action of meteorological parameters, mainly cloudiness, at different periods of the year. The extracted information already gives interesting element of decision, before being used for analysis of plant control.
Keywords: Base Input Reconstruction, Data Mining, Efficiency Factor, Information Pattern Operator.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15287373 Using the Combined Model of PROMETHEE and Fuzzy Analytic Network Process for Determining Question Weights in Scientific Exams through Data Mining Approach
Authors: Hassan Haleh, Amin Ghaffari, Parisa Farahpour
Abstract:
Need for an appropriate system of evaluating students- educational developments is a key problem to achieve the predefined educational goals. Intensity of the related papers in the last years; that tries to proof or disproof the necessity and adequacy of the students assessment; is the corroborator of this matter. Some of these studies tried to increase the precision of determining question weights in scientific examinations. But in all of them there has been an attempt to adjust the initial question weights while the accuracy and precision of those initial question weights are still under question. Thus In order to increase the precision of the assessment process of students- educational development, the present study tries to propose a new method for determining the initial question weights by considering the factors of questions like: difficulty, importance and complexity; and implementing a combined method of PROMETHEE and fuzzy analytic network process using a data mining approach to improve the model-s inputs. The result of the implemented case study proves the development of performance and precision of the proposed model.Keywords: Assessing students, Analytic network process, Clustering, Data mining, Fuzzy sets, Multi-criteria decision making, and Preference function.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15817372 BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis
Authors: Mohamed A. Mahfouz, M. A. Ismail
Abstract:
Biclustering is a very useful data mining technique for identifying patterns where different genes are co-related based on a subset of conditions in gene expression analysis. Association rules mining is an efficient approach to achieve biclustering as in BIMODULE algorithm but it is sensitive to the value given to its input parameters and the discretization procedure used in the preprocessing step, also when noise is present, classical association rules miners discover multiple small fragments of the true bicluster, but miss the true bicluster itself. This paper formally presents a generalized noise tolerant bicluster model, termed as μBicluster. An iterative algorithm termed as BIDENS based on the proposed model is introduced that can discover a set of k possibly overlapping biclusters simultaneously. Our model uses a more flexible method to partition the dimensions to preserve meaningful and significant biclusters. The proposed algorithm allows discovering biclusters that hard to be discovered by BIMODULE. Experimental study on yeast, human gene expression data and several artificial datasets shows that our algorithm offers substantial improvements over several previously proposed biclustering algorithms.Keywords: Machine learning, biclustering, bi-dimensional clustering, gene expression analysis, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1963