Search results for: Folksonomy data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7481

Search results for: Folksonomy data mining

7211 Evaluating 8D Reports Using Text-Mining

Authors: Benjamin Kuester, Bjoern Eilert, Malte Stonis, Ludger Overmeyer

Abstract:

Increasing quality requirements make reliable and effective quality management indispensable. This includes the complaint handling in which the 8D method is widely used. The 8D report as a written documentation of the 8D method is one of the key quality documents as it internally secures the quality standards and acts as a communication medium to the customer. In practice, however, the 8D report is mostly faulty and of poor quality. There is no quality control of 8D reports today. This paper describes the use of natural language processing for the automated evaluation of 8D reports. Based on semantic analysis and text-mining algorithms the presented system is able to uncover content and formal quality deficiencies and thus increases the quality of the complaint processing in the long term.

Keywords: 8D report, complaint management, evaluation system, text-mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 974
7210 Analysis of Causality between Defect Causes Using Association Rule Mining

Authors: Sangdeok Lee, Sangwon Han, Changtaek Hyun

Abstract:

Construction defects are major components that result in negative impacts on project performance including schedule delays and cost overruns. Since construction defects generally occur when a few associated causes combine, a thorough understanding of defect causality is required in order to more systematically prevent construction defects. To address this issue, this paper uses association rule mining (ARM) to quantify the causality between defect causes, and social network analysis (SNA) to find indirect causality among them. The suggested approach is validated with 350 defect instances from concrete works in 32 projects in Korea. The results show that the interrelationships revealed by the approach reflect the characteristics of the concrete task and the important causes that should be prevented.

Keywords: Causality, defect causes, social network analysis, association rule mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1267
7209 Dynamical Analysis of Circadian Gene Expression

Authors: Carla Layana Luis Diambra

Abstract:

Microarrays technique allows the simultaneous measurements of the expression levels of thousands of mRNAs. By mining this data one can identify the dynamics of the gene expression time series. By recourse of principal component analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles from Cyanobacterium Synechocystis. We applied PCA to reduce the dimensionality of the data set. Examination of the components also provides insight into the underlying factors measured in the experiments. Our results suggest that all rhythmic content of data can be reduced to three main components.

Keywords: circadian rhythms, clustering, gene expression, PCA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1549
7208 A Modified Fuzzy C-Means Algorithm for Natural Data Exploration

Authors: Binu Thomas, Raju G., Sonam Wangmo

Abstract:

In Data mining, Fuzzy clustering algorithms have demonstrated advantage over crisp clustering algorithms in dealing with the challenges posed by large collections of vague and uncertain natural data. This paper reviews concept of fuzzy logic and fuzzy clustering. The classical fuzzy c-means algorithm is presented and its limitations are highlighted. Based on the study of the fuzzy c-means algorithm and its extensions, we propose a modification to the cmeans algorithm to overcome the limitations of it in calculating the new cluster centers and in finding the membership values with natural data. The efficiency of the new modified method is demonstrated on real data collected for Bhutan-s Gross National Happiness (GNH) program.

Keywords: Adaptive fuzzy clustering, clustering, fuzzy logic, fuzzy clustering, c-means.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1928
7207 Feature-Based Summarizing and Ranking from Customer Reviews

Authors: Dim En Nyaung, Thin Lai Lai Thein

Abstract:

Due to the rapid increase of Internet, web opinion sources dynamically emerge which is useful for both potential customers and product manufacturers for prediction and decision purposes. These are the user generated contents written in natural languages and are unstructured-free-texts scheme. Therefore, opinion mining techniques become popular to automatically process customer reviews for extracting product features and user opinions expressed over them. Since customer reviews may contain both opinionated and factual sentences, a supervised machine learning technique applies for subjectivity classification to improve the mining performance. In this paper, we dedicate our work is the task of opinion summarization. Therefore, product feature and opinion extraction is critical to opinion summarization, because its effectiveness significantly affects the identification of semantic relationships. The polarity and numeric score of all the features are determined by Senti-WordNet Lexicon. The problem of opinion summarization refers how to relate the opinion words with respect to a certain feature. Probabilistic based model of supervised learning will improve the result that is more flexible and effective.

Keywords: Opinion Mining, Opinion Summarization, Sentiment Analysis, Text Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2891
7206 Data Mining to Capture User-Experience: A Case Study in Notebook Product Appearance Design

Authors: Rhoann Kerh, Chen-Fu Chien, Kuo-Yi Lin

Abstract:

In the era of rapidly increasing notebook market, consumer electronics manufacturers are facing a highly dynamic and competitive environment. In particular, the product appearance is the first part for user to distinguish the product from the product of other brands. Notebook product should differ in its appearance to engage users and contribute to the user experience (UX). The UX evaluates various product concepts to find the design for user needs; in addition, help the designer to further understand the product appearance preference of different market segment. However, few studies have been done for exploring the relationship between consumer background and the reaction of product appearance. This study aims to propose a data mining framework to capture the user’s information and the important relation between product appearance factors. The proposed framework consists of problem definition and structuring, data preparation, rules generation, and results evaluation and interpretation. An empirical study has been done in Taiwan that recruited 168 subjects from different background to experience the appearance performance of 11 different portable computers. The results assist the designers to develop product strategies based on the characteristics of consumers and the product concept that related to the UX, which help to launch the products to the right customers and increase the market shares. The results have shown the practical feasibility of the proposed framework.

Keywords: Consumers Decision Making, Product Design, Rough Set Theory, User Experience.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3427
7205 Analysis of Textual Data Based On Multiple 2-Class Classification Models

Authors: Shigeaki Sakurai, Ryohei Orihara

Abstract:

This paper proposes a new method for analyzing textual data. The method deals with items of textual data, where each item is described based on various viewpoints. The method acquires 2- class classification models of the viewpoints by applying an inductive learning method to items with multiple viewpoints. The method infers whether the viewpoints are assigned to the new items or not by using the models. The method extracts expressions from the new items classified into the viewpoints and extracts characteristic expressions corresponding to the viewpoints by comparing the frequency of expressions among the viewpoints. This paper also applies the method to questionnaire data given by guests at a hotel and verifies its effect through numerical experiments.

Keywords: Text mining, Multiple viewpoints, Differential analysis, Questionnaire data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1253
7204 Multidimensional Data Mining by Means of Randomly Travelling Hyper-Ellipsoids

Authors: Pavel Y. Tabakov, Kevin Duffy

Abstract:

The present study presents a new approach to automatic data clustering and classification problems in large and complex databases and, at the same time, derives specific types of explicit rules describing each cluster. The method works well in both sparse and dense multidimensional data spaces. The members of the data space can be of the same nature or represent different classes. A number of N-dimensional ellipsoids are used for enclosing the data clouds. Due to the geometry of an ellipsoid and its free rotation in space the detection of clusters becomes very efficient. The method is based on genetic algorithms that are used for the optimization of location, orientation and geometric characteristics of the hyper-ellipsoids. The proposed approach can serve as a basis for the development of general knowledge systems for discovering hidden knowledge and unexpected patterns and rules in various large databases.

Keywords: Classification, clustering, data minig, genetic algorithms.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1731
7203 Applying Sequential Pattern Mining to Generate Block for Scheduling Problems

Authors: Meng-Hui Chen, Chen-Yu Kao, Chia-Yu Hsu, Pei-Chann Chang

Abstract:

The main idea in this paper is using sequential pattern mining to find the information which is helpful for finding high performance solutions. By combining this information, it is defined as blocks. Using the blocks to generate artificial chromosomes (ACs) could improve the structure of solutions. Estimation of Distribution Algorithms (EDAs) is adapted to solve the combinatorial problems. Nevertheless many of these approaches are advantageous for this application, but only some of them are used to enhance the efficiency of application. Generating ACs uses patterns and EDAs could increase the diversity. According to the experimental result, the algorithm which we proposed has a better performance to solve the permutation flow-shop problems.

Keywords: Combinatorial problems, Sequential Pattern Mining, Estimation of Distribution Algorithms, Artificial Chromosomes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1673
7202 Effects of Heavy Pumping and Artificial Groundwater Recharge Pond on the Aquifer System of Langat Basin, Malaysia

Authors: R. May, K. Jinno, I. Yusoff

Abstract:

The paper aims at evaluating the effects of heavy groundwater withdrawal and artificial groundwater recharge of an ex-mining pond to the aquifer system of the Langat Basin through the three-dimensional (3D) numerical modeling. Many mining sites have been left behind from the massive mining exploitations in Malaysia during the England colonization era and from the last few decades. These sites are able to accommodate more than a million cubic meters of water from precipitation, runoff, groundwater, and river. Most of the time, the mining sites are turned into ponds for recreational activities. In the current study, an artificial groundwater recharge from an ex-mining pond in the Langat Basin was proposed due to its capacity to store >50 million m3 of water. The location of the pond is near the Langat River and opposite a steel company where >4 million gallons of groundwater is withdrawn on a daily basis. The 3D numerical simulation was developed using the Groundwater Modeling System (GMS). The calibrated model (error about 0.7 m) was utilized to simulate two scenarios (1) Case 1: artificial recharge pond with no pumping and (2) Case 2: artificial pond with pumping. The results showed that in Case 1, the pond played a very important role in supplying additional water to the aquifer and river. About 90,916 m3/d of water from the pond, 1,173 m3/d from the Langat River, and 67,424 m3/d from the direct recharge of precipitation infiltrated into the aquifer system. In Case 2, due to the abstraction of groundwater from a company, it caused a steep depression around the wells, river, and pond. The result of the water budget showed an increase rate of inflow in the pond and river with 92,493m3/d and 3,881m3/d respectively. The outcome of the current study provides useful information of the aquifer behavior of the Langat Basin.

Keywords: Groundwater and surface water interaction, groundwater modeling, GMS, artificial recharge pond, ex-mining site.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2604
7201 Heavy Metal Pollution of the Soils around the Mining Area near Shamlugh Town (Armenia) and Related Risks to the Environment

Authors: G. A. Gevorgyan, K. A. Ghazaryan, T. H. Derdzyan

Abstract:

The heavy metal pollution of the soils around the mining area near Shamlugh town and related risks to human health were assessed. The investigations showed that the soils were polluted with heavy metals that can be ranked by anthropogenic pollution degree as follows: Cu>Pb>As>Co>Ni>Zn. The main sources of the anthropogenic metal pollution of the soils were the copper mining area near Shamlugh town, the Chochkan tailings storage facility and the trucks transferring ore from the mining area. Copper pollution degree in some observation sites was unallowable for agricultural production. The total non-carcinogenic chronic hazard index (THI) values in some places, including observation sites in Shamlugh town, were above the safe level (THI<1) for children living in this territory. Although the highest heavy metal enrichment degree in the soils was registered in case of copper, however, the highest health risks to humans especially children were posed by cobalt which is explained by the fact that heavy metals have different toxicity levels and penetration characteristics.

Keywords: Armenia, copper mine, heavy metal pollution of soil, health risks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2332
7200 A Location Routing Model for the Logistic System in the Mining Collection Centers of the Northern Region of Boyacá-Colombia

Authors: Erika Ruíz, Luis Amaya, Diego Carreño

Abstract:

The main objective of this study is to design a mathematical model for the logistics of mining collection centers in the northern region of the department of Boyacá (Colombia), determining the structure that facilitates the flow of products along the supply chain. In order to achieve this, it is necessary to define a suitable design of the distribution network, taking into account the products, customer’s characteristics and the availability of information. Likewise, some other aspects must be defined, such as number and capacity of collection centers to establish, routes that must be taken to deliver products to the customers, among others. This research will use one of the operation research problems, which is used in the design of distribution networks known as Location Routing Problem (LRP).

Keywords: Location routing problem, logistic, mining collection, model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 726
7199 Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Authors: Yogita, Durga Toshniwal

Abstract:

Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving. Irrelevant attributes can be termed as noisy attributes and such attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. This scheme is based on clustering as clustering is an unsupervised data mining task and it does not require labeled data, both density based and partitioning clustering are combined for outlier detection. In this scheme partitioning clustering is also used to assign weights to attributes depending upon their respective relevance and weights are adaptive. Weighted attributes are helpful to reduce or remove the effect of noisy attributes. Keeping in view the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real world data sets show that our proposed approach outperforms other existing approach (CORM) in terms of outlier detection rate, false alarm rate, and increasing percentages of outliers.

Keywords: Concept Evolution, Irrelevant Attributes, Streaming Data, Unsupervised Outlier Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2584
7198 Learning and Evaluating Possibilistic Decision Trees using Information Affinity

Authors: Ilyes Jenhani, Salem Benferhat, Zied Elouedi

Abstract:

This paper investigates the issue of building decision trees from data with imprecise class values where imprecision is encoded in the form of possibility distributions. The Information Affinity similarity measure is introduced into the well-known gain ratio criterion in order to assess the homogeneity of a set of possibility distributions representing instances-s classes belonging to a given training partition. For the experimental study, we proposed an information affinity based performance criterion which we have used in order to show the performance of the approach on well-known benchmarks.

Keywords: Data mining from uncertain data, Decision Trees, Possibility Theory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1461
7197 Analysis of Diverse Cluster Ensemble Techniques

Authors: S. Sarumathi, N. Shanthi, P. Ranjetha

Abstract:

Data mining is the procedure of determining interesting patterns from the huge amount of data. With the intention of accessing the data faster the most supporting processes needed is clustering. Clustering is the process of identifying similarity between data according to the individuality present in the data and grouping associated data objects into clusters. Cluster ensemble is the technique to combine various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming for consolidation of outcomes from a collection of individual clustering outcomes. The performances of clustering ensembles are mainly affecting by two principal factors such as diversity and quality. This paper presents the overview about the different cluster ensemble algorithm along with their methods used in cluster ensemble to improve the diversity and quality in the several cluster ensemble related papers and shows the comparative analysis of different cluster ensemble also summarize various cluster ensemble methods. Henceforth this clear analysis will be very useful for the world of clustering experts and also helps in deciding the most appropriate one to determine the problem in hand.

Keywords: Cluster Ensemble, Consensus Function, CSPA, Diversity, HGPA, MCLA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1791
7196 FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule

Authors: Lu Si, Jie Yu, Shasha Li, Jun Ma, Lei Luo, Qingbo Wu, Yongqi Ma, Zhengji Liu

Abstract:

Instance selection (IS) technique is used to reduce the data size to improve the performance of data mining methods. Recently, to process very large data set, several proposed methods divide the training set into some disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitation of these methods and give our viewpoint about how to divide and conquer in IS procedure. Then, based on fast condensed nearest neighbor (FCNN) rule, we propose a large data sets instance selection method with MapReduce framework. Besides ensuring the prediction accuracy and reduction rate, it has two desirable properties: First, it reduces the work load in the aggregation node; Second and most important, it produces the same result with the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.

Keywords: Instance selection, data reduction, MapReduce, kNN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 964
7195 Arsenic Mobility from Mining Tailings of Monte San Nicolas to Presa de Mata in Guanajuato, Mexico

Authors: I. Cano-Aguilera, B. E. Rubio-Campos, G. De la Rosa, A. F. Aguilera-Alvarado

Abstract:

Mining tailings represent a generating source of rich heavy metal material with a potential danger the public health and the environment, since these metals, under certain conditions, can leach and contaminate aqueous systems that serve like supplying potable water sources. The strategy for this work is based on the observation, experimentation and the simulation that can be obtained by binding real answers of the hydrodynamic behavior of metals leached from mining tailings, and the applied mathematics that provides the logical structure to decipher the individual effects of the general physicochemical phenomenon. The case of study presented herein focuses on mining tailings deposits located in Monte San Nicolas, Guanajuato, Mexico, an abandoned mine. This was considered the contamination source that under certain physicochemical conditions can favor the metal leaching, and its transport towards aqueous systems. In addition, the cartography, meteorology, geology and the hydrodynamics and hydrological characteristics of the place, will be helpful in determining the way and the time in which these systems can interact. Preliminary results demonstrated that arsenic presents a great mobility, since this one was identified in several superficial aqueous systems of the micro watershed, as well as in sediments in concentrations that exceed the established maximum limits in the official norms. Also variations in pH and potential oxide-reduction were registered, conditions that favor the presence of different species from this element its solubility and therefore its mobility.

Keywords: Arsenic, mining tailings, transport.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1644
7194 A Preference-Based Multi-Agent Data Mining Framework for Social Network Service Users' Decision Making

Authors: Ileladewa Adeoye Abiodun, Cheng Wai Khuen

Abstract:

Multi-Agent Systems (MAS) emerged in the pursuit to improve our standard of living, and hence can manifest complex human behaviors such as communication, decision making, negotiation and self-organization. The Social Network Services (SNSs) have attracted millions of users, many of whom have integrated these sites into their daily practices. The domains of MAS and SNS have lots of similarities such as architecture, features and functions. Exploring social network users- behavior through multiagent model is therefore our research focus, in order to generate more accurate and meaningful information to SNS users. An application of MAS is the e-Auction and e-Rental services of the Universiti Cyber AgenT(UniCAT), a Social Network for students in Universiti Tunku Abdul Rahman (UTAR), Kampar, Malaysia, built around the Belief- Desire-Intention (BDI) model. However, in spite of the various advantages of the BDI model, it has also been discovered to have some shortcomings. This paper therefore proposes a multi-agent framework utilizing a modified BDI model- Belief-Desire-Intention in Dynamic and Uncertain Situations (BDIDUS), using UniCAT system as a case study.

Keywords: Distributed Data Mining, Multi-Agent Systems, Preference-Based, SNS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1447
7193 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines

Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma

Abstract:

Useful information has been extracted from the road accident data in United Kingdom (UK), using data analytics method, for avoiding possible accidents in rural and urban areas. This analysis make use of several methodologies such as data integration, support vector machines (SVM), correlation machines and multinomial goodness. The entire datasets have been imported from the traffic department of UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since data is expected to grow continuously over a period of time, this work primarily proposes a new framework model which can be trained and adapt itself to new data and make accurate predictions. This work also throws some light on use of SVM’s methodology for text classifiers from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of SVMs methodology appropriate for this kind of research work.

Keywords: Road accident, machine learning, support vector machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1081
7192 Mining Implicit Knowledge to Predict Political Risk by Providing Novel Framework with Using Bayesian Network

Authors: Siavash Asadi Ghajarloo

Abstract:

Nowadays predicting political risk level of country has become a critical issue for investors who intend to achieve accurate information concerning stability of the business environments. Since, most of the times investors are layman and nonprofessional IT personnel; this paper aims to propose a framework named GECR in order to help nonexpert persons to discover political risk stability across time based on the political news and events. To achieve this goal, the Bayesian Networks approach was utilized for 186 political news of Pakistan as sample dataset. Bayesian Networks as an artificial intelligence approach has been employed in presented framework, since this is a powerful technique that can be applied to model uncertain domains. The results showed that our framework along with Bayesian Networks as decision support tool, predicted the political risk level with a high degree of accuracy.

Keywords: Bayesian Networks, Data mining, GECRframework, Predicting political risk.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2130
7191 A Hybrid Approach for Thread Recommendation in MOOC Forums

Authors: Ahmad. A. Kardan, Amir Narimani, Foozhan Ataiefard

Abstract:

Recommender Systems have been developed to provide contents and services compatible to users based on their behaviors and interests. Due to information overload in online discussion forums and users diverse interests, recommending relative topics and threads is considered to be helpful for improving the ease of forum usage. In order to lead learners to find relevant information in educational forums, recommendations are even more needed. We present a hybrid thread recommender system for MOOC forums by applying social network analysis and association rule mining techniques. Initial results indicate that the proposed recommender system performs comparatively well with regard to limited available data from users' previous posts in the forum.

Keywords: Association rule mining, hybrid recommender system, massive open online courses, MOOCs, social network analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1211
7190 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: Data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1433
7189 A New Evolutionary Algorithm for Cluster Analysis

Authors: B.Bahmani Firouzi, T. Niknam, M. Nayeripour

Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.

Keywords: Data clustering, Hybrid evolutionary optimization algorithm, K-means algorithm, Simulated Annealing (SA), Particle Swarm Optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2230
7188 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1939
7187 Yield Prediction Using Support Vectors Based Under-Sampling in Semiconductor Process

Authors: Sae-Rom Pak, Seung Hwan Park, Jeong Ho Cho, Daewoong An, Cheong-Sool Park, Jun Seok Kim, Jun-Geol Baek

Abstract:

It is important to predict yield in semiconductor test process in order to increase yield. In this study, yield prediction means finding out defective die, wafer or lot effectively. Semiconductor test process consists of some test steps and each test includes various test items. In other world, test data has a big and complicated characteristic. It also is disproportionably distributed as the number of data belonging to FAIL class is extremely low. For yield prediction, general data mining techniques have a limitation without any data preprocessing due to eigen properties of test data. Therefore, this study proposes an under-sampling method using support vector machine (SVM) to eliminate an imbalanced characteristic. For evaluating a performance, randomly under-sampling method is compared with the proposed method using actual semiconductor test data. As a result, sampling method using SVM is effective in generating robust model for yield prediction.

Keywords: Yield Prediction, Semiconductor Test Process, Support Vector Machine, Under Sampling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2338
7186 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Kyung Bae Park, Sung Ho Ha

Abstract:

Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.

Keywords: Latent Dirichlet allocation, R program, text mining, topic model, user generated contents, visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1171
7185 Unsupervised Text Mining Approach to Early Warning System

Authors: Ichihan Tai, Bill Olson, Paul Blessner

Abstract:

Traditional early warning systems that alarm against crisis are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against market crash, and spikes in the event of crisis. In this study, news data is consumed for prediction of whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.

Keywords: Early Warning System, Knowledge Management, Topic Modeling, Market Prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1869
7184 A Hybrid Recommendation System Based On Association Rules

Authors: Ahmed Mohammed K. Alsalama

Abstract:

Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes such as collaborative filtering and content-based approaches are used to build a recommendation system. Most of current recommendation systems were developed to fit a certain domain such as books, articles, and movies. We propose1 a hybrid framework recommendation system to be applied on two dimensional spaces (User × Item) with a large number of Users and a small number of Items. Moreover, our proposed framework makes use of both favorite and non-favorite items of a particular user. The proposed framework is built upon the integration of association rules mining and the content-based approach. The results of experiments show that our proposed framework can provide accurate recommendations to users.

Keywords: Data Mining, Association Rules, Recommendation Systems, Hybrid Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3938
7183 Multi-Dimensional Concerns Mining for Web Applications via Concept-Analysis

Authors: Carlo Bellettini, Alessandro Marchetto, Andrea Trentini

Abstract:

Web applications have become very complex and crucial, especially when combined with areas such as CRM (Customer Relationship Management) and BPR (Business Process Reengineering), the scientific community has focused attention to Web applications design, development, analysis, and testing, by studying and proposing methodologies and tools. This paper proposes an approach to automatic multi-dimensional concern mining for Web Applications, based on concepts analysis, impact analysis, and token-based concern identification. This approach lets the user to analyse and traverse Web software relevant to a particular concern (concept, goal, purpose, etc.) via multi-dimensional separation of concerns, to document, understand and test Web applications. This technique was developed in the context of WAAT (Web Applications Analysis and Testing) project. A semi-automatic tool to support this technique is currently under development.

Keywords: Concepts Analysis, Concerns Mining, Multi-Dimensional Separation of Concerns, Impact Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1427
7182 Discovery of Sequential Patterns Based On Constraint Patterns

Authors: Shigeaki Sakurai, Youichi Kitahata, Ryohei Orihara

Abstract:

This paper proposes a method that discovers sequential patterns corresponding to user-s interests from sequential data. This method expresses the interests as constraint patterns. The constraint patterns can define relationships among attributes of the items composing the data. The method recursively decomposes the constraint patterns into constraint subpatterns. The method evaluates the constraint subpatterns in order to efficiently discover sequential patterns satisfying the constraint patterns. Also, this paper applies the method to the sequential data composed of stock price indexes and verifies its effectiveness through comparing it with a method without using the constraint patterns.

Keywords: Sequential pattern mining, Constraint pattern, Attribute constraint, Stock price indexes

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1381