Search results for: Data mining technique
9631 Post Mining- Discovering Valid Rules from Different Sized Data Sources
Authors: R. Nedunchezhian, K. Anbumani
Abstract:
A big organization may have multiple branches spread across different locations. Processing of data from these branches becomes a huge task when innumerable transactions take place. Also, branches may be reluctant to forward their data for centralized processing but are ready to pass their association rules. Local mining may also generate a large amount of rules. Further, it is not practically possible for all local data sources to be of the same size. A model is proposed for discovering valid rules from different sized data sources where the valid rules are high weighted rules. These rules can be obtained from the high frequency rules generated from each of the data sources. A data source selection procedure is considered in order to efficiently synthesize rules. Support Equalization is another method proposed which focuses on eliminating low frequency rules at the local sites itself thus reducing the rules by a significant amount.
Keywords: Association rules, multiple data stores, synthesizing, valid rules.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14029630 Feature-Based Summarizing and Ranking from Customer Reviews
Authors: Dim En Nyaung, Thin Lai Lai Thein
Abstract:
Due to the rapid increase of Internet, web opinion sources dynamically emerge which is useful for both potential customers and product manufacturers for prediction and decision purposes. These are the user generated contents written in natural languages and are unstructured-free-texts scheme. Therefore, opinion mining techniques become popular to automatically process customer reviews for extracting product features and user opinions expressed over them. Since customer reviews may contain both opinionated and factual sentences, a supervised machine learning technique applies for subjectivity classification to improve the mining performance. In this paper, we dedicate our work is the task of opinion summarization. Therefore, product feature and opinion extraction is critical to opinion summarization, because its effectiveness significantly affects the identification of semantic relationships. The polarity and numeric score of all the features are determined by Senti-WordNet Lexicon. The problem of opinion summarization refers how to relate the opinion words with respect to a certain feature. Probabilistic based model of supervised learning will improve the result that is more flexible and effective.
Keywords: Opinion Mining, Opinion Summarization, Sentiment Analysis, Text Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 29329629 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis
Authors: Sidi Yang, Haiyi Zhang
Abstract:
Twitter is a microblogging platform, where millions of users daily share their attitudes, views, and opinions. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in the Twitter data is an effective way to analyze a large set of tweets to find a set of topics in a computationally efficient manner. Sentiment analysis provides an effective method to show the emotions and sentiments found in each tweet and an efficient way to summarize the results in a manner that is clearly understood. The primary goal of this paper is to explore text mining, extract and analyze useful information from unstructured text using two approaches: LDA topic modelling and sentiment analysis by examining Twitter plain text data in English. These two methods allow people to dig data more effectively and efficiently. LDA topic model and sentiment analysis can also be applied to provide insight views in business and scientific fields.
Keywords: Text mining, Twitter, topic model, sentiment analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18079628 Artificial Intelligence Applications in Aggregate Quarries: A Reality
Authors: J. E. Ortiz, P. Plaza, J. Herrero, I. Cabria, J. L. Blanco, J. Gavilanes, J. I. Escavy, I. López-Cilla, V. Yagüe, C. Pérez, S. Rodríguez, J. Rico, C. Serrano, J. Bernat
Abstract:
The development of Artificial Intelligence services in mining processes, specifically in aggregate quarries, is facilitating automation and improving numerous aspects of operations. Ultimately, AI is transforming the mining industry by improving efficiency, safety and sustainability. With the ability to analyze large amounts of data and make autonomous decisions, AI offers great opportunities to optimize mining operations and maximize the economic and social benefits of this vital industry. Within the framework of the European DIGIECOQUARRY project, various services were developed for the identification of material quality, production estimation, detection of anomalies and prediction of consumption and production automatically with good results.
Keywords: Aggregates, artificial intelligence, automatization, mining operations.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 239627 Class Outliers Mining: Distance-Based Approach
Authors: Nabil M. Hewahi, Motaz K. Saad
Abstract:
In large datasets, identifying exceptional or rare cases with respect to a group of similar cases is considered very significant problem. The traditional problem (Outlier Mining) is to find exception or rare cases in a dataset irrespective of the class label of these cases, they are considered rare events with respect to the whole dataset. In this research, we pose the problem that is Class Outliers Mining and a method to find out those outliers. The general definition of this problem is “given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels". We introduce a novel definition of Outlier that is Class Outlier, and propose the Class Outlier Factor (COF) which measures the degree of being a Class Outlier for a data object. Our work includes a proposal of a new algorithm towards mining of the Class Outliers, presenting experimental results applied on various domains of real world datasets and finally a comparison study with other related methods is performed.Keywords: Class Outliers, Distance-Based Approach, Outliers Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 33879626 A Novel Digital Watermarking Technique Basedon ISB (Intermediate Significant Bit)
Authors: Akram M. Zeki, Azizah A. Manaf
Abstract:
Least Significant Bit (LSB) technique is the earliest developed technique in watermarking and it is also the most simple, direct and common technique. It essentially involves embedding the watermark by replacing the least significant bit of the image data with a bit of the watermark data. The disadvantage of LSB is that it is not robust against attacks. In this study intermediate significant bit (ISB) has been used in order to improve the robustness of the watermarking system. The aim of this model is to replace the watermarked image pixels by new pixels that can protect the watermark data against attacks and at the same time keeping the new pixels very close to the original pixels in order to protect the quality of watermarked image. The technique is based on testing the value of the watermark pixel according to the range of each bit-plane.Keywords: Watermarking, LSB, ISB, Robustness.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17079625 Cluster Algorithm for Genetic Diversity
Authors: Manpreet Singh, Keerat Kaur, Bhavdeep Singh
Abstract:
With the hardware technology advancing, the cost of storing is decreasing. Thus there is an urgent need for new techniques and tools that can intelligently and automatically assist us in transferring this data into useful knowledge. Different techniques of data mining are developed which are helpful for handling these large size databases [7]. Data mining is also finding its role in the field of biotechnology. Pedigree means the associated ancestry of a crop variety. Genetic diversity is the variation in the genetic composition of individuals within or among species. Genetic diversity depends upon the pedigree information of the varieties. Parents at lower hierarchic levels have more weightage for predicting genetic diversity as compared to the upper hierarchic levels. The weightage decreases as the level increases. For crossbreeding, the two varieties should be more and more genetically diverse so as to incorporate the useful characters of the two varieties in the newly developed variety. This paper discusses the searching and analyzing of different possible pairs of varieties selected on the basis of morphological characters, Climatic conditions and Nutrients so as to obtain the most optimal pair that can produce the required crossbreed variety. An algorithm was developed to determine the genetic diversity between the selected wheat varieties. Cluster analysis technique is used for retrieving the results.Keywords: Genetic diversity, pedigree, nutrients.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18019624 What the Future Holds for Social Media Data Analysis
Authors: P. Wlodarczak, J. Soar, M. Ally
Abstract:
The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provide access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas or give their opinions and views on political issues. There is a growing interest in the analysis of SM data from organisations for detecting new trends, obtaining user opinions on their products and services or finding out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often indicators of historic SM data are represented as time series and correlated with a variety of real world phenomena like the outcome of elections, the development of financial indicators, box office revenue and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.
Keywords: Social Media, text mining, knowledge discovery, predictive analysis, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 38479623 Classifier Based Text Mining for Neural Network
Authors: M. Govindarajan, R. M. Chandrasekaran
Abstract:
Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In Neural Network that address classification problems, training set, testing set, learning rate are considered as key tasks. That is collection of input/output patterns that are used to train the network and used to assess the network performance, set the rate of adjustments. This paper describes a proposed back propagation neural net classifier that performs cross validation for original Neural Network. In order to reduce the optimization of classification accuracy, training time. The feasibility the benefits of the proposed approach are demonstrated by means of five data sets like contact-lenses, cpu, weather symbolic, Weather, labor-nega-data. It is shown that , compared to exiting neural network, the training time is reduced by more than 10 times faster when the dataset is larger than CPU or the network has many hidden units while accuracy ('percent correct') was the same for all datasets but contact-lences, which is the only one with missing attributes. For contact-lences the accuracy with Proposed Neural Network was in average around 0.3 % less than with the original Neural Network. This algorithm is independent of specify data sets so that many ideas and solutions can be transferred to other classifier paradigms.Keywords: Back propagation, classification accuracy, textmining, time complexity.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 42179622 Application of Data Mining Techniques for Tourism Knowledge Discovery
Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee
Abstract:
Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.
Keywords: Classification algorithms; data mining; tourism; knowledge discovery.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25449621 Influence of Non-Structural Elements on Dynamic Response of Multi-Storey Rc Building to Mining Shock
Authors: Joanna M. Dulińska, Maria Fabijańska
Abstract:
In the paper the results of calculations of the dynamic response of a multi-storey reinforced concrete building to a strong mining shock originated from the main region of mining activity in Poland (i.e. the Legnica-Glogow Copper District) are presented. The representative time histories of accelerations registered in three directions were used as ground motion data in calculations of the dynamic response of the structure. Two variants of a numerical model were applied: the model including only structural elements of the building and the model including both structural and non-structural elements (i.e. partition walls and ventilation ducts made of brick). It turned out that non-structural elements of multi-storey RC buildings have a small impact of about 10 % on natural frequencies of these structures. It was also proved that the dynamic response of building to mining shock obtained in case of inclusion of all non-structural elements in the numerical model is about 20 % smaller than in case of consideration of structural elements only. The principal stresses obtained in calculations of dynamic response of multi-storey building to strong mining shock are situated on the level of about 30% of values obtained from static analysis (dead load).Keywords: Dynamic characteristics of buildings, mining shocks, dynamic response of buildings, non-structural elements
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18859620 A Heuristics Approach for Fast Detecting Suspicious Money Laundering Cases in an Investment Bank
Authors: Nhien-An Le-Khac, Sammer Markos, M-Tahar Kechadi
Abstract:
Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism and surely not forgetting personal gain. Most international financial institutions have been implementing anti-money laundering solutions (AML) to fight investment fraud. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered as well-suited techniques for detecting ML activities. Within the scope of a collaboration project for the purpose of developing a new solution for the AML Units in an international investment bank, we proposed a data mining-based solution for AML. In this paper, we present a heuristics approach to improve the performance for this solution. We also show some preliminary results associated with this method on analysing transaction datasets.Keywords: data mining, anti money laundering, clustering, heuristics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 35849619 Building an Integrated Relational Database from Swiss Nutrition National Survey and Swiss Health Datasets for Data Mining Purposes
Authors: Ilona Mewes, Helena Jenzer, Farshideh Einsele
Abstract:
Objective: The objective of the study was to integrate two big databases from Swiss nutrition national survey (menuCH) and Swiss health national survey 2012 for data mining purposes. Each database has a demographic base data. An integrated Swiss database is built to later discover critical food consumption patterns linked with lifestyle diseases known to be strongly tied with food consumption. Design: Swiss nutrition national survey (menuCH) with approx. 2000 respondents from two different surveys, one by Phone and the other by questionnaire along with Swiss health national survey 2012 with 21500 respondents were pre-processed, cleaned and finally integrated to a unique relational database. Results: The result of this study is an integrated relational database from the Swiss nutritional and health databases.
Keywords: Health informatics, data mining, nutritional and health databases, nutritional and chronical databases.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17229618 Parallel and Distributed Mining of Association Rule on Knowledge Grid
Authors: U. Sakthi, R. Hemalatha, R. S. Bhuvaneswaran
Abstract:
In Virtual organization, Knowledge Discovery (KD) service contains distributed data resources and computing grid nodes. Computational grid is integrated with data grid to form Knowledge Grid, which implements Apriori algorithm for mining association rule on grid network. This paper describes development of parallel and distributed version of Apriori algorithm on Globus Toolkit using Message Passing Interface extended with Grid Services (MPICHG2). The creation of Knowledge Grid on top of data and computational grid is to support decision making in real time applications. In this paper, the case study describes design and implementation of local and global mining of frequent item sets. The experiments were conducted on different configurations of grid network and computation time was recorded for each operation. We analyzed our result with various grid configurations and it shows speedup of computation time is almost superlinear.Keywords: Association rule, Grid computing, Knowledge grid, Mobility prediction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21809617 Analysis of Sequence Moves in Successful Chess Openings Using Data Mining with Association Rules
Authors: R.M.Rani
Abstract:
Chess is one of the indoor games, which improves the level of human confidence, concentration, planning skills and knowledge. The main objective of this paper is to help the chess players to improve their chess openings using data mining techniques. Budding Chess Players usually do practices by analyzing various existing openings. When they analyze and correlate thousands of openings it becomes tedious and complex for them. The work done in this paper is to analyze the best lines of Blackmar- Diemer Gambit(BDG) which opens with White D4... using data mining analysis. It is carried out on the collection of winning games by applying association rules. The first step of this analysis is assigning variables to each different sequence moves. In the second step, the sequence association rules were generated to calculate support and confidence factor which help us to find the best subsequence chess moves that may lead to winning position.Keywords: Blackmar-Diemer Gambit(BDG), Confidence, sequence Association Rules, Support.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 30909616 Video Mining for Creative Rendering
Authors: Mei Chen
Abstract:
More and more home videos are being generated with the ever growing popularity of digital cameras and camcorders. For many home videos, a photo rendering, whether capturing a moment or a scene within the video, provides a complementary representation to the video. In this paper, a video motion mining framework for creative rendering is presented. The user-s capture intent is derived by analyzing video motions, and respective metadata is generated for each capture type. The metadata can be used in a number of applications, such as creating video thumbnail, generating panorama posters, and producing slideshows of video.
Keywords: Motion mining, semantic abstraction, video mining, video representation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16489615 Navigation Patterns Mining Approach based on Expectation Maximization Algorithm
Authors: Norwati Mustapha, Manijeh Jalali, Abolghasem Bozorgniya, Mehrdad Jalali
Abstract:
Web usage mining algorithms have been widely utilized for modeling user web navigation behavior. In this study we advance a model for mining of user-s navigation pattern. The model makes user model based on expectation-maximization (EM) algorithm.An EM algorithm is used in statistics for finding maximum likelihood estimates of parameters in probabilistic models, where the model depends on unobserved latent variables. The experimental results represent that by decreasing the number of clusters, the log likelihood converges toward lower values and probability of the largest cluster will be decreased while the number of the clusters increases in each treatment.Keywords: Web Usage Mining, Expectation maximization, navigation pattern mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15789614 Numerical Modeling of Artisanal and Small-Scale Mining of Coltan in the African Great Lakes Region
Authors: Sergio Perez Rodriguez
Abstract:
Findings of a production model of Artisanal and Small-Scale Mining (ASM) of coltan ore by an average Democratic Republic of Congo (DRC) mineworker are presented in this paper. These can be used as a reference for a similar characterization of the daily labor of counterparts from other countries in the Africa's Great Lakes region. To that end, the Fundamental Equation of Mineral Production has been applied in this paper, considering a miner's average daily output of coltan, estimated in the base of gross statistical data gathered from reputable sources. Results indicate daily yields of individual miners in the order of 300 g of coltan ore, with hourly peaks of production in the range of 30 to 40 g of the mineral. Yields are expected to be in the order of 5 g or less during the least productive hours. These outputs are expected to be achieved during the halves of the eight to 10 hours of daily working sessions that these artisanal laborers can attend during the mining season.
Keywords: Coltan, mineral production, Production to Reserve ratio, artisanal mining, small-scale mining, ASM, human work, Great Lakes region, Democratic Republic of Congo.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1939613 Using Textual Pre-Processing and Text Mining to Create Semantic Links
Authors: Ricardo Avila, Gabriel Lopes, Vania Vidal, Jose Macedo
Abstract:
This article offers a approach to the automatic discovery of semantic concepts and links in the domain of Oil Exploration and Production (E&P). Machine learning methods combined with textual pre-processing techniques were used to detect local patterns in texts and, thus, generate new concepts and new semantic links. Even using more specific vocabularies within the oil domain, our approach has achieved satisfactory results, suggesting that the proposal can be applied in other domains and languages, requiring only minor adjustments.Keywords: Semantic links, data mining, linked data, SKOS.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10599612 Application of Advanced Remote Sensing Data in Mineral Exploration in the Vicinity of Heavy Dense Forest Cover Area of Jharkhand and Odisha State Mining Area
Authors: Hemant Kumar, R. N. K. Sharma, A. P. Krishna
Abstract:
The study has been carried out on the Saranda in Jharkhand and a part of Odisha state. Geospatial data of Hyperion, a remote sensing satellite, have been used. This study has used a wide variety of patterns related to image processing to enhance and extract the mining class of Fe and Mn ores.Landsat-8, OLI sensor data have also been used to correctly explore related minerals. In this way, various processes have been applied to increase the mineralogy class and comparative evaluation with related frequency done. The Hyperion dataset for hyperspectral remote sensing has been specifically verified as an effective tool for mineral or rock information extraction within the band range of shortwave infrared used. The abundant spatial and spectral information contained in hyperspectral images enables the differentiation of different objects of any object into targeted applications for exploration such as exploration detection, mining.
Keywords: Hyperion, hyperspectral, sensor, Landsat-8.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6209611 Integrated Method for Detection of Unknown Steganographic Content
Authors: Magdalena Pejas
Abstract:
This article concerns the presentation of an integrated method for detection of steganographic content embedded by new unknown programs. The method is based on data mining and aggregated hypothesis testing. The article contains the theoretical basics used to deploy the proposed detection system and the description of improvement proposed for the basic system idea. Further main results of experiments and implementation details are collected and described. Finally example results of the tests are presented.Keywords: Steganography, steganalysis, data embedding, data mining, feature extraction, knowledge base, system learning, hypothesis testing, error estimation, black box program, file structure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15639610 On Pattern-Based Programming towards the Discovery of Frequent Patterns
Authors: Kittisak Kerdprasop, Nittaya Kerdprasop
Abstract:
The problem of frequent pattern discovery is defined as the process of searching for patterns such as sets of features or items that appear in data frequently. Finding such frequent patterns has become an important data mining task because it reveals associations, correlations, and many other interesting relationships hidden in a database. Most of the proposed frequent pattern mining algorithms have been implemented with imperative programming languages. Such paradigm is inefficient when set of patterns is large and the frequent pattern is long. We suggest a high-level declarative style of programming apply to the problem of frequent pattern discovery. We consider two languages: Haskell and Prolog. Our intuitive idea is that the problem of finding frequent patterns should be efficiently and concisely implemented via a declarative paradigm since pattern matching is a fundamental feature supported by most functional languages and Prolog. Our frequent pattern mining implementation using the Haskell and Prolog languages confirms our hypothesis about conciseness of the program. The comparative performance studies on line-of-code, speed and memory usage of declarative versus imperative programming have been reported in the paper.Keywords: Frequent pattern mining, functional programming, pattern matching, logic programming.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13429609 Combining Bagging and Boosting
Authors: S. B. Kotsiantis, P. E. Pintelas
Abstract:
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base-classifiers. Boosting algorithms are considered stronger than bagging on noisefree data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using a voting methodology of bagging and boosting ensembles with 10 subclassifiers in each one. We performed a comparison with simple bagging and boosting ensembles with 25 sub-classifiers, as well as other well known combining methods, on standard benchmark datasets and the proposed technique was the most accurate.
Keywords: data mining, machine learning, pattern recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25619608 Mining Implicit Knowledge to Predict Political Risk by Providing Novel Framework with Using Bayesian Network
Authors: Siavash Asadi Ghajarloo
Abstract:
Nowadays predicting political risk level of country has become a critical issue for investors who intend to achieve accurate information concerning stability of the business environments. Since, most of the times investors are layman and nonprofessional IT personnel; this paper aims to propose a framework named GECR in order to help nonexpert persons to discover political risk stability across time based on the political news and events. To achieve this goal, the Bayesian Networks approach was utilized for 186 political news of Pakistan as sample dataset. Bayesian Networks as an artificial intelligence approach has been employed in presented framework, since this is a powerful technique that can be applied to model uncertain domains. The results showed that our framework along with Bayesian Networks as decision support tool, predicted the political risk level with a high degree of accuracy.Keywords: Bayesian Networks, Data mining, GECRframework, Predicting political risk.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21739607 Customer Need Type Classification Model using Data Mining Techniques for Recommender Systems
Authors: Kyoung-jae Kim
Abstract:
Recommender systems are usually regarded as an important marketing tool in the e-commerce. They use important information about users to facilitate accurate recommendation. The information includes user context such as location, time and interest for personalization of mobile users. We can easily collect information about location and time because mobile devices communicate with the base station of the service provider. However, information about user interest can-t be easily collected because user interest can not be captured automatically without user-s approval process. User interest usually represented as a need. In this study, we classify needs into two types according to prior research. This study investigates the usefulness of data mining techniques for classifying user need type for recommendation systems. We employ several data mining techniques including artificial neural networks, decision trees, case-based reasoning, and multivariate discriminant analysis. Experimental results show that CHAID algorithm outperforms other models for classifying user need type. This study performs McNemar test to examine the statistical significance of the differences of classification results. The results of McNemar test also show that CHAID performs better than the other models with statistical significance.Keywords: Customer need type, Data mining techniques, Recommender system, Personalization, Mobile user.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21459606 Efficient Implementation of Serial and Parallel Support Vector Machine Training with a Multi-Parameter Kernel for Large-Scale Data Mining
Authors: Tatjana Eitrich, Bruno Lang
Abstract:
This work deals with aspects of support vector learning for large-scale data mining tasks. Based on a decomposition algorithm that can be run in serial and parallel mode we introduce a data transformation that allows for the usage of an expensive generalized kernel without additional costs. In order to speed up the decomposition algorithm we analyze the problem of working set selection for large data sets and analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our modifications and settings lead to improvement of support vector learning performance and thus allow using extensive parameter search methods to optimize classification accuracy.
Keywords: Support Vector Machines, Shared Memory Parallel Computing, Large Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15769605 Web Traffic Mining using Neural Networks
Authors: Farhad F. Yusifov
Abstract:
With the explosive growth of data available on the Internet, personalization of this information space become a necessity. At present time with the rapid increasing popularity of the WWW, Websites are playing a crucial role to convey knowledge and information to the end users. Discovering hidden and meaningful information about Web users usage patterns is critical to determine effective marketing strategies to optimize the Web server usage for accommodating future growth. The task of mining useful information becomes more challenging when the Web traffic volume is enormous and keeps on growing. In this paper, we propose a intelligent model to discover and analyze useful knowledge from the available Web log data.Keywords: Clustering, Self organizing map, Web log files, Web traffic.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16029604 Opinion Mining and Sentiment Analysis on DEFT
Authors: Najiba Ouled Omar, Azza Harbaoui, Henda Ben Ghezala
Abstract:
Current research practices sentiment analysis with a focus on social networks, DEfi Fouille de Texte (DEFT) (Text Mining Challenge) evaluation campaign focuses on opinion mining and sentiment analysis on social networks, especially social network Twitter. It aims to confront the systems produced by several teams from public and private research laboratories. DEFT offers participants the opportunity to work on regularly renewed themes and proposes to work on opinion mining in several editions. The purpose of this article is to scrutinize and analyze the works relating to opinions mining and sentiment analysis in the Twitter social network realized by DEFT. It examines the tasks proposed by the organizers of the challenge and the methods used by the participants.
Keywords: Opinion mining, sentiment analysis, emotion, polarity, annotation, OSEE, figurative language, DEFT, Twitter, Tweet.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8209603 Idiopathic Constipation can be Subdivided in Clinical Subtypes: Data Mining by Cluster Analysis on a Population based Study
Authors: Mauro Giacomini, Stefania Bertone, Carlo Mansi, Pietro Dulbecco, Vincenzo Savarino
Abstract:
The prevalence of non organic constipation differs from country to country and the reliability of the estimate rates is uncertain. Moreover, the clinical relevance of subdividing the heterogeneous functional constipation disorders into pre-defined subgroups is largely unknown.. Aim: to estimate the prevalence of constipation in a population-based sample and determine whether clinical subgroups can be identified. An age and gender stratified sample population from 5 Italian cities was evaluated using a previously validated questionnaire. Data mining by cluster analysis was used to determine constipation subgroups. Results: 1,500 complete interviews were obtained from 2,083 contacted households (72%). Self-reported constipation correlated poorly with symptombased constipation found in 496 subjects (33.1%). Cluster analysis identified four constipation subgroups which correlated to subgroups identified according to pre-defined symptom criteria. Significant differences in socio-demographics and lifestyle were observed among subgroups.Keywords: Cluster analysis, constipation, data mining, statistical analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12939602 Mining Genes Relations in Microarray Data Combined with Ontology in Colon Cancer Automated Diagnosis System
Authors: A. Gruzdz, A. Ihnatowicz, J. Siddiqi, B. Akhgar
Abstract:
MATCH project [1] entitle the development of an automatic diagnosis system that aims to support treatment of colon cancer diseases by discovering mutations that occurs to tumour suppressor genes (TSGs) and contributes to the development of cancerous tumours. The constitution of the system is based on a) colon cancer clinical data and b) biological information that will be derived by data mining techniques from genomic and proteomic sources The core mining module will consist of the popular, well tested hybrid feature extraction methods, and new combined algorithms, designed especially for the project. Elements of rough sets, evolutionary computing, cluster analysis, self-organization maps and association rules will be used to discover the annotations between genes, and their influence on tumours [2]-[11]. The methods used to process the data have to address their high complexity, potential inconsistency and problems of dealing with the missing values. They must integrate all the useful information necessary to solve the expert's question. For this purpose, the system has to learn from data, or be able to interactively specify by a domain specialist, the part of the knowledge structure it needs to answer a given query. The program should also take into account the importance/rank of the particular parts of data it analyses, and adjusts the used algorithms accordingly.Keywords: Bioinformatics, gene expression, ontology, selforganizingmaps.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1973