Search results for: stream data mining
25971 Data Stream Association Rule Mining with Cloud Computing
Authors: B. Suraj Aravind, M. H. M. Krishna Prasad
Abstract:
There are emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click stream analysis, sensor data, and satellite data. Data streams typically arrive continuously, at high speed, in huge volumes, and with a changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes an improved data stream association rule mining algorithm that eliminates the limitation of resources. For this, the concept of cloud computing is used. Its inclusion may lead to additional unknown problems, which need further research.
Keywords: data stream, association rule mining, cloud computing, frequent itemsets
Procedia PDF Downloads 501
25970 Static vs. Stream Mining Trajectories Similarity Measures
Authors: Musaab Riyadh, Norwati Mustapha, Dina Riyadh
Abstract:
Trajectory similarity can be defined as the cost of transforming one trajectory into another based on a certain similarity method. It is the core of numerous mining tasks such as clustering, classification, and indexing. Various approaches have been suggested to measure similarity based on the geometric and dynamic properties of trajectories, the overlap between trajectory segments, and the confined area between entire trajectories. In this article, these approaches are evaluated in terms of computational cost, memory usage, accuracy, and the amount of data needed in advance, in order to determine their suitability for stream mining applications. The evaluation results show that, because data are generated at high speed, stream mining applications favor similarity methods that have low computational cost and memory usage, need only a single scan of the data, and are free of mathematical complexity.
Keywords: global distance measure, local distance measure, semantic trajectory, spatial dimension, stream data mining
Procedia PDF Downloads 396
25969 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data
Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz
Abstract:
In recent years, there has been a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data is clear, and real-time stream data management has become a hot topic. The sliding window model and frequent itemset mining over dynamic data are among the most important problems in data mining. The sliding window model for frequent itemset mining is widely used in data stream mining because of its emphasis on recent data and its bounded memory requirement. Existing methods use the traditional transaction-based sliding window model, where the window size is a fixed number of transactions. This model assumes that transactions arrive at a constant rate, which is not suited to real-time applications, and its use in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes the use of a timestamp-based sliding window model. In the proposed frequent itemset mining algorithm, support conditions are used to differentiate frequent and infrequent patterns. A tree is then developed to incrementally maintain the essential information. We evaluate our contribution; the preliminary results are quite promising.
Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query
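The difference between a transaction-based and a timestamp-based window can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: the class name, the `span` and `max_size` parameters, and the use of a plain counter instead of the tree mentioned in the abstract are all assumptions made for illustration. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque
from itertools import combinations


class TimestampWindow:
    """Keeps only transactions whose timestamp lies within the last `span` time units."""

    def __init__(self, span, max_size=2):
        self.span = span            # window length in time units (hypothetical parameter)
        self.max_size = max_size    # largest itemset size to count (hypothetical parameter)
        self.window = deque()       # (timestamp, frozenset of items), in arrival order
        self.counts = Counter()     # support count of each itemset currently in the window

    def _itemsets(self, items):
        # Enumerate all itemsets of size 1..max_size contained in one transaction.
        for k in range(1, self.max_size + 1):
            for combo in combinations(sorted(items), k):
                yield combo

    def add(self, timestamp, items):
        """Insert a transaction, then expire everything at or before timestamp - span."""
        items = frozenset(items)
        self.window.append((timestamp, items))
        self.counts.update(self._itemsets(items))
        while self.window and self.window[0][0] <= timestamp - self.span:
            _, old = self.window.popleft()
            self.counts.subtract(self._itemsets(old))

    def frequent(self, min_support):
        """Itemsets whose share of the window's transactions reaches min_support."""
        n = len(self.window)
        return {s for s, c in self.counts.items() if c > 0 and c / n >= min_support}


w = TimestampWindow(span=10)
w.add(0, ["a", "b"])
w.add(4, ["a", "c"])
w.add(12, ["a", "b"])   # the transaction at time 0 is now older than the window
```

The number of transactions in the window varies with the arrival rate, which is the property the timestamp-based model is meant to capture; a transaction-based window would instead always hold a fixed count.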
Procedia PDF Downloads 161
25968 Optimizing Communications Overhead in Heterogeneous Distributed Data Streams
Authors: Rashi Bhalla, Russel Pears, M. Asif Naeem
Abstract:
In this 'Information Explosion Era', analyzing data, 'a critical commodity', and mining knowledge from vertically distributed data streams incur a huge communication cost. However, an effort to decrease communication in the distributed environment has an adverse influence on classification accuracy; therefore, a research challenge lies in maintaining a balance between transmission cost and accuracy. This paper proposes a method based on Bayesian inference to reduce the communication volume in a heterogeneous distributed environment while retaining prediction accuracy. Our experimental evaluation reveals that a significant reduction in communication can be achieved across a diverse range of dataset types.
Keywords: big data, bayesian inference, distributed data stream mining, heterogeneous-distributed data
Procedia PDF Downloads 161
25967 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity
Authors: Hoda A. Abdel Hafez
Abstract:
Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance, and privacy. In the telecommunication industry, mining big data is like mining for gold; it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity, and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from it.
Keywords: mining big data, big data, machine learning, telecommunication
Procedia PDF Downloads 409
25966 Frequent Itemset Mining Using Rough-Sets
Authors: Usman Qamar, Younus Javed
Abstract:
Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data, for example: what products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sales campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that, as the data increase, the amount of time and resources required to mine the data increases at an exponential rate. In this investigation, a new algorithm is proposed which can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough sets to carry out record reduction and feature (attribute) selection, respectively. FASTER for frequent itemset mining can produce a speed-up of 3.1 times compared to the original algorithm while maintaining an accuracy of 71%.
Keywords: rough-sets, classification, feature selection, entropy, outliers, frequent itemset mining
Procedia PDF Downloads 437
25965 A Review Paper on Data Mining and Genetic Algorithm
Authors: Sikander Singh Cheema, Jasmeen Kaur
Abstract:
In this paper, the concept of data mining is summarized, along with one of its important processes, i.e., KDD. Data mining based on genetic algorithms is examined, and ways to achieve it are surveyed. This paper also conducts a formal review of data mining tasks and genetic algorithms in various fields.
Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining
Procedia PDF Downloads 591
25964 A Comprehensive Survey and Improvement to Existing Privacy Preserving Data Mining Techniques
Authors: Tosin Ige
Abstract:
Ethics must be a condition of the world, like logic. (Ludwig Wittgenstein, 1889-1951). As important as data mining is, it poses a significant threat to ethics, privacy, and legality, since data mining makes it difficult for an individual or consumer (in the case of a company) to control the accessibility and usage of his data. This research focuses on current issues and the latest research and development on privacy preserving data mining methods as of 2022. It also discusses some advances in those techniques while highlighting and providing a new technique as a solution to an existing technique of privacy preserving data mining. This paper also bridges the wide gap between data mining and the Web Application Programming Interface (web API), where research is urgently needed for an added layer of security in data mining, while introducing a seamless and more efficient way of data mining.
Keywords: data, privacy, data mining, association rule, privacy preserving, mining technique
Procedia PDF Downloads 172
25963 Frequent Item Set Mining for Big Data Using MapReduce Framework
Authors: Tamanna Jethava, Rahul Joshi
Abstract:
Frequent item sets play an essential role in many data mining tasks that try to find interesting patterns in a database. Typically, a frequent item set is a set of items that frequently appear together in a transaction dataset. Several mining algorithms are used for frequent item set mining, yet most do not scale to the type of data we are presented with today, so-called “BIG DATA”. Big Data is a collection of large data sets. Our approach is to mine frequent item sets over large datasets in a scalable and speedy way. MapReduce, along with HDFS, is used to find frequent item sets in Big Data on a large cluster. This paper focuses on using a pre-processing and mining algorithm as a hybrid approach for big data on the Hadoop platform.
Keywords: frequent item set mining, big data, Hadoop, MapReduce
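The way item set counting maps onto the MapReduce model can be sketched in plain Python. This is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop code and not the hybrid algorithm of the paper; the function names and the sample transactions are made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k):
    """Mapper: emit (itemset, 1) for every k-item subset of one transaction."""
    for itemset in combinations(sorted(set(transaction)), k):
        yield itemset, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, min_count):
    """Reducer: sum the counts per itemset and keep those that reach min_count."""
    totals = {key: sum(values) for key, values in groups.items()}
    return {key: total for key, total in totals.items() if total >= min_count}


def frequent_itemsets(transactions, k, min_count):
    emitted = (pair for t in transactions for pair in map_phase(t, k))
    return reduce_phase(shuffle(emitted), min_count)


transactions = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk", "bread"],
]
pairs = frequent_itemsets(transactions, k=2, min_count=2)
```

Because each mapper only looks at one transaction and each reducer only at one key, both steps can be spread over many machines, which is what makes the counting scale.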
Procedia PDF Downloads 434
25962 Algorithms used in Spatial Data Mining GIS
Authors: Vahid Bairami Rad
Abstract:
Extracting knowledge from spatial data such as GIS data is important for reducing the data and extracting information. Therefore, the development of new techniques and tools that support humans in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. We introduce a set of database primitives, or basic operations, for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. As with the relational standard language SQL, the use of standard primitives will speed up the development of new data mining algorithms and will also make them more portable. We introduce a database-oriented framework for spatial data mining which is based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths is defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives in a commercial DBMS are presented.
Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining
Procedia PDF Downloads 460
25961 Reviewing Privacy Preserving Distributed Data Mining
Authors: Sajjad Baghernezhad, Saeideh Baghernezhad
Abstract:
Nowadays, given human involvement in the growth of data, methods such as data mining for extracting knowledge are unavoidable. One of the issues in data mining is the inherent distribution of the data: usually the parties creating or receiving such data are corporate or non-corporate persons who do not give their information freely to others. Yet there is no guarantee that someone can mine particular data without intruding on the owner’s privacy. How data are sent and then gathered under vertical or horizontal partitioning depends on the type of preservation applied, which is also carried out to improve data privacy. This study attempts a comprehensive comparison of data preserving methods; general methods such as data randomization and coding are examined, along with the strong and weak points of each.
Keywords: data mining, distributed data mining, privacy protection, privacy preserving
Procedia PDF Downloads 525
25960 Block Mining: Block Chain Enabled Process Mining Database
Authors: James Newman
Abstract:
Process mining is an emerging technology that looks to serialize enterprise data into time series data. It has been used by many companies and has been the subject of a variety of research papers. However, the majority of current efforts have looked at how best to build process mining on standard relational databases. This paper is a first pass at outlining a database custom-built for the minimal viable product of process mining. We present Block Miner, a blockchain protocol to store process mining data across a distributed network. We demonstrate the feasibility of storing process mining data on the blockchain. We present a proof of concept and show how the intersection of these two technologies helps to solve a variety of issues, including but not limited to ransomware attacks, tax documentation, and conflict resolution.
Keywords: blockchain, process mining, memory optimization, protocol
Procedia PDF Downloads 102
25959 Healthcare Data Mining Innovations
Authors: Eugenia Jilinguirian
Abstract:
In the healthcare industry, data mining is essential since it transforms the field by collecting useful data from large datasets. Data mining is the process of applying advanced analytical methods to large patient records and medical histories in order to identify patterns, correlations, and trends. Healthcare professionals can improve diagnosis accuracy, uncover hidden linkages, and predict disease outcomes by carefully examining these statistics. Additionally, data mining supports personalized medicine by tailoring treatment to the unique attributes of each patient. This proactive strategy helps allocate resources more efficiently, enhances patient care, and streamlines operations. However, to apply data mining effectively and ensure the appropriate use of private healthcare information, issues like data privacy and security must be carefully considered. Data mining continues to be vital in the search for more effective, efficient, and individualized healthcare solutions as technology evolves.
Keywords: data mining, healthcare, big data, individualised healthcare, healthcare solutions, database
Procedia PDF Downloads 66
25958 Data Mining Practices: Practical Studies on the Telecommunication Companies in Jordan
Authors: Dina Ahmad Alkhodary
Abstract:
This study aimed to investigate the practices of data mining in the telecommunication companies in Jordan, from the viewpoint of the respondents. In order to achieve the goal of the study and test the validity of the hypotheses, the researcher designed a questionnaire to collect data from managers and staff members of the main departments in the researched companies. The results show the stages of improvement of the telecommunication companies toward data mining.
Keywords: data, mining, development, business
Procedia PDF Downloads 497
25957 Recent Advances in Data Warehouse
Authors: Fahad Hanash Alzahrani
Abstract:
This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.
Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing
Procedia PDF Downloads 404
25956 Cloud Computing in Data Mining: A Technical Survey
Authors: Ghaemi Reza, Abdollahi Hamid, Dashti Elham
Abstract:
Cloud computing poses a diversity of challenges for data mining operations, arising from the dynamic structure of data distribution as opposed to the typical database scenarios of a conventional architecture. Because an immense number of users seek data on a daily basis, there are serious security concerns for cloud providers as well as for data providers who put their data in the cloud computing environment. Big data analytics uses compute-intensive data mining algorithms (hidden Markov, MapReduce parallel programming, Mahout project, Hadoop distributed file system, K-Means and K-Medoid, Apriori) that require efficient high-performance processors to produce timely results. Data mining algorithms are used to solve or optimize the model parameters. The challenges that the operation has to face are establishing successful transactions with the existing virtual machine environment and keeping the databases under control. Several factors have led from normal or centralized mining to distributed data mining. The approach is delivered as SaaS and uses multi-agent systems to implement the different tasks of the system. There are still some problems with data mining based on cloud computing, including the design and selection of data mining algorithms.
Keywords: cloud computing, data mining, computing models, cloud services
Procedia PDF Downloads 479
25955 Modular Data and Calculation Framework for a Technology-based Mapping of the Manufacturing Process According to the Value Stream Management Approach
Authors: Tim Wollert, Fabian Behrendt
Abstract:
Value Stream Management (VSM) is a widely used methodology in the context of Lean Management for improving end-to-end material and information flows from a supplier to a customer from a company’s perspective. Whereas the design principles, e.g., pull, value-adding, customer orientation, and others, are still valid against the background of an increasingly digitalized and dynamic environment, the methodology itself for mapping a value stream is characterized as time- and resource-intensive due to the high degree of manual activities. The digitalization of processes in the context of Industry 4.0 opens new opportunities to reduce these manual efforts and make the VSM approach more agile. This paper aims to provide a modular data and calculation framework that utilizes the available business data provided by information and communication technologies to automate the value stream mapping process, with a focus on the manufacturing process.
Keywords: lean management 4.0, value stream management (VSM) 4.0, dynamic value stream mapping, enterprise resource planning (ERP)
Procedia PDF Downloads 150
25954 Determination of Flow Arrangement for Optimum Performance in Heat Exchangers
Authors: Ahmed Salisu Atiku
Abstract:
This task involves the determination of the flow arrangement for optimum performance and the calculation of the total heat transfer of two identical double pipe heat exchangers in series. The inner pipe contains the cold water stream at 27°C, whilst the outer pipe contains two hot streams of water at 50°C and 90°C, which can be mixed in any way desired. The analysis was carried out using a counter flow arrangement due to its good heat transfer ability. The best way of heating the cold stream was found to be passing the 90°C hot stream through the two heat exchangers. The outlet temperature of the cold stream was found to be 39.6°C, with an overall heat transfer of 131.3 kW. Starting with the 50°C hot stream in the first heat exchanger followed by the 90°C hot stream in the second heat exchanger gives an outlet temperature almost the same as the 90°C hot stream alone, but the heat transfer is low. The reason for the low heat transfer was that only the heat transfer in the second heat exchanger is considered, whilst the reason for the high outlet temperature was that the cold stream was already preheated by the first stream.
Keywords: cold stream, flow arrangement, heat exchanger, hot stream
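The reported figures can be related through the energy balance on the cold stream, Q = m · cp · (T_out − T_in). The short Python sketch below only shows what cold-side flow rate the stated numbers would imply; the flow rate is not given in the abstract, and the constant specific heat of 4.18 kJ/(kg·K) for liquid water is an assumption.

```python
CP_WATER = 4.18  # kJ/(kg*K); assumed constant specific heat of liquid water


def implied_mass_flow(q_kw, t_in_c, t_out_c, cp=CP_WATER):
    """Mass flow rate in kg/s from the energy balance Q = m * cp * (T_out - T_in)."""
    return q_kw / (cp * (t_out_c - t_in_c))


def outlet_temperature(q_kw, m_kg_s, t_in_c, cp=CP_WATER):
    """Outlet temperature in deg C of a stream that absorbs q_kw at flow rate m_kg_s."""
    return t_in_c + q_kw / (m_kg_s * cp)


# Values taken from the abstract: 131.3 kW, cold stream heated from 27 to 39.6 deg C.
m_cold = implied_mass_flow(131.3, 27.0, 39.6)
```

Under these assumptions the cold stream would have to flow at roughly 2.5 kg/s; a different specific heat would shift that figure accordingly.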
Procedia PDF Downloads 323
25953 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills
Authors: Kyle De Freitas, Margaret Bernard
Abstract:
Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have few technical data mining skills as well as for more advanced users with some data mining expertise. The framework architecture was developed through a review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture that lets future researchers focus on the development of specific areas within the EDM process. Finally, the paper highlights a strategy of enabling analysis through either predefined questions or a guided data mining process, and shows how the developed questions and the analysis conducted can be reused and extended over time.
Keywords: educational data mining, learning management system, learning analytics, EDM framework
Procedia PDF Downloads 326
25952 Association Rules Mining Task Using Metaheuristics: Review
Authors: Abir Derouiche, Abdesslem Layeb
Abstract:
Association Rule Mining (ARM) is one of the most popular data mining tasks, and it is widely used in various areas. The search for association rules is an NP-complete problem, which is why metaheuristics have been widely used to solve it. This paper presents ARM as an optimization problem and surveys the metaheuristic-based approaches proposed in the literature.
Keywords: Optimization, Metaheuristics, Data Mining, Association rules Mining
Procedia PDF Downloads 159
25951 A New Approach for Improving Accuracy of Multi Label Stream Data
Authors: Kunal Shah, Swati Patel
Abstract:
Many real-world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict the class of an unseen instance as accurately as possible. Multi-label classification is a variant of single-label classification where a set of labels is associated with a single instance. Multi-label classification is used by modern applications such as text classification, functional genomics, image classification, and music categorization. This paper introduces the task of multi-label classification, methods for multi-label classification, and evaluation measures for multi-label classification. A comparative analysis of multi-label classification methods was also carried out on various data sets, first on the basis of a theoretical study and then on the basis of simulation.
Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer
Procedia PDF Downloads 584
25950 Association Rules Mining and NOSQL Oriented Document in Big Data
Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub
Abstract:
Big Data represents the recent technology of manipulating voluminous and unstructured data sets from multiple sources. NoSQL has therefore emerged to handle the problem of unstructured data. Association rule mining is one of the popular data mining techniques for extracting hidden relationships from transactional databases. The problem of finding association dependencies is well solved with MapReduce. The goal of our work is to reduce the time needed to generate frequent itemsets by using MapReduce and a document-oriented NoSQL database. A comparative study is given to evaluate the performance of our algorithm against the classical Apriori algorithm.
Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL
Procedia PDF Downloads 160
25949 Business Intelligence for Profiling of Telecommunication Customer
Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro
Abstract:
Business Intelligence is a methodology that exploits data to produce information and knowledge systematically; it can support the decision-making process. Two methods used in business intelligence are data warehousing and data mining. A data warehouse can store historical data from transactional data. For data modelling in the data warehouse, we apply Kimball's dimensional modelling. Data mining, in turn, is used to extract patterns from the data and gain insight from them. Data mining has many techniques, one of which is segmentation. For profiling telecommunication customers, we use customer segmentation according to the customer’s usage of services, customer invoices, and customer payments. Customers can be grouped according to their characteristics, and the profitable customers can be identified. We apply the K-Means clustering algorithm for segmentation. As the input variables for that algorithm, we use the RFM (Recency, Frequency and Monetary) model. For all data mining processes, we use the IBM SPSS Modeler tool.
Keywords: business intelligence, customer segmentation, data warehouse, data mining
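How RFM variables can be derived from payment records before clustering is shown in the following minimal Python sketch. The record layout (customer, date, amount), the function name, and the sample values are hypothetical and not taken from the paper, which performs its analysis in IBM SPSS Modeler.

```python
from datetime import date


def rfm(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer, date, amount) rows (hypothetical layout).
    """
    table = {}
    for customer, day, amount in transactions:
        last, freq, total = table.get(customer, (day, 0, 0.0))
        # Track the most recent date, the number of rows, and the summed amount.
        table[customer] = (max(last, day), freq + 1, total + amount)
    return {
        customer: ((today - last).days, freq, total)
        for customer, (last, freq, total) in table.items()
    }


rows = [
    ("A", date(2024, 1, 10), 50.0),
    ("A", date(2024, 3, 1), 30.0),
    ("B", date(2024, 2, 15), 200.0),
]
features = rfm(rows, today=date(2024, 3, 31))
```

Each customer ends up as a three-value vector; in practice these values would usually be scaled before being passed to a clustering algorithm such as K-Means, since the three variables have very different ranges.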
Procedia PDF Downloads 483
25948 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data
Authors: Adarsh Shroff
Abstract:
Big data is a collection of datasets so large and complex that it becomes difficult to process using database management tools. Operations like search, analysis, and visualization are performed on big data using data mining, which is the process of extracting patterns or knowledge from large data sets. The results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to MapReduce, the most widely used framework for mining big data. i2MapReduce performs key-value pair level incremental processing rather than task-level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. To optimize the mining results, i2MapReduce is evaluated using a one-step algorithm and three iterative algorithms with diverse computation characteristics.
Keywords: big data, map reduce, incremental processing, iterative computation
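The idea of refreshing a saved result at the key-value level, rather than recomputing it, can be illustrated with a minimal Python sketch of a one-step counting job. It does not use or reproduce the i2MapReduce API; the function names and the record format (whitespace-separated keys) are assumptions for illustration.

```python
from collections import Counter


def full_count(records):
    """One-step job computed from scratch: count how often each key occurs."""
    return Counter(key for record in records for key in record.split())


def incremental_count(saved_state, added, removed):
    """Refresh a saved result by touching only the keys in the changed records."""
    state = Counter(saved_state)
    state.update(full_count(added))
    state.subtract(full_count(removed))
    return +state  # unary plus drops keys whose count fell to zero


old_records = ["a b", "b c"]
saved = full_count(old_records)

# The input evolves: one record is deleted and one is inserted.
refreshed = incremental_count(saved, added=["c d"], removed=["a b"])
```

The refreshed result matches a full recount of the new input, but the work done is proportional to the size of the change rather than to the size of the whole data set.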
Procedia PDF Downloads 350
25947 A Review on Existing Challenges of Data Mining and Future Research Perspectives
Authors: Hema Bhardwaj, D. Srinivasa Rao
Abstract:
Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed "big data." Big data mining and big data analysis are extremely helpful for business activities such as making decisions, building organisational plans, researching the market efficiently, and improving sales, because typical management tools cannot handle such complicated datasets. Big data brings special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations. These unique problems call for new computational and statistical paradigms. This research paper offers an overview of the literature on big data mining and its process, along with its problems and difficulties, with a focus on the unique characteristics of big data. Organizations face several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is actually analyzed. This study presents the idea of mining and analysing data, together with knowledge discovery techniques that have recently been created with practical application systems. The article's conclusion also includes a list of issues and difficulties for further research in the area. The report discusses the main big data and data mining challenges facing management.
Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges
Procedia PDF Downloads 110
25946 Review of Different Machine Learning Algorithms
Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui
Abstract:
Classification is a data mining technique based on Machine Learning (ML) algorithms. It is used to classify the individual items in a known set of information into a set of predefined modules or groups. Web mining is also part of this kind of data mining method. The main purpose of this paper is to analyze and compare the performance of the Naïve Bayes algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM). This paper covers different ML algorithms with their advantages and disadvantages and also defines research issues.
Keywords: Data Mining, Web Mining, classification, ML Algorithms
Procedia PDF Downloads 303
25945 Data Mining As A Tool For Knowledge Management: A Review
Authors: Maram Saleh
Abstract:
Knowledge has become an essential resource in today’s economy and the most important asset for maintaining competitive advantage in organizations. The importance of knowledge has led organizations to manage their knowledge assets and resources through all the knowledge management stages, such as knowledge creation, knowledge storage, knowledge sharing, and knowledge use. Research on data mining has grown continuously over recent years in both the business and educational fields. Data mining is one of the most important steps of the knowledge discovery in databases process, aiming to extract implicit, unknown, but useful knowledge, and it is considered a significant subfield of knowledge management. Data mining has great potential to help organizations focus on extracting the most important information from their data warehouses. Data mining tools and techniques can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This review paper explores the application of data mining techniques in supporting the knowledge management process as an effective knowledge discovery technique. In this paper, we identify the relationship between data mining and knowledge management, and then focus on introducing some applications of data mining techniques in knowledge management for some real-life domains.
Keywords: Data Mining, Knowledge management, Knowledge discovery, Knowledge creation.
Procedia PDF Downloads 208
25944 Sensor Data Analysis for a Large Mining Major
Authors: Sudipto Shanker Dasgupta
Abstract:
One of the largest mining companies wanted to look at health analytics for their driverless trucks. These trucks were the key to their supply chain logistics. The automated trucks had multi-level sub-assemblies which would send out sensor information. The use case that was worked on was to capture the sensor signals from the truck subcomponents and analyze the health of the trucks from a repair and replacement perspective. Open source software was used to stream the data into a clustered Hadoop setup in the Amazon Web Services cloud, and Apache Spark SQL was used to analyze the data. All of this was achieved on a 10-node Amazon setup (32 cores, 64 GB RAM), where real-time analytics was achieved on ‘300 million records’. To check the scalability of the system, the cluster was increased to a 100-node setup. This talk will highlight how open source software was used to achieve the above use case and share insights on the high data throughput on a cloud setup.
Keywords: streaming analytics, data science, big data, Hadoop, high throughput, sensor data
Procedia PDF Downloads 404
25943 Recognizing Customer Preferences Using Review Documents: A Hybrid Text and Data Mining Approach
Authors: Oshin Anand, Atanu Rakshit
Abstract:
The vast growth in e-commerce ventures makes this area a prominent research stream. Besides several quantified parameters, the textual content of reviews is a storehouse of information that can educate companies and help them earn profit. This study is an attempt in this direction. The article attempts to categorize data based on a computed metric that quantifies the influencing capacity of reviews, yielding two categories: high and low influential reviews. Further, each of these documents is studied to derive several product feature categories. Each of these categories, along with the computed metric, is converted to linguistic identifiers and used in an association mining model. The article makes a novel attempt to combine feature extraction with a quantified metric to categorize review text and finally provide frequent patterns that depict customer preferences. Frequent mentions in highly influential reviews depict customer likes or preferred features in the product, whereas prominent patterns in low influential reviews highlight what is not important for customers. This is achieved using a hybrid approach of text mining for feature and term extraction, sentiment analysis, a multicriteria decision-making technique, and an association mining model.
Keywords: association mining, customer preference, frequent pattern, online reviews, text mining
Procedia PDF Downloads 388
25942 Review and Comparison of Associative Classification Data Mining Approaches
Authors: Suzan Wedyan
Abstract:
Data mining is one of the main phases of Knowledge Discovery in Databases (KDD) and is responsible for finding hidden and useful knowledge in databases. There are many different data mining tasks, including regression, pattern recognition, clustering, classification, and association rule discovery. In recent years, a promising data mining approach called associative classification (AC) has been proposed. AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference to the different procedures used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.
Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction
Procedia PDF Downloads 537