Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 16

Search results for: Apriori

16 RA-Apriori: An Efficient and Faster MapReduce-Based Algorithm for Frequent Itemset Mining on Apache Flink

Authors: Sanjay Rathee, Arti Kashyap

Abstract:

Extraction of useful information from large datasets is one of the most important research problems. Association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction based datasets (finding frequent patterns) is most important part of the association rule mining. There exist many algorithms to find frequent patterns but Apriori algorithm always remains a preferred choice due to its ease of implementation and natural tendency to be parallelized. Many single-machine based Apriori variants exist but massive amount of data available these days is above capacity of a single machine. Therefore, to meet the demands of this ever-growing huge data, there is a need of multiple machines based Apriori algorithm. For these types of distributed applications, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks with MapReduce approach for distributed storage and distributed processing of huge datasets using clusters built from commodity hardware. However, heavy disk I/O operation at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms are being developed for parallel computing in recent years. Among them, two platforms, namely, Spark and Flink have attracted a lot of attention because of their inbuilt support to distributed computations. Earlier we proposed a reduced- Apriori algorithm on Spark platform which outperforms parallel Apriori, one because of use of Spark and secondly because of the improvement we proposed in standard Apriori. Therefore, this work is a natural sequel of our work and targets on implementing, testing and benchmarking Apriori and Reduced-Apriori and our new algorithm ReducedAll-Apriori on Apache Flink and compares it with Spark implementation. Flink, a streaming dataflow engine, overcomes disk I/O bottlenecks in MapReduce, providing an ideal platform for distributed Apriori. Flink's pipelining based structure allows starting a next iteration as soon as partial results of earlier iteration are available. Therefore, there is no need to wait for all reducers result to start a next iteration. We conduct in-depth experiments to gain insight into the effectiveness, efficiency and scalability of the Apriori and RA-Apriori algorithm on Flink.

Keywords: apriori, apache flink, Mapreduce, spark, Hadoop, R-Apriori, frequent itemset mining

Procedia PDF Downloads 178
15 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic

Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam

Abstract:

In recent years, we have seen an increasing importance of research and study on knowledge source, decision support systems, data mining and procedure of knowledge discovery in data bases and it is considered that each of these aspects affects the others. In this article, we have merged information source and knowledge source to suggest a knowledge based system within limits of management based on storing and restoring of knowledge to manage information and improve decision making and resources. In this article, we have used method of data mining and Apriori algorithm in procedure of knowledge discovery one of the problems of Apriori algorithm is that, a user should specify the minimum threshold for supporting the regularity. Imagine that a user wants to apply Apriori algorithm for a database with millions of transactions. Definitely, the user does not have necessary knowledge of all existing transactions in that database, and therefore cannot specify a suitable threshold. Our purpose in this article is to improve Apriori algorithm. To achieve our goal, we tried using fuzzy logic to put data in different clusters before applying the Apriori algorithm for existing data in the database and we also try to suggest the most suitable threshold to the user automatically.

Keywords: decision support system, data mining, knowledge discovery, data discovery, fuzzy logic

Procedia PDF Downloads 256
14 Spatio-Temporal Data Mining with Association Rules for Lake Van

Authors: Tolga Aydin, M. Fatih Alaeddinoğlu

Abstract:

People, throughout the history, have made estimates and inferences about the future by using their past experiences. Developing information technologies and the improvements in the database management systems make it possible to extract useful information from knowledge in hand for the strategic decisions. Therefore, different methods have been developed. Data mining by association rules learning is one of such methods. Apriori algorithm, one of the well-known association rules learning algorithms, is not commonly used in spatio-temporal data sets. However, it is possible to embed time and space features into the data sets and make Apriori algorithm a suitable data mining technique for learning spatio-temporal association rules. Lake Van, the largest lake of Turkey, is a closed basin. This feature causes the volume of the lake to increase or decrease as a result of change in water amount it holds. In this study, evaporation, humidity, lake altitude, amount of rainfall and temperature parameters recorded in Lake Van region throughout the years are used by the Apriori algorithm and a spatio-temporal data mining application is developed to identify overflows and newly-formed soil regions (underflows) occurring in the coastal parts of Lake Van. Identifying possible reasons of overflows and underflows may be used to alert the experts to take precautions and make the necessary investments.

Keywords: apriori algorithm, association rules, data mining, spatio-temporal data

Procedia PDF Downloads 295
13 Analysis on Thermococcus achaeans with Frequent Pattern Mining

Authors: Jeongyeob Hong, Myeonghoon Park, Taeson Yoon

Abstract:

After the advent of Achaeans which utilize different metabolism pathway and contain conspicuously different cellular structure, they have been recognized as possible materials for developing quality of human beings. Among diverse Achaeans, in this paper, we compared 16s RNA Sequences of four different species of Thermococcus: Achaeans genus specialized in sulfur-dealing metabolism. Four Species, Barophilus, Kodakarensis, Hydrothermalis, and Onnurineus, live near the hydrothermal vent that emits extreme amount of sulfur and heat. By comparing ribosomal sequences of aforementioned four species, we found similarities in their sequences and expressed protein, enabling us to expect that certain ribosomal sequence or proteins are vital for their survival. Apriori algorithms and Decision Tree were used. for comparison.

Keywords: Achaeans, Thermococcus, apriori algorithm, decision tree

Procedia PDF Downloads 220
12 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub

Abstract:

Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.

Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL

Procedia PDF Downloads 56
11 Developing a Recommendation Library System based on Android Application

Authors: Kunyanuth Kularbphettong, Kunnika Tenprakhon, Pattarapan Roonrakwit

Abstract:

In this paper, we present a recommendation library application on Android system. The objective of this system is to support and advice user to use library resources based on mobile application. We describe the design approaches and functional components of this system. The system was developed based on under association rules, Apriori algorithm. In this project, it was divided the result by the research purposes into 2 parts: developing the Mobile application for online library service and testing and evaluating the system. Questionnaires were used to measure user satisfaction with system usability by specialists and users. The results were satisfactory both specialists and users.

Keywords: online library, Apriori algorithm, Android application, black box

Procedia PDF Downloads 396
10 Analysis of Users’ Behavior on Book Loan Log Based on Association Rule Mining

Authors: Kanyarat Bussaban, Kunyanuth Kularbphettong

Abstract:

This research aims to create a model for analysis of student behavior using Library resources based on data mining technique in case of Suan Sunandha Rajabhat University. The model was created under association rules, apriori algorithm. The results were found 14 rules and the rules were tested with testing data set and it showed that the ability of classify data was 79.24 percent and the MSE was 22.91. The results showed that the user’s behavior model by using association rule technique can use to manage the library resources.

Keywords: behavior, data mining technique, a priori algorithm, knowledge discovery

Procedia PDF Downloads 329
9 Improved FP-Growth Algorithm with Multiple Minimum Supports Using Maximum Constraints

Authors: Elsayeda M. Elgaml, Dina M. Ibrahim, Elsayed A. Sallam

Abstract:

Association rule mining is one of the most important fields of data mining and knowledge discovery. In this paper, we propose an efficient multiple support frequent pattern growth algorithm which we called “MSFP-growth” that enhancing the FP-growth algorithm by making infrequent child node pruning step with multiple minimum support using maximum constrains. The algorithm is implemented, and it is compared with other common algorithms: Apriori-multiple minimum supports using maximum constraints and FP-growth. The experimental results show that the rule mining from the proposed algorithm are interesting and our algorithm achieved better performance than other algorithms without scarifying the accuracy.

Keywords: association rules, FP-growth, multiple minimum supports, Weka tool

Procedia PDF Downloads 368
8 Frequent-Pattern Tree Algorithm Application to S&P and Equity Indexes

Authors: E. Younsi, H. Andriamboavonjy, A. David, S. Dokou, B. Lemrabet

Abstract:

Software and time optimization are very important factors in financial markets, which are competitive fields, and emergence of new computer tools further stresses the challenge. In this context, any improvement of technical indicators which generate a buy or sell signal is a major issue. Thus, many tools have been created to make them more effective. This worry about efficiency has been leading in present paper to seek best (and most innovative) way giving largest improvement in these indicators. The approach consists in attaching a signature to frequent market configurations by application of frequent patterns extraction method which is here most appropriate to optimize investment strategies. The goal of proposed trading algorithm is to find most accurate signatures using back testing procedure applied to technical indicators for improving their performance. The problem is then to determine the signatures which, combined with an indicator, outperform this indicator alone. To do this, the FP-Tree algorithm has been preferred, as it appears to be the most efficient algorithm to perform this task.

Keywords: quantitative analysis, back-testing, computational models, apriori algorithm, pattern recognition, data mining, FP-tree

Procedia PDF Downloads 280
7 Application of Association Rule Using Apriori Algorithm for Analysis of Industrial Accidents in 2013-2014 in Indonesia

Authors: Triano Nurhikmat

Abstract:

Along with the progress of science and technology, the development of the industrialized world in Indonesia took place very rapidly. This leads to a process of industrialization of society Indonesia faster with the establishment of the company and the workplace are diverse. Development of the industry relates to the activity of the worker. Where in these work activities do not cover the possibility of an impending crash on either the workers or on a construction project. The cause of the occurrence of industrial accidents was the fault of electrical damage, work procedures, and error technique. The method of an association rule is one of the main techniques in data mining and is the most common form used in finding the patterns of data collection. In this research would like to know how relations of the association between the incidence of any industrial accidents. Therefore, by using methods of analysis association rule patterns associated with combination obtained two iterations item set (2 large item set) when every factor of industrial accidents with a West Jakarta so industrial accidents caused by the occurrence of an electrical value damage = 0.2 support and confidence value = 1, and the reverse pattern with value = 0.2 support and confidence = 0.75.

Keywords: association rule, data mining, industrial accidents, rules

Procedia PDF Downloads 195
6 Cloud Computing in Data Mining: A Technical Survey

Authors: Ghaemi Reza, Abdollahi Hamid, Dashti Elham

Abstract:

Cloud computing poses a diversity of challenges in data mining operation arising out of the dynamic structure of data distribution as against the use of typical database scenarios in conventional architecture. Due to immense number of users seeking data on daily basis, there is a serious security concerns to cloud providers as well as data providers who put their data on the cloud computing environment. Big data analytics use compute intensive data mining algorithms (Hidden markov, MapReduce parallel programming, Mahot Project, Hadoop distributed file system, K-Means and KMediod, Apriori) that require efficient high performance processors to produce timely results. Data mining algorithms to solve or optimize the model parameters. The challenges that operation has to encounter is the successful transactions to be established with the existing virtual machine environment and the databases to be kept under the control. Several factors have led to the distributed data mining from normal or centralized mining. The approach is as a SaaS which uses multi-agent systems for implementing the different tasks of system. There are still some problems of data mining based on cloud computing, including design and selection of data mining algorithms.

Keywords: cloud computing, data mining, computing models, cloud services

Procedia PDF Downloads 375
5 Analyzing Medical Workflows Using Market Basket Analysis

Authors: Mohit Kumar, Mayur Betharia

Abstract:

Healthcare domain, with the emergence of Electronic Medical Record (EMR), collects a lot of data which have been attracting Data Mining expert’s interest. In the past, doctors have relied on their intuition while making critical clinical decisions. This paper presents the means to analyze the Medical workflows to get business insights out of huge dumped medical databases. Market Basket Analysis (MBA) which is a special data mining technique, has been widely used in marketing and e-commerce field to discover the association between products bought together by customers. It helps businesses in increasing their sales by analyzing the purchasing behavior of customers and pitching the right customer with the right product. This paper is an attempt to demonstrate Market Basket Analysis applications in healthcare. In particular, it discusses the Market Basket Analysis Algorithm ‘Apriori’ applications within healthcare in major areas such as analyzing the workflow of diagnostic procedures, Up-selling and Cross-selling of Healthcare Systems, designing healthcare systems more user-friendly. In the paper, we have demonstrated the MBA applications using Angiography Systems, but can be extrapolated to other modalities as well.

Keywords: data mining, market basket analysis, healthcare applications, knowledge discovery in healthcare databases, customer relationship management, healthcare systems

Procedia PDF Downloads 76
4 An Optimized Association Rule Mining Algorithm

Authors: Archana Singh, Jyoti Agarwal, Ajay Rana

Abstract:

Data Mining is an efficient technology to discover patterns in large databases. Association Rule Mining techniques are used to find the correlation between the various item sets in a database, and this co-relation between various item sets are used in decision making and pattern analysis. In recent years, the problem of finding association rules from large datasets has been proposed by many researchers. Various research papers on association rule mining (ARM) are studied and analyzed first to understand the existing algorithms. Apriori algorithm is the basic ARM algorithm, but it requires so many database scans. In DIC algorithm, less amount of database scan is needed but complex data structure lattice is used. The main focus of this paper is to propose a new optimized algorithm (Friendly Algorithm) and compare its performance with the existing algorithms A data set is used to find out frequent itemsets and association rules with the help of existing and proposed (Friendly Algorithm) and it has been observed that the proposed algorithm also finds all the frequent itemsets and essential association rules from databases as compared to existing algorithms in less amount of database scan. In the proposed algorithm, an optimized data structure is used i.e. Graph and Adjacency Matrix.

Keywords: association rules, data mining, dynamic item set counting, FP-growth, friendly algorithm, graph

Procedia PDF Downloads 332
3 J-Integral Method for Assessment of Structural Integrity of a Pressure Vessel

Authors: Karthik K. R, Viswanath V, Asraff A. K

Abstract:

The first stage of a new-generation launch vehicle of ISRO makes use of large pressure vessels made of Aluminium alloy AA2219 to store fuel and oxidizer. These vessels have many weld joints that may contain cracks or crack-like defects during their fabrication. These defects may propagate across the vessel during pressure testing or while in service under the influence of tensile stresses leading to catastrophe. Though ductile materials exhibit significant stable crack growth prior to failure, it is not generally acceptable for an aerospace component. There is a need to predict the initiation of stable crack growth. The structural integrity of the vessel from fracture considerations can be studied by constructing the Failure Assessment Diagram (FAD) that accounts for both brittle fracture and plastic collapse. Critical crack sizes of the pressure vessel may be highly conservative if it is predicted from FAD alone. If the J-R curve for material under consideration is available apriori, the critical crack sizes can be predicted to a certain degree of accuracy. In this paper, a novel approach is proposed to predict the integrity of a weld in a pressure vessel made of AA2219 material. Fracture parameter ‘J-integral’ at the crack front, evaluated through finite element analyses, is used in the new procedure. Based on the simulation of tension tests carried out on SCT specimens by NASA, a cut-off value of J-integral value (J?ᵤₜ_ₒ??) is finalised. For the pressure vessel, J-integral at the crack front is evaluated through FE simulations incorporating different surface cracks at long seam weld in a cylinder and in dome petal welds. The obtained J-integral, at vessel level, is compared with a value of J?ᵤₜ_ₒ??, and the integrity of vessel weld in the presence of the surface crack is firmed up. The advantage of this methodology is that if SCT test data of any metal is available, the critical crack size in hardware fabricated using that material can be predicted to a better level of accuracy.

Keywords: FAD, j-integral, fracture, surface crack

Procedia PDF Downloads 95
2 Damage Detection in a Cantilever Beam under Different Excitation and Temperature Conditions

Authors: A. Kyprianou, A. Tjirkallis

Abstract:

Condition monitoring of structures in service is very important as it provides information about the risk of damage development. One of the essential constituents of structural condition monitoring is the damage detection methodology. In the context of condition monitoring of in service structures a damage detection methodology analyses data obtained from the structure while it is in operation. Usually, this means that the data could be affected by operational and environmental conditions in a way that could mask the effects of a possible damage on the data. This, depending on the damage detection methodology, could lead to either false alarms or miss existing damages. In this article a damage detection methodology that is based on the Spatio-temporal continuous wavelet transform (SPT-CWT) analysis of a sequence of experimental time responses of a cantilever beam is proposed. The cantilever is subjected to white and pink noise excitation to simulate different operating conditions. In addition, in order to simulate changing environmental conditions, the cantilever is subjected to heating by a heat gun. The response of the cantilever beam is measured by a high-speed camera. Edges are extracted from the series of images of the beam response captured by the camera. Subsequent processing of the edges gives a series of time responses on 439 points on the beam. This sequence is then analyzed using the SPT-CWT to identify damage. The algorithm proposed was able to clearly identify damage under any condition when the structure was excited by white noise force. In addition, in the case of white noise excitation, the analysis could also reveal the position of the heat gun when it was used to heat the structure. The analysis could identify the different operating conditions i.e. between responses due to white noise excitation and responses due to pink noise excitation. During the pink noise excitation whereas damage and changing temperature were identified it was not possible to clearly identify the effect of damage from that of temperature. The methodology proposed in this article for damage detection enables the separation the damage effect from that due to temperature and excitation on data obtained from measurements of a cantilever beam. This methodology does not require information about the apriori state of the structure.

Keywords: spatiotemporal continuous wavelet transform, damage detection, data normalization, varying temperature

Procedia PDF Downloads 208
1 Affects Associations Analysis in Emergency Situations

Authors: Joanna Grzybowska, Magdalena Igras, Mariusz Ziółko

Abstract:

Association rule learning is an approach for discovering interesting relationships in large databases. The analysis of relations, invisible at first glance, is a source of new knowledge which can be subsequently used for prediction. We used this data mining technique (which is an automatic and objective method) to learn about interesting affects associations in a corpus of emergency phone calls. We also made an attempt to match revealed rules with their possible situational context. The corpus was collected and subjectively annotated by two researchers. Each of 3306 recordings contains information on emotion: (1) type (sadness, weariness, anxiety, surprise, stress, anger, frustration, calm, relief, compassion, contentment, amusement, joy) (2) valence (negative, neutral, or positive) (3) intensity (low, typical, alternating, high). Also, additional information, that is a clue to speaker’s emotional state, was annotated: speech rate (slow, normal, fast), characteristic vocabulary (filled pauses, repeated words) and conversation style (normal, chaotic). Exponentially many rules can be extracted from a set of items (an item is a previously annotated single information). To generate the rules in the form of an implication X → Y (where X and Y are frequent k-itemsets) the Apriori algorithm was used - it avoids performing needless computations. Then, two basic measures (Support and Confidence) and several additional symmetric and asymmetric objective measures (e.g. Laplace, Conviction, Interest Factor, Cosine, correlation coefficient) were calculated for each rule. Each applied interestingness measure revealed different rules - we selected some top rules for each measure. Owing to the specificity of the corpus (emergency situations), most of the strong rules contain only negative emotions. There are though strong rules including neutral or even positive emotions. Three examples of the strongest rules are: {sadness} → {anxiety}; {sadness, weariness, stress, frustration} → {anger}; {compassion} → {sadness}. Association rule learning revealed the strongest configurations of affects (as well as configurations of affects with affect-related information) in our emergency phone calls corpus. The acquired knowledge can be used for prediction to fulfill the emotional profile of a new caller. Furthermore, a rule-related possible context analysis may be a clue to the situation a caller is in.

Keywords: data mining, emergency phone calls, emotional profiles, rules

Procedia PDF Downloads 336