Search results for: multidimensional process mining
5957 Object-Centric Process Mining Using Process Cubes
Authors: Anahita Farhang Ghahfarokhi, Alessandro Berti, Wil M.P. van der Aalst
Abstract:
Process mining provides ways to analyze business processes. Common process mining techniques consider the process as a whole. However, in real-life business processes, different behaviors exist that make the overall process too complex to interpret. Process comparison is a branch of process mining that isolates different behaviors of the process from each other by using process cubes. Process cubes organize event data using different dimensions. Each cell contains a set of events that can be used as an input to apply process mining techniques. Existing work on process cubes assume single case notions. However, in real processes, several case notions (e.g., order, item, package, etc.) are intertwined. Object-centric process mining is a new branch of process mining addressing multiple case notions in a process. To make a bridge between object-centric process mining and process comparison, we propose a process cube framework, which supports process cube operations such as slice and dice on object-centric event logs. To facilitate the comparison, the framework is integrated with several object-centric process discovery approaches.Keywords: Process mining, multidimensional process mining, multi-perspective business processes, OLAP, process cubes, process discovery.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11195956 Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern
Authors: Mahdi Esmaeili, Mansour Tarafdar
Abstract:
The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.Keywords: Sequential Patterns, Data Mining, ParallelAlgorithm, Multidimensional Sequence Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14755955 Multidimensional and Data Mining Analysis for Property Investment Risk Analysis
Authors: Nur Atiqah Rochin Demong, Jie Lu, Farookh Khadeer Hussain
Abstract:
Property investment in the real estate industry has a high risk due to the uncertainty factors that will affect the decisions made and high cost. Analytic hierarchy process has existed for some time in which referred to an expert-s opinion to measure the uncertainty of the risk factors for the risk analysis. Therefore, different level of experts- experiences will create different opinion and lead to the conflict among the experts in the field. The objective of this paper is to propose a new technique to measure the uncertainty of the risk factors based on multidimensional data model and data mining techniques as deterministic approach. The propose technique consist of a basic framework which includes four modules: user, technology, end-user access tools and applications. The property investment risk analysis defines as a micro level analysis as the features of the property will be considered in the analysis in this paper.Keywords: Uncertainty factors, data mining, multidimensional data model, risk analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 29225954 A Web Text Mining Flexible Architecture
Authors: M. Castellano, G. Mastronardi, A. Aprile, G. Tarricone
Abstract:
Text Mining is an important step of Knowledge Discovery process. It is used to extract hidden information from notstructured o semi-structured data. This aspect is fundamental because much of the Web information is semi-structured due to the nested structure of HTML code, much of the Web information is linked, much of the Web information is redundant. Web Text Mining helps whole knowledge mining process to mining, extraction and integration of useful data, information and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multiorganization environment. The Web Text Mining process is based on flexible architecture and is implemented by four steps able to examine web content and to extract useful hidden information through mining techniques. Our Web Text Mining prototype starts from the recovery of Web job offers in which, through a Text Mining process, useful information for fast classification of the same are drawn out, these information are, essentially, job offer place and skills.Keywords: Web text mining, flexible architecture, knowledgediscovery.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26635953 Conceptual Multidimensional Model
Authors: Manpreet Singh, Parvinder Singh, Suman
Abstract:
The data is available in abundance in any business organization. It includes the records for finance, maintenance, inventory, progress reports etc. As the time progresses, the data keep on accumulating and the challenge is to extract the information from this data bank. Knowledge discovery from these large and complex databases is the key problem of this era. Data mining and machine learning techniques are needed which can scale to the size of the problems and can be customized to the application of business. For the development of accurate and required information for particular problem, business analyst needs to develop multidimensional models which give the reliable information so that they can take right decision for particular problem. If the multidimensional model does not possess the advance features, the accuracy cannot be expected. The present work involves the development of a Multidimensional data model incorporating advance features. The criterion of computation is based on the data precision and to include slowly change time dimension. The final results are displayed in graphical form.Keywords: Multidimensional, data precision.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14565952 Redesigning Business Processes: A Method Based on Simulation and Process Mining Techniques
Authors: Zahra Mohammadnazari, Fateme Rostambeygi, Fatemeh Dehrouyeh, Hwang Ki-Soon, Amir Aghsami
Abstract:
Corporations have always prioritized efforts to examine and improve processes. Various metrics, such as the cost and time required to implement the process and can be specified in this regard. Process improvement can be defined as an improvement of these indicators. This is accomplished by looking at prospective adjustments to the current executive process model or the resources allotted to it. Research has been conducted in this paper to the improve the procurement process and aims to explore assessment prospects in the project using a combination of process mining and simulation (benefiting from Play-In and Play-Out methodologies). To run the simulation, we will need to complete the control flow diagram, institution settings, resource settings, and activity settings. The process of mining event logs yields the process control flow. However, both the entry of institutions and the distribution of resources must be modeled. The rate of admission of institutions and the distribution of time for the implementation of activities will be determined in the next step.
Keywords: Business reengineering, Petri net, process-based simulation, process mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4825951 A Simplified and Effective Algorithm Used to Mine Similar Processes: An Illustrated Example
Authors: Min-Hsun Kuo, Yun-Shiow Chen
Abstract:
The running logs of a process hold valuable information about its executed activity behavior and generated activity logic structure. Theses informative logs can be extracted, analyzed and utilized to improve the efficiencies of the process's execution and conduction. One of the techniques used to accomplish the process improvement is called as process mining. To mine similar processes is such an improvement mission in process mining. Rather than directly mining similar processes using a single comparing coefficient or a complicate fitness function, this paper presents a simplified heuristic process mining algorithm with two similarity comparisons that are able to relatively conform the activity logic sequences (traces) of mining processes with those of a normalized (regularized) one. The relative process conformance is to find which of the mining processes match the required activity sequences and relationships, further for necessary and sufficient applications of the mined processes to process improvements. One similarity presented is defined by the relationships in terms of the number of similar activity sequences existing in different processes; another similarity expresses the degree of the similar (identical) activity sequences among the conforming processes. Since these two similarities are with respect to certain typical behavior (activity sequences) occurred in an entire process, the common problems, such as the inappropriateness of an absolute comparison and the incapability of an intrinsic information elicitation, which are often appeared in other process conforming techniques, can be solved by the relative process comparison presented in this paper. To demonstrate the potentiality of the proposed algorithm, a numerical example is illustrated.Keywords: process mining, process similarity, artificial intelligence, process conformance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14425950 CSOLAP (Continuous Spatial On-Line Analytical Processing)
Authors: Taher Omran Ahmed, Abdullatif Mihdi Buras
Abstract:
Decision support systems are usually based on multidimensional structures which use the concept of hypercube. Dimensions are the axes on which facts are analyzed and form a space where a fact is located by a set of coordinates at the intersections of members of dimensions. Conventional multidimensional structures deal with discrete facts linked to discrete dimensions. However, when dealing with natural continuous phenomena the discrete representation is not adequate. There is a need to integrate spatiotemporal continuity within multidimensional structures to enable analysis and exploration of continuous field data. Research issues that lead to the integration of spatiotemporal continuity in multidimensional structures are numerous. In this paper, we discuss research issues related to the integration of continuity in multidimensional structures, present briefly a multidimensional model for continuous field data. We also define new aggregation operations. The model and the associated operations and measures are validated by a prototype.Keywords: Continuous Data, Data warehousing, DecisionSupport, SOLAP
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15945949 A Comparative Analysis of Different Web Content Mining Tools
Authors: T. Suresh Kumar, M. Arthanari, N. Shanthi
Abstract:
Nowadays, the Web has become one of the most pervasive platforms for information change and retrieval. It collects the suitable and perfectly fitting information from websites that one requires. Data mining is the form of extracting data’s available in the internet. Web mining is one of the elements of data mining Technique, which relates to various research communities such as information recovery, folder managing system and simulated intellects. In this Paper we have discussed the concepts of Web mining. We contain generally focused on one of the categories of Web mining, specifically the Web Content Mining and its various farm duties. The mining tools are imperative to scanning the many images, text, and HTML documents and then, the result is used by the various search engines. We conclude by presenting a comparative table of these tools based on some pertinent criteria.
Keywords: Data Mining, Web Mining, Web Content Mining, Mining Tools, Information retrieval.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 35525948 Bottom Up Text Mining through Hierarchical Document Representation
Authors: Y. Djouadi., F. Souam.
Abstract:
Most of the existing text mining approaches are proposed, keeping in mind, transaction databases model. Thus, the mined dataset is structured using just one concept: the “transaction", whereas the whole dataset is modeled using the “set" abstract type. In such cases, the structure of the whole dataset and the relationships among the transactions themselves are not modeled and consequently, not considered in the mining process. We believe that taking into account structure properties of hierarchically structured information (e.g. textual document, etc ...) in the mining process, can leads to best results. For this purpose, an hierarchical associations rule mining approach for textual documents is proposed in this paper and the classical set-oriented mining approach is reconsidered profits to a Direct Acyclic Graph (DAG) oriented approach. Natural languages processing techniques are used in order to obtain the DAG structure. Based on this graph model, an hierarchical bottom up algorithm is proposed. The main idea is that each node is mined with its parent node.Keywords: Graph based association rules mining, Hierarchical document structure, Text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20565947 Multidimensional Visualization Tools for Analysis of Expression Data
Authors: Urska Cvek, Marjan Trutschl, Randolph Stone II, Zanobia Syed, John L. Clifford, Anita L. Sabichi
Abstract:
Expression data analysis is based mostly on the statistical approaches that are indispensable for the study of biological systems. Large amounts of multidimensional data resulting from the high-throughput technologies are not completely served by biostatistical techniques and are usually complemented with visual, knowledge discovery and other computational tools. In many cases, in biological systems we only speculate on the processes that are causing the changes, and it is the visual explorative analysis of data during which a hypothesis is formed. We would like to show the usability of multidimensional visualization tools and promote their use in life sciences. We survey and show some of the multidimensional visualization tools in the process of data exploration, such as parallel coordinates and radviz and we extend them by combining them with the self-organizing map algorithm. We use a time course data set of transitional cell carcinoma of the bladder in our examples. Analysis of data with these tools has the potential to uncover additional relationships and non-trivial structures.Keywords: microarrays, visualization, parallel coordinates, radviz, self-organizing maps.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25075946 Multidimensional Compromise Programming Evaluation of Digital Commerce Websites
Authors: C. Ardil
Abstract:
Multidimensional compromise programming evaluation of digital commerce websites is essential not only to have recommendations for improvement, but also to make comparisons with global business competitors. This research provides a multidimensional decision making model that prioritizes the objective criteria weights of various commerce websites using multidimensional compromise solution. Evaluation of digital commerce website quality can be considered as a complex information system structure including qualitative and quantitative factors for a multicriteria decision making problem. The proposed multicriteria decision making approach mainly consists of three sequential steps for the selection problem. In the first step, three major different evaluation criteria are characterized for website ranking problem. In the second step, identified critical criteria are weighted using the standard deviation procedure. In the third step, the multidimensional compromise programming is applied to rank the digital commerce websites.
Keywords: Standard deviation, commerce website, website evaluation, multicriteria decision making, multicriteria compromise programming, website quality, multidimensional decision analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8165945 An Efficient Data Mining Approach on Compressed Transactions
Authors: Jia-Yu Dai, Don-Lin Yang, Jungpin Wu, Ming-Chuan Hung
Abstract:
In an era of knowledge explosion, the growth of data increases rapidly day by day. Since data storage is a limited resource, how to reduce the data space in the process becomes a challenge issue. Data compression provides a good solution which can lower the required space. Data mining has many useful applications in recent years because it can help users discover interesting knowledge in large databases. However, existing compression algorithms are not appropriate for data mining. In [1, 2], two different approaches were proposed to compress databases and then perform the data mining process. However, they all lack the ability to decompress the data to their original state and improve the data mining performance. In this research a new approach called Mining Merged Transactions with the Quantification Table (M2TQT) was proposed to solve these problems. M2TQT uses the relationship of transactions to merge related transactions and builds a quantification table to prune the candidate itemsets which are impossible to become frequent in order to improve the performance of mining association rules. The experiments show that M2TQT performs better than existing approaches.Keywords: Association rule, data mining, merged transaction, quantification table.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19595944 Preliminary Overview of Data Mining Technology for Knowledge Management System in Institutions of Higher Learning
Authors: Muslihah Wook, Zawiyah M. Yusof, Mohd Zakree Ahmad Nazri
Abstract:
Data mining has been integrated into application systems to enhance the quality of the decision-making process. This study aims to focus on the integration of data mining technology and Knowledge Management System (KMS), due to the ability of data mining technology to create useful knowledge from large volumes of data. Meanwhile, KMS vitally support the creation and use of knowledge. The integration of data mining technology and KMS are popularly used in business for enhancing and sustaining organizational performance. However, there is a lack of studies that applied data mining technology and KMS in the education sector; particularly students- academic performance since this could reflect the IHL performance. Realizing its importance, this study seeks to integrate data mining technology and KMS to promote an effective management of knowledge within IHLs. Several concepts from literature are adapted, for proposing the new integrative data mining technology and KMS framework to an IHL.
Keywords: Data mining, Institutions of Higher Learning, Knowledge Management System, Students' academic performance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21425943 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity
Authors: Hoda A. Abdel Hafez
Abstract:
Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24805942 Multidimensional Compromise Optimization for Development Ranking of the Gulf Cooperation Council Countries and Turkey
Authors: C. Ardil
Abstract:
In this research, a multidimensional compromise optimization method is proposed for multidimensional decision making analysis in the development ranking of the Gulf Cooperation Council Countries and Turkey. The proposed approach presents ranking solutions resulting from different multicriteria decision analyses, which yield different ranking orders for the same ranking problem, consisting of a set of alternatives in terms of numerous competing criteria when they are applied with the same numerical data. The multiobjective optimization decision making problem is considered in three sequential steps. In the first step, five different criteria related to the development ranking are gathered from the research field. In the second step, identified evaluation criteria are, objectively, weighted using standard deviation procedure. In the third step, a country selection problem is illustrated with a numerical example as an application of the proposed multidimensional compromise optimization model. Finally, multidimensional compromise optimization approach is applied to rank the Gulf Cooperation Council Countries and Turkey.
Keywords: Standard deviation, performance evaluation, multicriteria decision making, multidimensional compromise optimization, vector normalization, multicriteria decision making, multicriteria analysis, multidimensional decision analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8115941 Proposing an Efficient Method for Frequent Pattern Mining
Authors: Vaibhav Kant Singh, Vijay Shah, Yogendra Kumar Jain, Anupam Shukla, A.S. Thoke, Vinay KumarSingh, Chhaya Dule, Vivek Parganiha
Abstract:
Data mining, which is the exploration of knowledge from the large set of data, generated as a result of the various data processing activities. Frequent Pattern Mining is a very important task in data mining. The previous approaches applied to generate frequent set generally adopt candidate generation and pruning techniques for the satisfaction of the desired objective. This paper shows how the different approaches achieve the objective of frequent mining along with the complexities required to perform the job. This paper will also look for hardware approach of cache coherence to improve efficiency of the above process. The process of data mining is helpful in generation of support systems that can help in Management, Bioinformatics, Biotechnology, Medical Science, Statistics, Mathematics, Banking, Networking and other Computer related applications. This paper proposes the use of both upward and downward closure property for the extraction of frequent item sets which reduces the total number of scans required for the generation of Candidate Sets.Keywords: Data Mining, Candidate Sets, Frequent Item set, Pruning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16815940 Multinomial Dirichlet Gaussian Process Model for Classification of Multidimensional Data
Authors: Wanhyun Cho, Soonja Kang, Sangkyoon Kim, Soonyoung Park
Abstract:
We present probabilistic multinomial Dirichlet classification model for multidimensional data and Gaussian process priors. Here, we have considered efficient computational method that can be used to obtain the approximate posteriors for latent variables and parameters needed to define the multiclass Gaussian process classification model. We first investigated the process of inducing a posterior distribution for various parameters and latent function by using the variational Bayesian approximations and important sampling method, and next we derived a predictive distribution of latent function needed to classify new samples. The proposed model is applied to classify the synthetic multivariate dataset in order to verify the performance of our model. Experiment result shows that our model is more accurate than the other approximation methods.Keywords: Multinomial dirichlet classification model, Gaussian process priors, variational Bayesian approximation, Importance sampling, approximate posterior distribution, Marginal likelihood evidence.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16145939 A New Model for Discovering XML Association Rules from XML Documents
Authors: R. AliMohammadzadeh, M. Rahgozar, A. Zarnani
Abstract:
The inherent flexibilities of XML in both structure and semantics makes mining from XML data a complex task with more challenges compared to traditional association rule mining in relational databases. In this paper, we propose a new model for the effective extraction of generalized association rules form a XML document collection. We directly use frequent subtree mining techniques in the discovery process and do not ignore the tree structure of data in the final rules. The frequent subtrees based on the user provided support are split to complement subtrees to form the rules. We explain our model within multi-steps from data preparation to rule generation.Keywords: XML, Data Mining, Association Rule Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16305938 Web Application to Profiling Scientific Institutions through Citation Mining
Authors: Hector D. Cortes, Jesus A. del Rio, Esther O. Garcia, Miguel Robles
Abstract:
Recently the use of data mining to scientific bibliographic data bases has been implemented to analyze the pathways of the knowledge or the core scientific relevances of a laureated novel or a country. This specific case of data mining has been named citation mining, and it is the integration of citation bibliometrics and text mining. In this paper we present an improved WEB implementation of statistical physics algorithms to perform the text mining component of citation mining. In particular we use an entropic like distance between the compression of text as an indicator of the similarity between them. Finally, we have included the recently proposed index h to characterize the scientific production. We have used this web implementation to identify users, applications and impact of the Mexican scientific institutions located in the State of Morelos.
Keywords: Citation Mining, Text Mining, Science Impact
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17545937 Revised PLWAP Tree with Non-frequent Items for Mining Sequential Pattern
Authors: R. Vishnu Priya, A. Vadivel
Abstract:
Sequential pattern mining is a challenging task in data mining area with large applications. One among those applications is mining patterns from weblog. Recent times, weblog is highly dynamic and some of them may become absolute over time. In addition, users may frequently change the threshold value during the data mining process until acquiring required output or mining interesting rules. Some of the recently proposed algorithms for mining weblog, build the tree with two scans and always consume large time and space. In this paper, we build Revised PLWAP with Non-frequent Items (RePLNI-tree) with single scan for all items. While mining sequential patterns, the links related to the nonfrequent items are not considered. Hence, it is not required to delete or maintain the information of nodes while revising the tree for mining updated transactions. The algorithm supports both incremental and interactive mining. It is not required to re-compute the patterns each time, while weblog is updated or minimum support changed. The performance of the proposed tree is better, even the size of incremental database is more than 50% of existing one. For evaluation purpose, we have used the benchmark weblog dataset and found that the performance of proposed tree is encouraging compared to some of the recently proposed approaches.
Keywords: Sequential pattern mining, weblog, frequent and non-frequent items, incremental and interactive mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19305936 Survey on Image Mining Using Genetic Algorithm
Authors: Jyoti Dua
Abstract:
One image is worth more than thousand words. Images if analyzed can reveal useful information. Low level image processing deals with the extraction of specific feature from a single image. Now the question arises: What technique should be used to extract patterns of very large and detailed image database? The answer of the question is: “Image Mining”. Image Mining deals with the extraction of image data relationship, implicit knowledge, and another pattern from the collection of images or image database. It is nothing but the extension of Data Mining. In the following paper, not only we are going to scrutinize the current techniques of image mining but also present a new technique for mining images using Genetic Algorithm.
Keywords: Image Mining, Data Mining, Genetic Algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24455935 Frequent Itemset Mining Using Rough-Sets
Authors: Usman Qamar, Younus Javed
Abstract:
Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that as the data increase the amount of time and resources required to mining the data increases at an exponential rate. In this investigation a new algorithm is proposed which can be uses as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and roughsets to carry out record reduction and feature (attribute) selection respectively. FASTER for frequent itemset mining can produce a speed up of 3.1 times when compared to original algorithm while maintaining an accuracy of 71%.
Keywords: Rough-sets, Classification, Feature Selection, Entropy, Outliers, Frequent itemset mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24345934 Weka Based Desktop Data Mining as Web Service
Authors: Sujala.D.Shetty, S.Vadivel, Sakshi Vaghella
Abstract:
Data mining is the process of sifting through large volumes of data, analyzing data from different perspectives and summarizing it into useful information. One of the widely used desktop applications for data mining is the Weka tool which is nothing but a collection of machine learning algorithms implemented in Java and open sourced under the General Public License (GPL). A web service is a software system designed to support interoperable machine to machine interaction over a network using SOAP messages. Unlike a desktop application, a web service is easy to upgrade, deliver and access and does not occupy any memory on the system. Keeping in mind the advantages of a web service over a desktop application, in this paper we are demonstrating how this Java based desktop data mining application can be implemented as a web service to support data mining across the internet.Keywords: desktop application, Weka mining, web service
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 40805933 Multidimensional Data Mining by Means of Randomly Travelling Hyper-Ellipsoids
Authors: Pavel Y. Tabakov, Kevin Duffy
Abstract:
The present study presents a new approach to automatic data clustering and classification problems in large and complex databases and, at the same time, derives specific types of explicit rules describing each cluster. The method works well in both sparse and dense multidimensional data spaces. The members of the data space can be of the same nature or represent different classes. A number of N-dimensional ellipsoids are used for enclosing the data clouds. Due to the geometry of an ellipsoid and its free rotation in space the detection of clusters becomes very efficient. The method is based on genetic algorithms that are used for the optimization of location, orientation and geometric characteristics of the hyper-ellipsoids. The proposed approach can serve as a basis for the development of general knowledge systems for discovering hidden knowledge and unexpected patterns and rules in various large databases.Keywords: Classification, clustering, data minig, genetic algorithms.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17715932 Optimal Algorithm for Constructing the Delaunay Triangulation in Ed
Authors: V. Tereshchenko, D. Taran
Abstract:
In this paper we propose a new approach to constructing the Delaunay Triangulation and the optimum algorithm for the case of multidimensional spaces (d ≥ 2). Analysing the modern state, it is possible to draw a conclusion, that the ideas for the existing effective algorithms developed for the case of d ≥ 2 are not simple to generalize on a multidimensional case, without the loss of efficiency. We offer for the solving this problem an effective algorithm that satisfies all the given requirements. But theoretical complexity of the problem it is impossible to improve as the Worst - Case Optimality for algorithms of solving such a problem is proved.
Keywords: Delaunay triangulation, multidimensional space, Voronoi Diagram, optimal algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19815931 Role of Association Rule Mining in Numerical Data Analysis
Authors: Sudhir Jagtap, Kodge B. G., Shinde G. N., Devshette P. M
Abstract:
Numerical analysis naturally finds applications in all fields of engineering and the physical sciences, but in the 21st century, the life sciences and even the arts have adopted elements of scientific computations. The numerical data analysis became key process in research and development of all the fields [6]. In this paper we have made an attempt to analyze the specified numerical patterns with reference to the association rule mining techniques with minimum confidence and minimum support mining criteria. The extracted rules and analyzed results are graphically demonstrated. Association rules are a simple but very useful form of data mining that describe the probabilistic co-occurrence of certain events within a database [7]. They were originally designed to analyze market-basket data, in which the likelihood of items being purchased together within the same transactions are analyzed.Keywords: Numerical data analysis, Data Mining, Association Rule Mining
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28615930 Moving Data Mining Tools toward a Business Intelligence System
Authors: Nittaya Kerdprasop, Kittisak Kerdprasop
Abstract:
Data mining (DM) is the process of finding and extracting frequent patterns that can describe the data, or predict unknown or future values. These goals are achieved by using various learning algorithms. Each algorithm may produce a mining result completely different from the others. Some algorithms may find millions of patterns. It is thus the difficult job for data analysts to select appropriate models and interpret the discovered knowledge. In this paper, we describe a framework of an intelligent and complete data mining system called SUT-Miner. Our system is comprised of a full complement of major DM algorithms, pre-DM and post-DM functionalities. It is the post-DM packages that ease the DM deployment for business intelligence applications.Keywords: Business intelligence, data mining, functionalprogramming, intelligent system.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17415929 Concurrency in Web Access Patterns Mining
Authors: Jing Lu, Malcolm Keech, Weiru Chen
Abstract:
Web usage mining is an interesting application of data mining which provides insight into customer behaviour on the Internet. An important technique to discover user access and navigation trails is based on sequential patterns mining. One of the key challenges for web access patterns mining is tackling the problem of mining richly structured patterns. This paper proposes a novel model called Web Access Patterns Graph (WAP-Graph) to represent all of the access patterns from web mining graphically. WAP-Graph also motivates the search for new structural relation patterns, i.e. Concurrent Access Patterns (CAP), to identify and predict more complex web page requests. Corresponding CAP mining and modelling methods are proposed and shown to be effective in the search for and representation of concurrency between access patterns on the web. From experiments conducted on large-scale synthetic sequence data as well as real web access data, it is demonstrated that CAP mining provides a powerful method for structural knowledge discovery, which can be visualised through the CAP-Graph model.Keywords: concurrent access patterns (CAP), CAP mining and modelling, CAP-Graph, web access patterns (WAP), WAP-Graph, Web usage mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17255928 Incremental Mining of Shocking Association Patterns
Authors: Eiad Yafi, Ahmed Sultan Al-Hegami, M. A. Alam, Ranjit Biswas
Abstract:
Association rules are an important problem in data mining. Massively increasing volume of data in real life databases has motivated researchers to design novel and incremental algorithms for association rules mining. In this paper, we propose an incremental association rules mining algorithm that integrates shocking interestingness criterion during the process of building the model. A new interesting measure called shocking measure is introduced. One of the main features of the proposed approach is to capture the user background knowledge, which is monotonically augmented. The incremental model that reflects the changing data and the user beliefs is attractive in order to make the over all KDD process more effective and efficient. We implemented the proposed approach and experiment it with some public datasets and found the results quite promising.Keywords: Knowledge discovery in databases (KDD), Data mining, Incremental Association rules, Domain knowledge, Interestingness, Shocking rules (SHR).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1866