Search results for: mining methods
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 15423

Search results for: mining methods

15393 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma leads to physical and financial complications. This study aimed to use data mining techniques and creating a neural network intelligent system for diagnosis of asthma. Methods: The study population is the patients who had visited one of the Lung Clinics in Tehran. Data were analyzed using the SPSS statistical tool and the chi-square Pearson's coefficient was the basis of decision making for data ranking. The considered neural network is trained using back propagation learning technique. Results: According to the analysis performed by means of SPSS to select the top factors, 13 effective factors were selected, in different performances, data was mixed in various forms, so the different models were made for training the data and testing networks and in all different modes, the network was able to predict correctly 100% of all cases. Conclusion: Using data mining methods before the design structure of system, aimed to reduce the data dimension and the optimum choice of the data, will lead to a more accurate system. Therefore, considering the data mining approaches due to the nature of medical data is necessary.

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 244
15392 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data

Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, there is a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data seems clear and real-time stream data management becomes a hot topic. Sliding window model and frequent itemset mining over dynamic data are the most important problems in the context of data mining. Thus, sliding window model for frequent itemset mining is a widely used model for data stream mining due to its emphasis on recent data and its bounded memory requirement. These methods use the traditional transaction-based sliding window model where the window size is based on a fixed number of transactions. Actually, this model supposes that all transactions have a constant rate which is not suited for real-time applications. And the use of this model in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes the use of a timestamp-based sliding window model. In our proposed frequent itemset mining algorithm, support conditions are used to differentiate frequents and infrequent patterns. Thereafter, a tree is developed to incrementally maintain the essential information. We evaluate our contribution. The preliminary results are quite promising.

Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query

Procedia PDF Downloads 130
15391 Spatio-Temporal Data Mining with Association Rules for Lake Van

Authors: Tolga Aydin, M. Fatih Alaeddinoğlu

Abstract:

People, throughout the history, have made estimates and inferences about the future by using their past experiences. Developing information technologies and the improvements in the database management systems make it possible to extract useful information from knowledge in hand for the strategic decisions. Therefore, different methods have been developed. Data mining by association rules learning is one of such methods. Apriori algorithm, one of the well-known association rules learning algorithms, is not commonly used in spatio-temporal data sets. However, it is possible to embed time and space features into the data sets and make Apriori algorithm a suitable data mining technique for learning spatio-temporal association rules. Lake Van, the largest lake of Turkey, is a closed basin. This feature causes the volume of the lake to increase or decrease as a result of change in water amount it holds. In this study, evaporation, humidity, lake altitude, amount of rainfall and temperature parameters recorded in Lake Van region throughout the years are used by the Apriori algorithm and a spatio-temporal data mining application is developed to identify overflows and newly-formed soil regions (underflows) occurring in the coastal parts of Lake Van. Identifying possible reasons of overflows and underflows may be used to alert the experts to take precautions and make the necessary investments.

Keywords: apriori algorithm, association rules, data mining, spatio-temporal data

Procedia PDF Downloads 343
15390 Object-Centric Process Mining Using Process Cubes

Authors: Anahita Farhang Ghahfarokhi, Alessandro Berti, Wil M.P. van der Aalst

Abstract:

Process mining provides ways to analyze business processes. Common process mining techniques consider the process as a whole. However, in real-life business processes, different behaviors exist that make the overall process too complex to interpret. Process comparison is a branch of process mining that isolates different behaviors of the process from each other by using process cubes. Process cubes organize event data using different dimensions. Each cell contains a set of events that can be used as an input to apply process mining techniques. Existing work on process cubes assume single case notions. However, in real processes, several case notions (e.g., order, item, package, etc.) are intertwined. Object-centric process mining is a new branch of process mining addressing multiple case notions in a process. To make a bridge between object-centric process mining and process comparison, we propose a process cube framework, which supports process cube operations such as slice and dice on object-centric event logs. To facilitate the comparison, the framework is integrated with several object-centric process discovery approaches.

Keywords: multidimensional process mining, mMulti-perspective business processes, OLAP, process cubes, process discovery, process mining

Procedia PDF Downloads 221
15389 Algorithms used in Spatial Data Mining GIS

Authors: Vahid Bairami Rad

Abstract:

Extracting knowledge from spatial data like GIS data is important to reduce the data and extract information. Therefore, the development of new techniques and tools that support the human in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. Thus, we introduce a set of database primitives or basic operations for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. Similar to the relational standard language SQL, the use of standard primitives will speed-up the development of new data mining algorithms and will also make them more portable. We introduced a database-oriented framework for spatial data mining which is based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths were defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives by a commercial DBMS were presented.

Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining

Procedia PDF Downloads 425
15388 Concept Drifts Detection and Localisation in Process Mining

Authors: M. V. Manoj Kumar, Likewin Thomas, Annappa

Abstract:

Process mining provides methods and techniques for analyzing event logs recorded in modern information systems that support real-world operations. While analyzing an event-log, state-of-the-art techniques available in process mining believe that the operational process as a static entity (stationary). This is not often the case due to the possibility of occurrence of a phenomenon called concept drift. During the period of execution, the process can experience concept drift and can evolve with respect to any of its associated perspectives exhibiting various patterns-of-change with a different pace. Work presented in this paper discusses the main aspects to consider while addressing concept drift phenomenon and proposes a method for detecting and localizing the sudden concept drifts in control-flow perspective of the process by using features extracted by processing the traces in the process log. Our experimental results are promising in the direction of efficiently detecting and localizing concept drift in the context of process mining research discipline.

Keywords: abrupt drift, concept drift, sudden drift, control-flow perspective, detection and localization, process mining

Procedia PDF Downloads 314
15387 Business-Intelligence Mining of Large Decentralized Multimedia Datasets with a Distributed Multi-Agent System

Authors: Karima Qayumi, Alex Norta

Abstract:

The rapid generation of high volume and a broad variety of data from the application of new technologies pose challenges for the generation of business-intelligence. Most organizations and business owners need to extract data from multiple sources and apply analytical methods for the purposes of developing their business. Therefore, the recently decentralized data management environment is relying on a distributed computing paradigm. While data are stored in highly distributed systems, the implementation of distributed data-mining techniques is a challenge. The aim of this technique is to gather knowledge from every domain and all the datasets stemming from distributed resources. As agent technologies offer significant contributions for managing the complexity of distributed systems, we consider this for next-generation data-mining processes. To demonstrate agent-based business intelligence operations, we use agent-oriented modeling techniques to develop a new artifact for mining massive datasets.

Keywords: agent-oriented modeling (AOM), business intelligence model (BIM), distributed data mining (DDM), multi-agent system (MAS)

Procedia PDF Downloads 391
15386 Detecting Venomous Files in IDS Using an Approach Based on Data Mining Algorithm

Authors: Sukhleen Kaur

Abstract:

In security groundwork, Intrusion Detection System (IDS) has become an important component. The IDS has received increasing attention in recent years. IDS is one of the effective way to detect different kinds of attacks and malicious codes in a network and help us to secure the network. Data mining techniques can be implemented to IDS, which analyses the large amount of data and gives better results. Data mining can contribute to improving intrusion detection by adding a level of focus to anomaly detection. So far the study has been carried out on finding the attacks but this paper detects the malicious files. Some intruders do not attack directly, but they hide some harmful code inside the files or may corrupt those file and attack the system. These files are detected according to some defined parameters which will form two lists of files as normal files and harmful files. After that data mining will be performed. In this paper a hybrid classifier has been used via Naive Bayes and Ripper classification methods. The results show how the uploaded file in the database will be tested against the parameters and then it is characterised as either normal or harmful file and after that the mining is performed. Moreover, when a user tries to mine on harmful file it will generate an exception that mining cannot be made on corrupted or harmful files.

Keywords: data mining, association, classification, clustering, decision tree, intrusion detection system, misuse detection, anomaly detection, naive Bayes, ripper

Procedia PDF Downloads 390
15385 Data Mining Practices: Practical Studies on the Telecommunication Companies in Jordan

Authors: Dina Ahmad Alkhodary

Abstract:

This study aimed to investigate the practices of Data Mining on the telecommunication companies in Jordan, from the viewpoint of the respondents. In order to achieve the goal of the study, and test the validity of hypotheses, the researcher has designed a questionnaire to collect data from managers and staff members from main department in the researched companies. The results shows improvements stages of the telecommunications companies towered Data Mining.

Keywords: data, mining, development, business

Procedia PDF Downloads 465
15384 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that as the data increase the amount of time and resources required to mining the data increases at an exponential rate. In this investigation a new algorithm is proposed which can be uses as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough-sets to carry out record reduction and feature (attribute) selection respectively. FASTER for frequent itemset mining can produce a speed up of 3.1 times when compared to original algorithm while maintaining an accuracy of 71%.

Keywords: rough-sets, classification, feature selection, entropy, outliers, frequent itemset mining

Procedia PDF Downloads 406
15383 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills

Authors: Kyle De Freitas, Margaret Bernard

Abstract:

Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have little technical data mining skills as well as for more advanced users with some data mining expertise. This framework architecture was developed through the review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture for future researchers to focus on the development of specific areas within the EDM process. Finally, the paper also highlights a strategy of enabling analysis through either the use of predefined questions or a guided data mining process and highlights how the developed questions and analysis conducted can be reused and extended over time.

Keywords: educational data mining, learning management system, learning analytics, EDM framework

Procedia PDF Downloads 295
15382 Data Mining Spatial: Unsupervised Classification of Geographic Data

Authors: Chahrazed Zouaoui

Abstract:

In recent years, the volume of geospatial information is increasing due to the evolution of communication technologies and information, this information is presented often by geographic information systems (GIS) and stored on of spatial databases (BDS). The classical data mining revealed a weakness in knowledge extraction at these enormous amounts of data due to the particularity of these spatial entities, which are characterized by the interdependence between them (1st law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data, which allows the extraction of knowledge and spatial relationships from geospatial data, including methods of this process we distinguish the monothematic and thematic, geo- Clustering is one of the main tasks of spatial data mining, which is registered in the part of the monothematic method. It includes geo-spatial entities similar in the same class and it affects more dissimilar to the different classes. In other words, maximize intra-class similarity and minimize inter similarity classes. Taking account of the particularity of geo-spatial data. Two approaches to geo-clustering exist, the dynamic processing of data involves applying algorithms designed for the direct treatment of spatial data, and the approach based on the spatial data pre-processing, which consists of applying clustering algorithms classic pre-processed data (by integration of spatial relationships). This approach (based on pre-treatment) is quite complex in different cases, so the search for approximate solutions involves the use of approximation algorithms, including the algorithms we are interested in dedicated approaches (clustering methods for partitioning and methods for density) and approaching bees (biomimetic approach), our study is proposed to design very significant to this problem, using different algorithms for automatically detecting geo-spatial neighborhood in order to implement the method of geo- clustering by pre-treatment, and the application of the bees algorithm to this problem for the first time in the field of geo-spatial.

Keywords: mining, GIS, geo-clustering, neighborhood

Procedia PDF Downloads 357
15381 Assessment of Prevalent Diseases Caused by Mining Activities in the Northern Part of Mindanao Island, Philippines

Authors: Odinah Cuartero-Enteria, Kyla Rita Mercado, Jason Salamanes, Aian Pecasales, Sherwin Sabado

Abstract:

The northern part of Mindanao Island, Philippines has sizable reserve of mineral resources. Years ago, mining activities have been flourishing which resulted to both local economic gain but with environmental concerns. This study investigates the prevalent diseases by mining activities in these areas. The study was done using the secondary data gathered from the Rural Health Units (RHU) of the selected areas. The study further determined the prevalent diseases that existed in the three areas from years 2005, 2010 and 2015 indicating before the mining activities and when mining activities are present. The results show that areas which are far from mining activities have fewer cases of patients suffering from air-borne diseases. The top ten most common diseases such as pneumonia, tuberculosis, influenza, upper respiratory tract infection (URTI) and skin diseases were caused by air-borne due to air pollution. Hence, the places where mining activities are present contribute to the prevalent diseases. Thus, addressing the air pollution caused by mining activities is very important.

Keywords: Philippines, Mindanao Island, mining activities, pollution, prevalent diseases

Procedia PDF Downloads 437
15380 Cloud Computing in Data Mining: A Technical Survey

Authors: Ghaemi Reza, Abdollahi Hamid, Dashti Elham

Abstract:

Cloud computing poses a diversity of challenges in data mining operation arising out of the dynamic structure of data distribution as against the use of typical database scenarios in conventional architecture. Due to immense number of users seeking data on daily basis, there is a serious security concerns to cloud providers as well as data providers who put their data on the cloud computing environment. Big data analytics use compute intensive data mining algorithms (Hidden markov, MapReduce parallel programming, Mahot Project, Hadoop distributed file system, K-Means and KMediod, Apriori) that require efficient high performance processors to produce timely results. Data mining algorithms to solve or optimize the model parameters. The challenges that operation has to encounter is the successful transactions to be established with the existing virtual machine environment and the databases to be kept under the control. Several factors have led to the distributed data mining from normal or centralized mining. The approach is as a SaaS which uses multi-agent systems for implementing the different tasks of system. There are still some problems of data mining based on cloud computing, including design and selection of data mining algorithms.

Keywords: cloud computing, data mining, computing models, cloud services

Procedia PDF Downloads 448
15379 Research of the Three-Dimensional Visualization Geological Modeling of Mine Based on Surpac

Authors: Honggang Qu, Yong Xu, Rongmei Liu, Zhenji Gao, Bin Wang

Abstract:

Today's mining industry is advancing gradually toward digital and visual direction. The three-dimensional visualization geological modeling of mine is the digital characterization of mineral deposits and is one of the key technology of digital mining. Three-dimensional geological modeling is a technology that combines geological spatial information management, geological interpretation, geological spatial analysis and prediction, geostatistical analysis, entity content analysis and graphic visualization in a three-dimensional environment with computer technology and is used in geological analysis. In this paper, the three-dimensional geological modeling of an iron mine through the use of Surpac is constructed, and the weight difference of the estimation methods between the distance power inverse ratio method and ordinary kriging is studied, and the ore body volume and reserves are simulated and calculated by using these two methods. Compared with the actual mine reserves, its result is relatively accurate, so it provides scientific bases for mine resource assessment, reserve calculation, mining design and so on.

Keywords: three-dimensional geological modeling, geological database, geostatistics, block model

Procedia PDF Downloads 47
15378 An Experimental Study for Assessing Email Classification Attributes Using Feature Selection Methods

Authors: Issa Qabaja, Fadi Thabtah

Abstract:

Email phishing classification is one of the vital problems in the online security research domain that have attracted several scholars due to its impact on the users payments performed daily online. One aspect to reach a good performance by the detection algorithms in the email phishing problem is to identify the minimal set of features that significantly have an impact on raising the phishing detection rate. This paper investigate three known feature selection methods named Information Gain (IG), Chi-square and Correlation Features Set (CFS) on the email phishing problem to separate high influential features from low influential ones in phishing detection. We measure the degree of influentially by applying four data mining algorithms on a large set of features. We compare the accuracy of these algorithms on the complete features set before feature selection has been applied and after feature selection has been applied. After conducting experiments, the results show 12 common significant features have been chosen among the considered features by the feature selection methods. Further, the average detection accuracy derived by the data mining algorithms on the reduced 12-features set was very slight affected when compared with the one derived from the 47-features set.

Keywords: data mining, email classification, phishing, online security

Procedia PDF Downloads 402
15377 Mining Diagnostic Investigation Process

Authors: Sohail Imran, Tariq Mahmood

Abstract:

In complex healthcare diagnostic investigation process, medical practitioners have to focus on ways to standardize their processes to perform high quality care and optimize the time and costs. Process mining techniques can be applied to extract process related knowledge from data without considering causal and dynamic dependencies in business domain and processes. The application of process mining is effective in diagnostic investigation. It is very helpful where a treatment gives no dispositive evidence favoring it. In this paper, we applied process mining to discover important process flow of diagnostic investigation for hepatitis patients. This approach has some benefits which can enhance the quality and efficiency of diagnostic investigation processes.

Keywords: process mining, healthcare, diagnostic investigation process, process flow

Procedia PDF Downloads 490
15376 Analysis of Reliability of Mining Shovel Using Weibull Model

Authors: Anurag Savarnya

Abstract:

The reliability of the various parts of electric mining shovel has been assessed through the application of Weibull Model. The study was initiated to find reliability of components of electric mining shovel. The paper aims to optimize the reliability of components and increase the life cycle of component. A multilevel decomposition of the electric mining shovel was done and maintenance records were used to evaluate the failure data and appropriate system characterization was done to model the system in terms of reasonable number of components. The approach used develops a mathematical model to assess the reliability of the electric mining shovel components. The model can be used to predict reliability of components of the hydraulic mining shovel and system performance. Reliability is an inherent attribute to a system. When the life-cycle costs of a system are being analyzed, reliability plays an important role as a major driver of these costs and has considerable influence on system performance. It is an iterative process that begins with specification of reliability goals consistent with cost and performance objectives. The data were collected from an Indian open cast coal mine and the reliability of various components of the electric mining shovel has been assessed by following a Weibull Model.

Keywords: reliability, Weibull model, electric mining shovel

Procedia PDF Downloads 469
15375 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neigbhor, crime, data analysis, sistematic literature review

Procedia PDF Downloads 30
15374 An Adaptive Distributed Incremental Association Rule Mining System

Authors: Adewale O. Ogunde, Olusegun Folorunso, Adesina S. Sodiya

Abstract:

Most existing Distributed Association Rule Mining (DARM) systems are still facing several challenges. One of such challenges that have not received the attention of many researchers is the inability of existing systems to adapt to constantly changing databases and mining environments. In this work, an Adaptive Incremental Mining Algorithm (AIMA) is therefore proposed to address these problems. AIMA employed multiple mobile agents for the entire mining process. AIMA was designed to adapt to changes in the distributed databases by mining only the incremental database updates and using this to update the existing rules in order to improve the overall response time of the DARM system. In AIMA, global association rules were integrated incrementally from one data site to another through Results Integration Coordinating Agents. The mining agents in AIMA were made adaptive by defining mining goals with reasoning and behavioral capabilities and protocols that enabled them to either maintain or change their goals. AIMA employed Java Agent Development Environment Extension for designing the internal agents’ architecture. Results from experiments conducted on real datasets showed that the adaptive system, AIMA performed better than the non-adaptive systems with lower communication costs and higher task completion rates.

Keywords: adaptivity, data mining, distributed association rule mining, incremental mining, mobile agents

Procedia PDF Downloads 367
15373 Treatment of Cyanide Effluents with Platinum Impregned on Mg-Al Layered Hydroxides

Authors: María R. Contreras, Diana Endara

Abstract:

Cyanide leaching is the most used technology for gold mining industry, which produces large amounts of effluents requiring treatment. In Ecuador the development of gold mining industry has increased, causing significant environmental impacts due to the highly use of cyanide, it is estimated that 10 gr of extracted gold generates 7000 liters of water contaminated with 300mg/L of free cyanide. The most common methods used nowadays are the treatment with peroxodisulfuric acid, ozonation, H₂O₂ and other reactants which are expensive and present disadvantages. Several methods have been developed to treat this contaminant such as heterogeneous catalysts. Layered double hydroxides (LDHs) have received much attention due to their wide applications like a catalysis support. Therefore, in this study, Mg-Al/ LDH was synthetized by coprecipitation method and then platinum was impregned on it, in order to enhance its catalytic activity. Two methods of impregnation were used, the first one, called incipient wet impregnation and the second one was developed by continuous agitation of LDH in contact with chloroplatinic acid solution for 24 h. The support impregnated was analyzed by X-ray diffraction, FTIR and SEM. Finally, the oxidation of cyanide ion was performed by preparing synthetic solutions of sodium cyanide (NaCN) with an initial concentration of 500 mg/L at pH 10,5 and air flow of 180 NL/h. After 8 hours of treatment, an 80% of oxidation of ion cyanide was achieved.

Keywords: catalysis, cyanide, LDHs, mining

Procedia PDF Downloads 118
15372 Data Stream Association Rule Mining with Cloud Computing

Authors: B. Suraj Aravind, M. H. M. Krishna Prasad

Abstract:

There exist emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click streams analysis, sensor data, data from satellites etc. Data streams typically arrive continuously in high speed with huge amount and changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes to introduce an improved data stream association rule mining algorithm by eliminating the limitation of resources. For this, the concept of cloud computing is used. Inclusion of this may lead to additional unknown problems which needs further research.

Keywords: data stream, association rule mining, cloud computing, frequent itemsets

Procedia PDF Downloads 469
15371 Data Mining As A Tool For Knowledge Management: A Review

Authors: Maram Saleh

Abstract:

Knowledge has become an essential resource in today’s economy and become the most important asset of maintaining competition advantage in organizations. The importance of knowledge has made organizations to manage their knowledge assets and resources through all multiple knowledge management stages such as: Knowledge Creation, knowledge storage, knowledge sharing and knowledge use. Researches on data mining are continues growing over recent years on both business and educational fields. Data mining is one of the most important steps of the knowledge discovery in databases process aiming to extract implicit, unknown but useful knowledge and it is considered as significant subfield in knowledge management. Data miming have the great potential to help organizations to focus on extracting the most important information on their data warehouses. Data mining tools and techniques can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This review paper explores the applications of data mining techniques in supporting knowledge management process as an effective knowledge discovery technique. In this paper, we identify the relationship between data mining and knowledge management, and then focus on introducing some application of date mining techniques in knowledge management for some real life domains.

Keywords: Data Mining, Knowledge management, Knowledge discovery, Knowledge creation.

Procedia PDF Downloads 179
15370 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data

Authors: Adarsh Shroff

Abstract:

Big data is a collection of dataset so large and complex that it becomes difficult to process using data base management tools. To perform operations like search, analysis, visualization on big data by using data mining; which is the process of extraction of patterns or knowledge from large data set. In recent years, the data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to Map Reduce, the most widely used framework for mining big data. I2MapReduce performs key-value pair level incremental processing rather than task level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. To optimize the mining results, evaluate i2MapReduce using a one-step algorithm and three iterative algorithms with diverse computation characteristics for efficient mining.

Keywords: big data, map reduce, incremental processing, iterative computation

Procedia PDF Downloads 315
15369 Comparative Analysis of Classification Methods in Determining Non-Active Student Characteristics in Indonesia Open University

Authors: Dewi Juliah Ratnaningsih, Imas Sukaesih Sitanggang

Abstract:

Classification is one of data mining techniques that aims to discover a model from training data that distinguishes records into the appropriate category or class. Data mining classification methods can be applied in education, for example, to determine the classification of non-active students in Indonesia Open University. This paper presents a comparison of three methods of classification: Naïve Bayes, Bagging, and C.45. The criteria used to evaluate the performance of three methods of classification are stratified cross-validation, confusion matrix, the value of the area under the ROC Curve (AUC), Recall, Precision, and F-measure. The data used for this paper are from the non-active Indonesia Open University students in registration period of 2004.1 to 2012.2. Target analysis requires that non-active students were divided into 3 groups: C1, C2, and C3. Data analyzed are as many as 4173 students. Results of the study show: (1) Bagging method gave a high degree of classification accuracy than Naïve Bayes and C.45, (2) the Bagging classification accuracy rate is 82.99 %, while the Naïve Bayes and C.45 are 80.04 % and 82.74 % respectively, (3) the result of Bagging classification tree method has a large number of nodes, so it is quite difficult in decision making, (4) classification of non-active Indonesia Open University student characteristics uses algorithms C.45, (5) based on the algorithm C.45, there are 5 interesting rules which can describe the characteristics of non-active Indonesia Open University students.

Keywords: comparative analysis, data mining, clasiffication, Bagging, Naïve Bayes, C.45, non-active students, Indonesia Open University

Procedia PDF Downloads 288
15368 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Data mining is one of the main phases in the Knowledge Discovery Database (KDD) which is responsible of finding hidden and useful knowledge from databases. There are many different tasks for data mining including regression, pattern recognition, clustering, classification, and association rule. In recent years a promising data mining approach called associative classification (AC) has been proposed, AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference of the different procedures are used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.

Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction

Procedia PDF Downloads 507
15367 Data Mining Techniques for Anti-Money Laundering

Authors: M. Sai Veerendra

Abstract:

Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism and surely not forgetting personal gain. Most of the financial institutions internationally have been implementing anti-money laundering solutions (AML) to fight investment fraud activities. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered as well-suited techniques for detecting ML activities. Within the scope of a collaboration project on developing a new data mining solution for AML Units in an international investment bank in Ireland, we survey recent data mining approaches for AML. In this paper, we present not only these approaches but also give an overview on the important factors in building data mining solutions for AML activities.

Keywords: data mining, clustering, money laundering, anti-money laundering solutions

Procedia PDF Downloads 511
15366 Mining in Nigeria and Development Effort of Metallurgical Technologies at National Metallurgical Development Center Jos, Plateau State-Nigeria

Authors: Linus O. Asuquo

Abstract:

Mining in Nigeria and development effort of metallurgical technologies at National Metallurgical Development Centre Jos has been addressed in this paper. The paper has looked at the history of mining in Nigeria, the impact of mining on social and industrial development, and the contribution of the mining sector to Nigeria’s Gross Domestic Product (GDP). The paper clearly stated that Nigeria’s mining sector only contributes 0.5% to the nation’s GDP unlike Botswana that the mining sector contributes 38% to the nation’s GDP. Nigeria Bureau of Statistics has it on record that Nigeria has about 44 solid minerals awaiting to be exploited. Clearly highlighted by this paper is the abundant potentials that exist in the mining sector for investment. The paper made an exposition on the extensive efforts made at National Metallurgical Development Center (NMDC) to develop metallurgical technologies in various areas of the metals sector; like mineral processing, foundry development, nonferrous metals extraction, materials testing, lime calcination, ANO (Trade name for powder lubricant) wire drawing lubricant, refractories and many others. The paper went ahead to draw a conclusion that there is a need to develop the mining sector in Nigeria and to give a sustainable support to the efforts currently made at NMDC to develop metallurgical technologies which are capable of transforming the metals sector in Nigeria, which will lead to industrialization. Finally the paper made some recommendations which traverse the topic for the best expectation.

Keywords: mining, minerals, technologies, value addition

Procedia PDF Downloads 67
15365 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub

Abstract:

Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.

Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL

Procedia PDF Downloads 132
15364 The Significance of Picture Mining in the Fashion and Design as a New Research Method

Authors: Katsue Edo, Yu Hiroi

Abstract:

T Increasing attention has been paid to using pictures and photographs in research since the beginning of the 21th century in social sciences. Meanwhile we have been studying the usefulness of Picture mining, which is one of the new ways for a these picture using researches. Picture Mining is an explorative research analysis method that takes useful information from pictures, photographs and static or moving images. It is often compared with the methods of text mining. The Picture Mining concept includes observational research in the broad sense, because it also aims to analyze moving images (Ochihara and Edo 2013). In the recent literature, studies and reports using pictures are increasing due to the environmental changes. These are identified as technological and social changes (Edo et.al. 2013). Low price digital cameras and i-phones, high information transmission speed, low costs for information transferring and high performance and resolution of the cameras of mobile phones have changed the photographing behavior of people. Consequently, there is less resistance in taking and processing photographs for most of the people in the developing countries. In these studies, this method of collecting data from respondents is often called as ‘participant-generated photography’ or ‘respondent-generated visual imagery’, which focuses on the collection of data and its analysis (Pauwels 2011, Snyder 2012). But there are few systematical and conceptual studies that supports it significance of these methods. We have discussed in the recent years to conceptualize these picture using research methods and formalize theoretical findings (Edo et. al. 2014). We have identified the most efficient fields of Picture mining in the following areas inductively and in case studies; 1) Research in Consumer and Customer Lifestyles. 2) New Product Development. 3) Research in Fashion and Design. Though we have found that it will be useful in these fields and areas, we must verify these assumptions. In this study we will focus on the field of fashion and design, to determine whether picture mining methods are really reliable in this area. In order to do so we have conducted an empirical research of the respondents’ attitudes and behavior concerning pictures and photographs. We compared the attitudes and behavior of pictures toward fashion to meals, and found out that taking pictures of fashion is not as easy as taking meals and food. Respondents do not often take pictures of fashion and upload their pictures online, such as Facebook and Instagram, compared to meals and food because of the difficulty of taking them. We concluded that we should be more careful in analyzing pictures in the fashion area for there still might be some kind of bias existing even if the environment of pictures have drastically changed in these years.

Keywords: empirical research, fashion and design, Picture Mining, qualitative research

Procedia PDF Downloads 336