Search results for: data mining challenges
29102 A Data Mining Approach for Analysing and Predicting the Bank's Asset Liability Management Based on Basel III Norms
Authors: Nidhin Dani Abraham, T. K. Sri Shilpa
Abstract:
Asset liability management is an important aspect in banking business. Moreover, the today’s banking is based on BASEL III which strictly regulates on the counterparty default. This paper focuses on prediction and analysis of counter party default risk, which is a type of risk occurs when the customers fail to repay the amount back to the lender (bank or any financial institutions). This paper proposes an approach to reduce the counterparty risk occurring in the financial institutions using an appropriate data mining technique and thus predicts the occurrence of NPA. It also helps in asset building and restructuring quality. Liability management is very important to carry out banking business. To know and analyze the depth of liability of bank, a suitable technique is required. For that a data mining technique is being used to predict the dormant behaviour of various deposit bank customers. Various models are implemented and the results are analyzed of saving bank deposit customers. All these data are cleaned using data cleansing approach from the bank data warehouse.Keywords: data mining, asset liability management, BASEL III, banking
Procedia PDF Downloads 55229101 Hierarchical Clustering Algorithms in Data Mining
Authors: Z. Abdullah, A. R. Hamdan
Abstract:
Clustering is a process of grouping objects and data into groups of clusters to ensure that data objects from the same cluster are identical to each other. Clustering algorithms in one of the areas in data mining and it can be classified into partition, hierarchical, density based, and grid-based. Therefore, in this paper, we do a survey and review for four major hierarchical clustering algorithms called CURE, ROCK, CHAMELEON, and BIRCH. The obtained state of the art of these algorithms will help in eliminating the current problems, as well as deriving more robust and scalable algorithms for clustering.Keywords: clustering, unsupervised learning, algorithms, hierarchical
Procedia PDF Downloads 88529100 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands
Authors: Julio Albuja, David Zaldumbide
Abstract:
Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.Keywords: algorithms, data, decision tree, transformation
Procedia PDF Downloads 37429099 Hybrid Knowledge Approach for Determining Health Care Provider Specialty from Patient Diagnoses
Authors: Erin Lynne Plettenberg, Jeremy Vickery
Abstract:
In an access-control situation, the role of a user determines whether a data request is appropriate. This paper combines vetted web mining and logic modeling to build a lightweight system for determining the role of a health care provider based only on their prior authorized requests. The model identifies provider roles with 100% recall from very little data. This shows the value of vetted web mining in AI systems, and suggests the impact of the ICD classification on medical practice.Keywords: electronic medical records, information extraction, logic modeling, ontology, vetted web mining
Procedia PDF Downloads 17229098 An Improved Parallel Algorithm of Decision Tree
Authors: Jiameng Wang, Yunfei Yin, Xiyu Deng
Abstract:
Parallel optimization is one of the important research topics of data mining at this stage. Taking Classification and Regression Tree (CART) parallelization as an example, this paper proposes a parallel data mining algorithm based on SSP-OGini-PCCP. Aiming at the problem of choosing the best CART segmentation point, this paper designs an S-SP model without data association; and in order to calculate the Gini index efficiently, a parallel OGini calculation method is designed. In addition, in order to improve the efficiency of the pruning algorithm, a synchronous PCCP pruning strategy is proposed in this paper. In this paper, the optimal segmentation calculation, Gini index calculation, and pruning algorithm are studied in depth. These are important components of parallel data mining. By constructing a distributed cluster simulation system based on SPARK, data mining methods based on SSP-OGini-PCCP are tested. Experimental results show that this method can increase the search efficiency of the best segmentation point by an average of 89%, increase the search efficiency of the Gini segmentation index by 3853%, and increase the pruning efficiency by 146% on average; and as the size of the data set increases, the performance of the algorithm remains stable, which meets the requirements of contemporary massive data processing.Keywords: classification, Gini index, parallel data mining, pruning ahead
Procedia PDF Downloads 12329097 Design and Development of Data Mining Application for Medical Centers in Remote Areas
Authors: Grace Omowunmi Soyebi
Abstract:
Data Mining is the extraction of information from a large database which helps in predicting a trend or behavior, thereby helping management make knowledge-driven decisions. One principal problem of most hospitals in rural areas is making use of the file management system for keeping records. A lot of time is wasted when a patient visits the hospital, probably in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this may cause an unexpected to happen to the patient. This Data Mining application is to be designed using a Structured System Analysis and design method, which will help in a well-articulated analysis of the existing file management system, feasibility study, and proper documentation of the Design and Implementation of a Computerized medical record system. This Computerized system will replace the file management system and help to easily retrieve a patient's record with increased data security, access clinical records for decision-making, and reduce the time range at which a patient gets attended to.Keywords: data mining, medical record system, systems programming, computing
Procedia PDF Downloads 20929096 A Hybrid Data Mining Algorithm Based System for Intelligent Defence Mission Readiness and Maintenance Scheduling
Authors: Shivam Dwivedi, Sumit Prakash Gupta, Durga Toshniwal
Abstract:
It is a challenging task in today’s date to keep defence forces in the highest state of combat readiness with budgetary constraints. A huge amount of time and money is squandered in the unnecessary and expensive traditional maintenance activities. To overcome this limitation Defence Intelligent Mission Readiness and Maintenance Scheduling System has been proposed, which ameliorates the maintenance system by diagnosing the condition and predicting the maintenance requirements. Based on new data mining algorithms, this system intelligently optimises mission readiness for imminent operations and maintenance scheduling in repair echelons. With modified data mining algorithms such as Weighted Feature Ranking Genetic Algorithm and SVM-Random Forest Linear ensemble, it improves the reliability, availability and safety, alongside reducing maintenance cost and Equipment Out of Action (EOA) time. The results clearly conclude that the introduced algorithms have an edge over the conventional data mining algorithms. The system utilizing the intelligent condition-based maintenance approach improves the operational and maintenance decision strategy of the defence force.Keywords: condition based maintenance, data mining, defence maintenance, ensemble, genetic algorithms, maintenance scheduling, mission capability
Procedia PDF Downloads 29729095 Data Mining Spatial: Unsupervised Classification of Geographic Data
Authors: Chahrazed Zouaoui
Abstract:
In recent years, the volume of geospatial information is increasing due to the evolution of communication technologies and information, this information is presented often by geographic information systems (GIS) and stored on of spatial databases (BDS). The classical data mining revealed a weakness in knowledge extraction at these enormous amounts of data due to the particularity of these spatial entities, which are characterized by the interdependence between them (1st law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data, which allows the extraction of knowledge and spatial relationships from geospatial data, including methods of this process we distinguish the monothematic and thematic, geo- Clustering is one of the main tasks of spatial data mining, which is registered in the part of the monothematic method. It includes geo-spatial entities similar in the same class and it affects more dissimilar to the different classes. In other words, maximize intra-class similarity and minimize inter similarity classes. Taking account of the particularity of geo-spatial data. Two approaches to geo-clustering exist, the dynamic processing of data involves applying algorithms designed for the direct treatment of spatial data, and the approach based on the spatial data pre-processing, which consists of applying clustering algorithms classic pre-processed data (by integration of spatial relationships). This approach (based on pre-treatment) is quite complex in different cases, so the search for approximate solutions involves the use of approximation algorithms, including the algorithms we are interested in dedicated approaches (clustering methods for partitioning and methods for density) and approaching bees (biomimetic approach), our study is proposed to design very significant to this problem, using different algorithms for automatically detecting geo-spatial neighborhood in order to implement the method of geo- clustering by pre-treatment, and the application of the bees algorithm to this problem for the first time in the field of geo-spatial.Keywords: mining, GIS, geo-clustering, neighborhood
Procedia PDF Downloads 37529094 An Optimized Association Rule Mining Algorithm
Authors: Archana Singh, Jyoti Agarwal, Ajay Rana
Abstract:
Data Mining is an efficient technology to discover patterns in large databases. Association Rule Mining techniques are used to find the correlation between the various item sets in a database, and this co-relation between various item sets are used in decision making and pattern analysis. In recent years, the problem of finding association rules from large datasets has been proposed by many researchers. Various research papers on association rule mining (ARM) are studied and analyzed first to understand the existing algorithms. Apriori algorithm is the basic ARM algorithm, but it requires so many database scans. In DIC algorithm, less amount of database scan is needed but complex data structure lattice is used. The main focus of this paper is to propose a new optimized algorithm (Friendly Algorithm) and compare its performance with the existing algorithms A data set is used to find out frequent itemsets and association rules with the help of existing and proposed (Friendly Algorithm) and it has been observed that the proposed algorithm also finds all the frequent itemsets and essential association rules from databases as compared to existing algorithms in less amount of database scan. In the proposed algorithm, an optimized data structure is used i.e. Graph and Adjacency Matrix.Keywords: association rules, data mining, dynamic item set counting, FP-growth, friendly algorithm, graph
Procedia PDF Downloads 42129093 Assessing Carbon Stock and Sequestration of Reforestation Species on Old Mining Sites in Morocco Using the DNDC Model
Authors: Nabil Elkhatri, Mohamed Louay Metougui, Ngonidzashe Chirinda
Abstract:
Mining activities have left a legacy of degraded landscapes, prompting urgent efforts for ecological restoration. Reforestation holds promise as a potent tool to rehabilitate these old mining sites, with the potential to sequester carbon and contribute to climate change mitigation. This study focuses on evaluating the carbon stock and sequestration potential of reforestation species in the context of Morocco's mining areas, employing the DeNitrification-DeComposition (DNDC) model. The research is grounded in recognizing the need to connect theoretical models with practical implementation, ensuring that reforestation efforts are informed by accurate and context-specific data. Field data collection encompasses growth patterns, biomass accumulation, and carbon sequestration rates, establishing an empirical foundation for the study's analyses. By integrating the collected data with the DNDC model, the study aims to provide a comprehensive understanding of carbon dynamics within reforested ecosystems on old mining sites. The major findings reveal varying sequestration rates among different reforestation species, indicating the potential for species-specific optimization of reforestation strategies to enhance carbon capture. This research's significance lies in its potential to contribute to sustainable land management practices and climate change mitigation strategies. By quantifying the carbon stock and sequestration potential of reforestation species, the study serves as a valuable resource for policymakers, land managers, and practitioners involved in ecological restoration and carbon management. Ultimately, the study aligns with global objectives to rejuvenate degraded landscapes while addressing pressing climate challenges.Keywords: carbon stock, carbon sequestration, DNDC model, ecological restoration, mining sites, Morocco, reforestation, sustainable land management.
Procedia PDF Downloads 7629092 Mining Educational Data to Support Students’ Major Selection
Authors: Kunyanuth Kularbphettong, Cholticha Tongsiri
Abstract:
This paper aims to create the model for student in choosing an emphasized track of student majoring in computer science at Suan Sunandha Rajabhat University. The objective of this research is to develop the suggested system using data mining technique to analyze knowledge and conduct decision rules. Such relationships can be used to demonstrate the reasonableness of student choosing a track as well as to support his/her decision and the system is verified by experts in the field. The sampling is from student of computer science based on the system and the questionnaire to see the satisfaction. The system result is found to be satisfactory by both experts and student as well.Keywords: data mining technique, the decision support system, knowledge and decision rules, education
Procedia PDF Downloads 42329091 Predicting Medical Check-Up Patient Re-Coming Using Sequential Pattern Mining and Association Rules
Authors: Rizka Aisha Rahmi Hariadi, Chao Ou-Yang, Han-Cheng Wang, Rajesri Govindaraju
Abstract:
As the increasing of medical check-up popularity, there are a huge number of medical check-up data stored in database and have not been useful. These data actually can be very useful for future strategic planning if we mine it correctly. In other side, a lot of patients come with unpredictable coming and also limited available facilities make medical check-up service offered by hospital not maximal. To solve that problem, this study used those medical check-up data to predict patient re-coming. Sequential pattern mining (SPM) and association rules method were chosen because these methods are suitable for predicting patient re-coming using sequential data. First, based on patient personal information the data was grouped into … groups then discriminant analysis was done to check significant of the grouping. Second, for each group some frequent patterns were generated using SPM method. Third, based on frequent patterns of each group, pairs of variable can be extracted using association rules to get general pattern of re-coming patient. Last, discussion and conclusion was done to give some implications of the results.Keywords: patient re-coming, medical check-up, health examination, data mining, sequential pattern mining, association rules, discriminant analysis
Procedia PDF Downloads 64029090 Modelling of Powered Roof Supports Work
Authors: Marcin Michalak
Abstract:
Due to the increasing efforts on saving our natural environment a change in the structure of energy resources can be observed - an increasing fraction of a renewable energy sources. In many countries traditional underground coal mining loses its significance but there are still countries, like Poland or Germany, in which the coal based technologies have the greatest fraction in a total energy production. This necessitates to make an effort to limit the costs and negative effects of underground coal mining. The longwall complex is as essential part of the underground coal mining. The safety and the effectiveness of the work is strongly dependent of the diagnostic state of powered roof supports. The building of a useful and reliable diagnostic system requires a lot of data. As the acquisition of a data of any possible operating conditions it is important to have a possibility to generate a demanded artificial working characteristics. In this paper a new approach of modelling a leg pressure in the single unit of powered roof support. The model is a result of the analysis of a typical working cycles.Keywords: machine modelling, underground mining, coal mining, structure
Procedia PDF Downloads 36829089 Distributed Perceptually Important Point Identification for Time Series Data Mining
Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung
Abstract:
In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process is first introduced in 2001. This process originally works for financial time series pattern matching and it is then found suitable for time series dimensionality reduction and representation. Its strength is on preserving the overall shape of the time series by identifying the salient points in it. With the rise of Big Data, time series data contributes a major proportion, especially on the data which generates by sensors in the Internet of Things (IoT) environment. According to the nature of PIP identification and the successful cases, it is worth to further explore the opportunity to apply PIP in time series ‘Big Data’. However, the performance of PIP identification is always considered as the limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches solve the bottleneck when running the PIP identification process in a standalone computer. Improvement in term of speed is obtained by the distributed versions.Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining
Procedia PDF Downloads 43429088 Discovering User Behaviour Patterns from Web Log Analysis to Enhance the Accessibility and Usability of Website
Authors: Harpreet Singh
Abstract:
Finding relevant information on the World Wide Web is becoming highly challenging day by day. Web usage mining is used for the extraction of relevant and useful knowledge, such as user behaviour patterns, from web access log records. Web access log records all the requests for individual files that the users have requested from the website. Web usage mining is important for Customer Relationship Management (CRM), as it can ensure customer satisfaction as far as the interaction between the customer and the organization is concerned. Web usage mining is helpful in improving website structure or design as per the user’s requirement by analyzing the access log file of a website through a log analyzer tool. The focus of this paper is to enhance the accessibility and usability of a guitar selling web site by analyzing their access log through Deep Log Analyzer tool. The results show that the maximum number of users is from the United States and that they use Opera 9.8 web browser and the Windows XP operating system.Keywords: web usage mining, web mining, log file, data mining, deep log analyzer
Procedia PDF Downloads 24829087 A Theoretical Model for Pattern Extraction in Large Datasets
Authors: Muhammad Usman
Abstract:
Pattern extraction has been done in past to extract hidden and interesting patterns from large datasets. Recently, advancements are being made in these techniques by providing the ability of multi-level mining, effective dimension reduction, advanced evaluation and visualization support. This paper focuses on reviewing the current techniques in literature on the basis of these parameters. Literature review suggests that most of the techniques which provide multi-level mining and dimension reduction, do not handle mixed-type data during the process. Patterns are not extracted using advanced algorithms for large datasets. Moreover, the evaluation of patterns is not done using advanced measures which are suited for high-dimensional data. Techniques which provide visualization support are unable to handle a large number of rules in a small space. We present a theoretical model to handle these issues. The implementation of the model is beyond the scope of this paper.Keywords: association rule mining, data mining, data warehouses, visualization of association rules
Procedia PDF Downloads 22329086 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising
Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri
Abstract:
Although data nowadays has multiple forms; from text to images, and from audio to videos, yet text is still the most used one at a public level. At an academical and research level, and unlike other forms, text can be considered as the easiest form to process. Therefore, a brunch of Data Mining researches has been always under its shadow, called "Text Mining". Its concept is just like data mining’s, finding valuable patterns in data, from large collections and tremendous volumes of data, in this case: Text. Named entity recognition (NER) is one of Text Mining’s disciplines, it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations and more in a given text. Our approach "Octopub" does not aim to find new ways to improve named entity recognition process, rather than that it’s about finding a new, and yet smart way, to use NER in a way that we can extract sentiments of millions of people using Social Networks as a limitless information source, and Marketing for product promotion as the main domain of application.Keywords: textmining, named entity recognition(NER), sentiment analysis, social media networks (SN, SMN), business intelligence(BI), marketing
Procedia PDF Downloads 58929085 Emotion Mining and Attribute Selection for Actionable Recommendations to Improve Customer Satisfaction
Authors: Jaishree Ranganathan, Poonam Rajurkar, Angelina A. Tzacheva, Zbigniew W. Ras
Abstract:
In today’s world, business often depends on the customer feedback and reviews. Sentiment analysis helps identify and extract information about the sentiment or emotion of the of the topic or document. Attribute selection is a challenging problem, especially with large datasets in actionable pattern mining algorithms. Action Rule Mining is one of the methods to discover actionable patterns from data. Action Rules are rules that help describe specific actions to be made in the form of conditions that help achieve the desired outcome. The rules help to change from any undesirable or negative state to a more desirable or positive state. In this paper, we present a Lexicon based weighted scheme approach to identify emotions from customer feedback data in the area of manufacturing business. Also, we use Rough sets and explore the attribute selection method for large scale datasets. Then we apply Actionable pattern mining to extract possible emotion change recommendations. This kind of recommendations help business analyst to improve their customer service which leads to customer satisfaction and increase sales revenue.Keywords: actionable pattern discovery, attribute selection, business data, data mining, emotion
Procedia PDF Downloads 19929084 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic
Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam
Abstract:
In recent years, we have seen an increasing importance of research and study on knowledge source, decision support systems, data mining and procedure of knowledge discovery in data bases and it is considered that each of these aspects affects the others. In this article, we have merged information source and knowledge source to suggest a knowledge based system within limits of management based on storing and restoring of knowledge to manage information and improve decision making and resources. In this article, we have used method of data mining and Apriori algorithm in procedure of knowledge discovery one of the problems of Apriori algorithm is that, a user should specify the minimum threshold for supporting the regularity. Imagine that a user wants to apply Apriori algorithm for a database with millions of transactions. Definitely, the user does not have necessary knowledge of all existing transactions in that database, and therefore cannot specify a suitable threshold. Our purpose in this article is to improve Apriori algorithm. To achieve our goal, we tried using fuzzy logic to put data in different clusters before applying the Apriori algorithm for existing data in the database and we also try to suggest the most suitable threshold to the user automatically.Keywords: decision support system, data mining, knowledge discovery, data discovery, fuzzy logic
Procedia PDF Downloads 33529083 A Web Service-Based Framework for Mining E-Learning Data
Authors: Felermino D. M. A. Ali, S. C. Ng
Abstract:
E-learning is an evolutionary form of distance learning and has become better over time as new technologies emerged. Today, efforts are still being made to embrace E-learning systems with emerging technologies in order to make them better. Among these advancements, Educational Data Mining (EDM) is one that is gaining a huge and increasing popularity due to its wide application for improving the teaching-learning process in online practices. However, even though EDM promises to bring many benefits to educational industry in general and E-learning environments in particular, its principal drawback is the lack of easy to use tools. The current EDM tools usually require users to have some additional technical expertise to effectively perform EDM tasks. Thus, in response to these limitations, this study intends to design and implement an EDM application framework which aims at automating and simplify the development of EDM in E-learning environment. The application framework introduces a Service-Oriented Architecture (SOA) that hides the complexity of technical details and enables users to perform EDM in an automated fashion. The framework was designed based on abstraction, extensibility, and interoperability principles. The framework implementation was made up of three major modules. The first module provides an abstraction for data gathering, which was done by extending Moodle LMS (Learning Management System) source code. The second module provides data mining methods and techniques as services; it was done by converting Weka API into a set of Web services. The third module acts as an intermediary between the first two modules, it contains a user-friendly interface that allows dynamically locating data provider services, and running knowledge discovery tasks on data mining services. An experiment was conducted to evaluate the overhead of the proposed framework through a combination of simulation and implementation. The experiments have shown that the overhead introduced by the SOA mechanism is relatively small, therefore, it has been concluded that a service-oriented architecture can be effectively used to facilitate educational data mining in E-learning environments.Keywords: educational data mining, e-learning, distributed data mining, moodle, service-oriented architecture, Weka
Procedia PDF Downloads 23629082 Personalize E-Learning System Based on Clustering and Sequence Pattern Mining Approach
Authors: H. S. Saini, K. Vijayalakshmi, Rishi Sayal
Abstract:
Network-based education has been growing rapidly in size and quality. Knowledge clustering becomes more important in personalized information retrieval for web-learning. A personalized-Learning service after the learners’ knowledge has been classified with clustering. Through automatic analysis of learners’ behaviors, their partition with similar data level and interests may be discovered so as to produce learners with contents that best match educational needs for collaborative learning. We present a specific mining tool and a recommender engine that we have integrated in the online learning in order to help the teacher to carry out the whole e-learning process. We propose to use sequential pattern mining algorithms to discover the most used path by the students and from this information can recommend links to the new students automatically meanwhile they browse in the course. We have Developed a specific author tool in order to help the teacher to apply all the data mining process. We tend to report on many experiments with real knowledge so as to indicate the quality of using both clustering and sequential pattern mining algorithms together for discovering personalized e-learning systems.Keywords: e-learning, cluster, personalization, sequence, pattern
Procedia PDF Downloads 42829081 Sequential Pattern Mining from Data of Medical Record with Sequential Pattern Discovery Using Equivalent Classes (SPADE) Algorithm (A Case Study : Bolo Primary Health Care, Bima)
Authors: Rezky Rifaini, Raden Bagus Fajriya Hakim
Abstract:
This research was conducted at the Bolo primary health Care in Bima Regency. The purpose of the research is to find out the association pattern that is formed of medical record database from Bolo Primary health care’s patient. The data used is secondary data from medical records database PHC. Sequential pattern mining technique is the method that used to analysis. Transaction data generated from Patient_ID, Check_Date and diagnosis. Sequential Pattern Discovery Algorithms Using Equivalent Classes (SPADE) is one of the algorithm in sequential pattern mining, this algorithm find frequent sequences of data transaction, using vertical database and sequence join process. Results of the SPADE algorithm is frequent sequences that then used to form a rule. It technique is used to find the association pattern between items combination. Based on association rules sequential analysis with SPADE algorithm for minimum support 0,03 and minimum confidence 0,75 is gotten 3 association sequential pattern based on the sequence of patient_ID, check_Date and diagnosis data in the Bolo PHC.Keywords: diagnosis, primary health care, medical record, data mining, sequential pattern mining, SPADE algorithm
Procedia PDF Downloads 40129080 CoP-Networks: Virtual Spaces for New Faculty’s Professional Development in the 21st Higher Education
Authors: Eman AbuKhousa, Marwan Z. Bataineh
Abstract:
The 21st century higher education and globalization challenge new faculty members to build effective professional networks and partnership with industry in order to accelerate their growth and success. This creates the need for community of practice (CoP)-oriented development approaches that focus on cognitive apprenticeship while considering individual predisposition and future career needs. This work adopts data mining, clustering analysis, and social networking technologies to present the CoP-Network as a virtual space that connects together similar career-aspiration individuals who are socially influenced to join and engage in a process for domain-related knowledge and practice acquisitions. The CoP-Network model can be integrated into higher education to extend traditional graduate and professional development programs.Keywords: clustering analysis, community of practice, data mining, higher education, new faculty challenges, social network, social influence, professional development
Procedia PDF Downloads 18329079 Secure Multiparty Computations for Privacy Preserving Classifiers
Authors: M. Sumana, K. S. Hareesha
Abstract:
Secure computations are essential while performing privacy preserving data mining. Distributed privacy preserving data mining involve two to more sites that cannot pool in their data to a third party due to the violation of law regarding the individual. Hence in order to model the private data without compromising privacy and information loss, secure multiparty computations are used. Secure computations of product, mean, variance, dot product, sigmoid function using the additive and multiplicative homomorphic property is discussed. The computations are performed on vertically partitioned data with a single site holding the class value.Keywords: homomorphic property, secure product, secure mean and variance, secure dot product, vertically partitioned data
Procedia PDF Downloads 41229078 Understanding the Complexity of Corruption and Anti-Corruption in Indonesia's Mining Industry: Challenges and Opportunities
Authors: Ahmad Khoirul Umam, Iin Mayasari
Abstract:
Indonesia is blessed with rich natural resources and frequently dubbed as the 6th richest country in the world in terms of mining resources, including minerals and coal. Mining can contribute to the socio-economic development by generating state revenue for development, elevating poverty through employment, opening and developing remote areas, putting in basic infrastructure and creating new centres of developments. However, favouritism and rent-seeking behaviour committed by government officials, politicians, and business players in licensing and permit giving in mining and forestry sectors have resisted reforms. Even though Indonesia’s Corruption Eradication Commission (KPK) successfully targeted untouchable actors, public criticism continues to focus on questions of why corruption apparently remains systemic in mining industry in the country? This paper revealed that structural anomalies, as well as legacies of the Soeharto era’s power inequities, have severely inhibited Indonesia’s bureaucratic arrangements that continue to influence adversely the elements of transparency and accountability in mining industry governance. In the more liberalized and decentralized political system, the deficiencies have gradually assisted vested interest groups to band together, thus creating a coalition that can challenge, resist, and contain anti-graft actions. Therefore, Indonesia needs much more serious anti-corruption actions that would require eliminating the monopoly over power, enhancing competition, limiting discretion, and clarifying the rules of business and political competition in the mining sector in the country.Keywords: anti-corruption, public integrity, private integrity, mining industry, democratization
Procedia PDF Downloads 11129077 Performance Evaluation of Production Schedules Based on Process Mining
Authors: Kwan Hee Han
Abstract:
External environment of enterprise is rapidly changing majorly by global competition, cost reduction pressures, and new technology. In these situations, production scheduling function plays a critical role to meet customer requirements and to attain the goal of operational efficiency. It deals with short-term decision making in the production process of the whole supply chain. The major task of production scheduling is to seek a balance between customer orders and limited resources. In manufacturing companies, this task is so difficult because it should efficiently utilize resource capacity under the careful consideration of many interacting constraints. At present, many computerized software solutions have been utilized in many enterprises to generate a realistic production schedule to overcome the complexity of schedule generation. However, most production scheduling systems do not provide sufficient information about the validity of the generated schedule except limited statistics. Process mining only recently emerged as a sub-discipline of both data mining and business process management. Process mining techniques enable the useful analysis of a wide variety of processes such as process discovery, conformance checking, and bottleneck analysis. In this study, the performance of generated production schedule is evaluated by mining event log data of production scheduling software system by using the process mining techniques since every software system generates event logs for the further use such as security investigation, auditing and error bugging. An application of process mining approach is proposed for the validation of the goodness of production schedule generated by scheduling software systems in this study. By using process mining techniques, major evaluation criteria such as utilization of workstation, existence of bottleneck workstations, critical process route patterns, and work load balance of each machine over time are measured, and finally, the goodness of production schedule is evaluated. By using the proposed process mining approach for evaluating the performance of generated production schedule, the quality of production schedule of manufacturing enterprises can be improved.Keywords: data mining, event log, process mining, production scheduling
Procedia PDF Downloads 27929076 Decision Support System in Air Pollution Using Data Mining
Authors: E. Fathallahi Aghdam, V. Hosseini
Abstract:
Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, pays attention to issues such as destruction of natural resources, degradation of biological system, global pollution, and climate change in the world, especially in the developing countries. According to the World Health Organization, as a developing city, Tehran (capital of Iran) is one of the most polluted cities in the world in terms of air pollution. In this study, three pollutants including particulate matter less than 10 microns, nitrogen oxides, and sulfur dioxide were evaluated in Tehran using data mining techniques and through Crisp approach. The data from 21 air pollution measuring stations in different areas of Tehran were collected from 1999 to 2013. Commercial softwares Clementine was selected for this study. Tehran was divided into distinct clusters in terms of the mentioned pollutants using the software. As a data mining technique, clustering is usually used as a prologue for other analyses, therefore, the similarity of clusters was evaluated in this study through analyzing local conditions, traffic behavior, and industrial activities. In fact, the results of this research can support decision-making system, help managers improve the performance and decision making, and assist in urban studies.Keywords: data mining, clustering, air pollution, crisp approach
Procedia PDF Downloads 42729075 Enhance the Power of Sentiment Analysis
Authors: Yu Zhang, Pedro Desouza
Abstract:
Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining
Procedia PDF Downloads 35329074 Optimizing Communications Overhead in Heterogeneous Distributed Data Streams
Authors: Rashi Bhalla, Russel Pears, M. Asif Naeem
Abstract:
In this 'Information Explosion Era' analyzing data 'a critical commodity' and mining knowledge from vertically distributed data stream incurs huge communication cost. However, an effort to decrease the communication in the distributed environment has an adverse influence on the classification accuracy; therefore, a research challenge lies in maintaining a balance between transmission cost and accuracy. This paper proposes a method based on Bayesian inference to reduce the communication volume in a heterogeneous distributed environment while retaining prediction accuracy. Our experimental evaluation reveals that a significant reduction in communication can be achieved across a diverse range of dataset types.Keywords: big data, bayesian inference, distributed data stream mining, heterogeneous-distributed data
Procedia PDF Downloads 16129073 Text Mining of Veterinary Forums for Epidemiological Surveillance Supplementation
Authors: Samuel Munaf, Kevin Swingler, Franz Brülisauer, Anthony O’Hare, George Gunn, Aaron Reeves
Abstract:
Web scraping and text mining are popular computer science methods deployed by public health researchers to augment traditional epidemiological surveillance. However, within veterinary disease surveillance, such techniques are still in the early stages of development and have not yet been fully utilised. This study presents an exploration into the utility of incorporating internet-based data to better understand the smallholder farming communities within Scotland by using online text extraction and the subsequent mining of this data. Web scraping of the livestock fora was conducted in conjunction with text mining of the data in search of common themes, words, and topics found within the text. Results from bi-grams and topic modelling uncover four main topics of interest within the data pertaining to aspects of livestock husbandry: feeding, breeding, slaughter, and disposal. These topics were found amongst both the poultry and pig sub-forums. Topic modeling appears to be a useful method of unsupervised classification regarding this form of data, as it has produced clusters that relate to biosecurity and animal welfare. Internet data can be a very effective tool in aiding traditional veterinary surveillance methods, but the requirement for human validation of said data is crucial. This opens avenues of research via the incorporation of other dynamic social media data, namely Twitter and Facebook/Meta, in addition to time series analysis to highlight temporal patterns.Keywords: veterinary epidemiology, disease surveillance, infodemiology, infoveillance, smallholding, social media, web scraping, sentiment analysis, geolocation, text mining, NLP
Procedia PDF Downloads 99