Search results for: Distributed Data Mining
7853 An Approach to Concerns and Aspects Mining for Web Applications
Authors: Carlo Bellettini, Alessandro Marchetto, Andrea Trentini
Abstract:
Web applications have become very complex and crucial, especially when combined with areas such as CRM (Customer Relationship Management) and BPR (Business Process Reengineering). The scientific community has therefore focused attention on Web application design, development, analysis, and testing by studying and proposing methodologies and tools. This paper proposes an approach to automatic multi-dimensional concern mining for Web applications, based on concepts analysis, impact analysis, and token-based concern identification. The approach lets the user analyse and traverse Web software relevant to a particular concern (concept, goal, purpose, etc.) via multi-dimensional separation of concerns, in order to document, understand, and test Web applications. The technique was developed in the context of the WAAT (Web Applications Analysis and Testing) project. A semi-automatic tool to support it is currently under development.
Keywords: Aspect Mining, Concepts Analysis, Concerns Mining, Multi-Dimensional Separation of Concerns, Impact Analysis.
Downloads 1513
7852 Validation and Selection between Machine Learning Technique and Traditional Methods to Reduce Bullwhip Effects: a Data Mining Approach
Authors: Hamid R. S. Mojaveri, Seyed S. Mousavi, Mojtaba Heydar, Ahmad Aminian
Abstract:
The aim of this paper is to present a three-step methodology for forecasting supply chain demand. In the first step, various data mining techniques are applied to prepare the data for the forecasting models. In the second step, the modeling step, an artificial neural network and a support vector machine are presented after defining the Mean Absolute Percentage Error index for measuring error. The structure of the artificial neural network is selected based on previous researchers' results, and in this article the accuracy of the network is increased by using sensitivity analysis. The best forecast from the classical forecasting methods (Moving Average, Exponential Smoothing, and Exponential Smoothing with Trend) is obtained from the prepared data and compared with the results of the support vector machine and the proposed artificial neural network. The results show that the artificial neural network forecasts more precisely than the other methods. Finally, the stability of the forecasting methods is analyzed using raw data, and the effectiveness of clustering analysis is also measured.
Keywords: Artificial Neural Networks (ANN), bullwhip effect, demand forecasting, Support Vector Machine (SVM).
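As a rough illustration of the classical baselines and the error index named in this abstract, the following minimal sketch computes one-step-ahead Moving Average and simple Exponential Smoothing forecasts and scores them with the Mean Absolute Percentage Error. The demand values, the window of 3, and the smoothing factor of 0.5 are made-up assumptions for illustration and are not taken from the paper.

```python
def moving_average_forecast(series, window):
    """One-step-ahead forecasts: each forecast is the mean of the previous `window` values."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]


def exponential_smoothing_forecast(series, alpha):
    """One-step-ahead simple exponential smoothing forecasts for series[1:]."""
    level = series[0]  # initialise the level with the first observation
    forecasts = []
    for actual in series[1:]:
        forecasts.append(level)  # forecast is made before seeing `actual`
        level = alpha * actual + (1 - alpha) * level
    return forecasts


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical demand history, for illustration only.
demand = [100, 110, 105, 120, 115, 130]
window = 3

ma = moving_average_forecast(demand, window)
ses = exponential_smoothing_forecast(demand, alpha=0.5)

ma_error = mape(demand[window:], ma)
ses_error = mape(demand[1:], ses)
print(f"Moving Average MAPE: {ma_error:.2f}%")
print(f"Exponential Smoothing MAPE: {ses_error:.2f}%")
```

The two error values are computed over different numbers of forecasts here, so a fair comparison would restrict both methods to the same evaluation periods.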
Downloads 2010
7851 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques
Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas
Abstract:
The purpose of this research is to develop an algorithm capable of classifying news articles from the automobile industry according to the competitive actions they entail, using Text Mining (TM) methods. Proper preprocessing of the data is tested by preparing pipelines that best fit each algorithm. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing to identify the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines: Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further by testing several parameters of each. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.
Keywords: Artificial neural network, competitive dynamics, logistic regression, text classification, text mining.
Downloads 535
7850 Main Cause of Children's Deaths in Indigenous Wayuu Community from Department of La Guajira: A Research Developed through Data Mining Use
Authors: Isaura Esther Solano Núñez, David Suarez
Abstract:
The main purpose of this research is to discover what causes death in children of the Wayuu community, and to analyze those results in depth in order to take corrective measures to control infant mortality. We consider it important to determine the causes of early death in this specific population, since its members are the most vulnerable to high-risk environmental conditions. In this way, the government, through the competent authorities, may develop prevention policies and the right measures to avoid an increase in these deaths. The methodology used in this investigation is data mining, which consists of gathering and examining large amounts of data to produce new and valuable information. Through this technique it has been possible to determine that the child population is dying mostly from malnutrition. In short, this technique has been very useful for this study; it has allowed us to transform large amounts of information into a conclusive and important statement, which has made it easier to take appropriate steps to resolve a particular situation.
Keywords: Malnutrition, datamining, analytical, descriptive, population, wayuu, indigenous.
Downloads 696
7849 Dynamic Simulation of a Hybrid Wind Farm with Wind Turbines and Distributed Compressed Air Energy Storage System
Authors: Eronini Umez-Eronini
Abstract:
Compressed air energy storage (CAES) coupled with wind farms has gained attention as a means to address the intermittency and variability of wind power. However, most existing studies and implementations focus on bulk or centralized CAES plants. This study presents a dynamic model of a hybrid wind farm with distributed CAES, using air storage tanks and compressor and expander trains at each wind turbine station. It introduces the concept of a distributed CAES with linked air cooling and heating, and presents an approach to scheduling and regulating the production of compressed air and power in such a system. Mathematical models of the dynamic components of this hybrid wind farm system, including a simple transient wake field model, were developed and simulated using MATLAB, with real wind data and Transmission System Operator (TSO) absolute power reference signals as inputs. The simulation results demonstrate that the proposed ad hoc supervisory controller is able to track the minute-scale power demand signal within an error band comparable in size to the electrical power rating of a single expander. This suggests that combining the global distributed CAES control with power regulation for individual wind turbines could further improve the system’s performance. The round-trip electrical storage efficiency computed for the distributed CAES was also in the range of reported round-trip electrical storage efficiencies for improved bulk CAES. These findings contribute to improving the efficiency of wind farms without access to large-scale storage or underground caverns.
Keywords: Distributed CAES, compressed air, energy storage, hybrid wind farm, wind turbines, dynamic simulation.
Downloads 75
7848 A Survey of Semantic Integration Approaches in Bioinformatics
Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir
Abstract:
Technological advances in computer science and data analysis continuously produce huge volumes of biological data, which are available on the web. Such advances require powerful data integration techniques to extract the knowledge and information pertinent to a specific question. Biomedical exploration of these big data often requires complex queries across multiple autonomous, heterogeneous, and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, focusing on those developed in the ontology community.
Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.
Downloads 1819
7847 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection
Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada
Abstract:
With the expansion of machine learning and data mining in the context of Big Data analytics, a common problem that affects data is class imbalance, that is, an imbalanced distribution of instances across classes. This problem is present in many real-world applications such as fraud detection, network intrusion detection, and medical diagnostics. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may struggle, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies imbalanced data, is the overlap of instances between the two classes, commonly referred to as noise or overlapping data. In this article, we propose an approach called One Side Behavioral Noise Reduction (OSBNR), which deals with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to group similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.
Keywords: Machine learning, Imbalanced data, Data mining, Big data.
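The two OSBNR steps described in this abstract can be sketched roughly as follows. This is not the authors' implementation: the abstract does not say which clustering algorithm or overlap criterion is used, so the sketch assumes a small k-means and treats a majority instance as overlapping when it falls within the radius of a minority cluster. The function names, data points, and `k` are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iterations=20):
    """Tiny k-means with deterministic initialisation (first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(c) / len(group) for c in zip(*group)]
    return centroids, groups


def reduce_behavioral_noise(minority, majority, k):
    """Step 1: cluster the minority class. Step 2: drop overlapping majority instances."""
    centroids, groups = kmeans(minority, k)
    # Assumed overlap rule: inside the radius of any minority behavior cluster.
    radii = [max((distance(p, c) for p in g), default=0.0)
             for c, g in zip(centroids, groups)]
    return [m for m in majority
            if not any(distance(m, c) <= r for c, r in zip(centroids, radii))]


# Hypothetical 2-D data: two minority behavior clusters and five majority instances.
minority = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
majority = [(0.5, 0.5), (5, 5), (10.5, 10.5), (3, 3), (20, 20)]

kept_majority = reduce_behavioral_noise(minority, majority, k=2)
print(kept_majority)
```

Only the majority class is reduced, which is the "one side" aspect; the minority instances are left untouched.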
Downloads 1137
7846 Dose due the Incorporation of Radionuclides Using Teeth as Bioindicators nearby Caetité Uranium Mines
Authors: Viviane S. Guimarães, Ícaro M. M. Brasil, Simara S. Campos, Roseli F. Gennari, Márcia R. P. Attie, Susana O. Souza
Abstract:
Uranium mining and processing in Brazil occur in a northeastern area near Caetité-BA. Several Non-Governmental Organizations claim that uranium mining in this region is a pollutant causing health risks to the local population, but those in charge of the complex that extracts and produces "yellow cake" for nuclear power plant fuel reject these allegations. This study aimed at identifying potential problems caused by mining to the population of Caetité. In this work, the concentrations of the 238U, 232Th, and 40K radioisotopes in the teeth of the Caetité population were determined by ICP-MS. Teeth are used as bioindicators of incorporated radionuclides. Cumulative radiation doses in the skeleton were also determined. The concentration values were below 0.008 ppm, and the annual effective doses due to the radioisotopes are below the reference values. Therefore, it is not possible to state that the mining process in Caetité increases pollution or radiation exposure in a meaningful way.
Keywords: bioindicators, radiation dose, radioisotopes incorporation, uranium.
Downloads 4110
7845 Educational Data Mining: The Case of Department of Mathematics and Computing in the Period 2009-2018
Authors: M. Sitoe, O. Zacarias
Abstract:
University education is influenced by several factors that range from the adoption of strategies to strengthen the whole process to the academic performance improvement of the students themselves. This work uses data mining techniques to develop a predictive model to identify students with a tendency to evasion and retention. To this end, a database of real students’ data from the Department of University Admission (DAU) and the Department of Mathematics and Informatics (DMI) was used. The data comprised 388 undergraduate students admitted in the years 2009 to 2014. The Weka tool was used for model building, using three different techniques, namely: K-nearest neighbor, random forest, and logistic regression. To allow for training on multiple train-test splits, a cross-validation approach was employed with a varying number of folds. To reduce bias variance and improve the performance of the models, ensemble methods of Bagging and Stacking were used. After comparing the results obtained by the three classifiers, Logistic Regression using Bagging with seven folds obtained the best performance, showing results above 90% in all evaluated metrics: accuracy, rate of true positives, and precision. Retention is the most common tendency.
Keywords: Evasion and retention, cross validation, bagging, stacking.
Downloads 119
7844 Adaptive Distributed Genetic Algorithms and Its VLSI Design
Authors: Kazutaka Kobayashi, Norihiko Yoshida, Shuji Narazaki
Abstract:
This paper presents a dynamic adaptation scheme for the frequency of inter-deme migration in distributed genetic algorithms (GA), and its VLSI hardware design. Distributed GA, or multi-deme-based GA, uses multiple populations which evolve concurrently. The purpose of dynamic adaptation is to improve convergence performance so as to obtain better solutions. Through simulation experiments, we showed that our scheme achieves better performance than fixed-frequency migration schemes.
Keywords: Genetic algorithms, dynamic adaptation, VLSI hardware.
Downloads 1669
7843 Redesigning Business Processes: A Method Based on Simulation and Process Mining Techniques
Authors: Zahra Mohammadnazari, Fateme Rostambeygi, Fatemeh Dehrouyeh, Hwang Ki-Soon, Amir Aghsami
Abstract:
Corporations have always prioritized efforts to examine and improve processes. Various metrics, such as the cost and time required to implement the process, can be specified in this regard. Process improvement can be defined as an improvement of these indicators. This is accomplished by looking at prospective adjustments to the current executive process model or the resources allotted to it. This paper aims to improve the procurement process and to explore assessment prospects in the project using a combination of process mining and simulation (benefiting from Play-In and Play-Out methodologies). To run the simulation, we need to complete the control flow diagram, institution settings, resource settings, and activity settings. Mining the event logs yields the process control flow. However, both the entry of institutions and the distribution of resources must be modeled. The rate of admission of institutions and the distribution of time for the implementation of activities are determined in the next step.
Keywords: Business reengineering, Petri net, process-based simulation, process mining.
Downloads 483
7842 The Impacts of Local Decision Making on Customisation Process Speed across Distributed Boundaries: A Case Study
Authors: A. M. Qahtani, G. B. Wills, A. M. Gravell
Abstract:
Communicating and managing customers’ requirements in software development projects play a vital role in the software development process. While it is difficult to do so locally, it is even more difficult to communicate these requirements over distributed boundaries and to convey them to multiple distribution customers. This paper discusses the communication of multiple distribution customers’ requirements in the context of customised software products. The main purpose is to understand the challenges of communicating and managing customisation requirements across distributed boundaries. We propose a model for Communicating Customisation Requirements of Multi-Clients in a Distributed Domain (CCRD). Thereafter, we evaluate that model by presenting the findings of a case study conducted with a company with customisation projects for 18 distributed customers. Then, we compare the outputs of the real case process and the outputs of the CCRD model using simulation methods. Our conjecture is that the CCRD model can reduce the challenge of communication requirements over distributed organisational boundaries, and the delay in decision making and in the entire customisation process time.
Keywords: Customisation Software Products, Global Software Engineering, Local Decision Making, Requirement Engineering, Simulation Model.
Downloads 1897
7841 Latent Topic Based Medical Data Classification
Authors: Jian-hua Yeh, Shi-yi Kuo
Abstract:
This paper discusses the classification process for medical data. We use the data from ACM KDDCup 2008 to demonstrate our classification process based on latent topic discovery. In this data set, the target set and the outliers are quite different in nature: the target set makes up only 0.6% of the data, while the outliers account for 99.4%. We use this data set as an example to show how we dealt with this extremely biased data set using latent topic discovery and noise reduction techniques. Our experiment faces two major challenges: (1) extremely distributed outliers, and (2) positive samples that are far fewer than negative ones. We propose a suitable process flow to deal with these issues and obtain a best AUC result of 0.98.
Keywords: classification, latent topics, outlier adjustment, feature scaling
Downloads 1642
7840 Consistency Model and Synchronization Primitives in SDSMS
Authors: Dalvinder Singh Dhaliwal, Parvinder S. Sandhu, S. N. Panda
Abstract:
This paper gives a general discussion of memory consistency models such as Strict Consistency, Sequential Consistency, Processor Consistency, and Weak Consistency. The techniques for implementing distributed shared memory systems and synchronization primitives in Software Distributed Shared Memory Systems are then discussed. The analysis involves performance measurement of the protocol concerned, namely the Multiple Writer Protocol. Each protocol has pros and cons, so the problems associated with each protocol are discussed and other related issues are explored.
Keywords: Distributed System, Single owner protocol, Multiple owner protocol
Downloads 1390
7839 Evaluation of the Performance of ACTIFLO® Clarifier in the Treatment of Mining Wastewaters: Case Study of Costerfield Mining Operations, Victoria, Australia
Authors: Seyed Mohsen Samaei, Shirley Gato-Trinidad
Abstract:
A pre-treatment stage prior to reverse osmosis (RO) is very important to ensure the long-term performance of the RO membranes in any wastewater treatment using RO. This study aims to evaluate the application of the Actiflo® clarifier as part of a pre-treatment unit in mining operations. It involves performing analytical testing on RO feed water before and after installation of the Actiflo® unit. Water samples prior to the RO plant stage were obtained on different dates from Costerfield mining operations in Victoria, Australia. Tests were conducted in an independent laboratory to determine the concentration of various compounds in RO feed water before and after installation of the Actiflo® unit during the entire evaluated period from December 2015 to June 2018. Water quality analysis shows that the quality of RO feed water has remarkably improved since installation of the Actiflo® clarifier. Suspended solids (SS) and turbidity removal efficiencies have improved by 91 and 85 percent respectively in the pre-treatment system since the installation of Actiflo®. The Actiflo® clarifier proved to be a valuable part of the pre-treatment system prior to RO. It has the potential to conveniently condition the mining wastewater prior to the RO unit, and to reduce the risk of RO physical failure and irreversible fouling. Consequently, reliable and durable operation of the RO unit with minimum requirement for RO membrane replacement is expected with Actiflo® in use.
Keywords: Actiflo® clarifier, membrane, mining wastewater, reverse osmosis, wastewater treatment.
Downloads 1200
7838 Performance Evaluation of Data Mining Techniques for Predicting Software Reliability
Authors: Pradeep Kumar, Abdul Wahid
Abstract:
Accurate software reliability prediction not only enables developers to improve the quality of software but also provides useful information to help them plan valuable resources. This paper examines the performance of three well-known data mining techniques (CART, TreeNet and Random Forest) for predicting software reliability. We evaluate and compare the performance of the proposed models with a Cascade Correlation Neural Network (CCNN) using sixteen empirical databases from the Data and Analysis Center for Software. The goal of our study is to help project managers concentrate their testing efforts to minimize software failures and thereby improve the reliability of software systems. Two performance measures, Normalized Root Mean Squared Error (NRMSE) and Mean Absolute Error (MAE), illustrate that the CART model is more accurate than the models built using Random Forest, TreeNet and CCNN on all datasets used in our study. Finally, we conclude that such methods can help in reliability prediction using real-life failure datasets.
Keywords: Classification, Cascade Correlation Neural Network, Random Forest, Software reliability, TreeNet.
Downloads 1839
7837 Observations about the Principal Components Analysis and Data Clustering Techniques in the Study of Medical Data
Authors: Cristina G. Dascâlu, Corina Dima Cozma, Elena Carmen Cotrutz
Abstract:
The statistical analysis of medical data often requires special techniques because of the particularities of these data. Principal components analysis and data clustering are two statistical methods for data mining that are very useful in the medical field, the first as a method to decrease the number of studied parameters, and the second as a method to analyze the connections between diagnosis and the data about the patient's condition. In this paper we investigate the implications of a specific data analysis technique: data clustering preceded by a selection of the most relevant parameters, made using principal components analysis. Our assumption was that using principal components analysis before data clustering, in order to select and classify only the most relevant parameters, would improve the accuracy of clustering. The practical results showed the opposite: the clustering accuracy decreases by a percentage approximately equal to the percentage of information loss reported by the principal components analysis.
Keywords: Data clustering, medical data, principal components analysis.
Downloads 1501
7836 Daemon-Based Distributed Deadlock Detection and Resolution
Authors: Z. RahimAlipour, A. T. Haghighat
Abstract:
Detecting deadlock is one of the important problems in distributed systems, and different solutions have been proposed for it. Among the many deadlock detection algorithms, edge-chasing has been the most widely used. In an edge-chasing algorithm, a special message called a probe is created and sent along dependency edges. When the initiator of a probe receives the probe back, the existence of a deadlock is revealed. But these algorithms are not problem-free. One of the problems associated with them is that they cannot detect some deadlocks and may even identify false deadlocks. A key point not mentioned in the literature is how a process that is waiting to obtain the required resources, and whose execution has been blocked, can actually respond to probe messages in the system. The question of which process should be victimized in order to achieve better performance when multiple cycles exist within a single process in the system has also received little attention. In this paper, one of the basic concepts of the operating system, the daemon, is used to solve the problems mentioned. The proposed algorithm sends probe messages to the mandatory daemons and collects enough information to effectively identify and resolve multi-cycle deadlocks in distributed systems.
Keywords: Distributed system, distributed deadlock detection and resolution, daemon, false deadlock.
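The basic edge-chasing idea summarised in this abstract (a probe travels along dependency edges, and a deadlock is declared when it returns to its initiator) can be illustrated with the minimal sketch below. It is a single-process simulation over an assumed wait-for graph, not the daemon-based algorithm the paper proposes; the process names and the `wait_for` mapping are hypothetical.

```python
from collections import deque


def detect_deadlock(wait_for, initiator):
    """Return True if a probe started by `initiator` travels back to it.

    `wait_for` maps each blocked process to the processes it is waiting on.
    A probe is the triple (initiator, sender, receiver).
    """
    probes = deque((initiator, initiator, target)
                   for target in wait_for.get(initiator, ()))
    forwarded = set()  # processes that already forwarded this initiator's probe
    while probes:
        origin, sender, receiver = probes.popleft()
        if receiver == origin:
            return True  # the probe came back: the initiator is on a cycle
        if receiver in forwarded:
            continue  # do not forward the same probe twice
        forwarded.add(receiver)
        for nxt in wait_for.get(receiver, ()):
            probes.append((origin, receiver, nxt))
    return False


# Hypothetical wait-for graph: P1 -> P2 -> P3 -> P1 is a cycle; P4 only waits on it.
wait_for = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"], "P4": ["P1"]}

print(detect_deadlock(wait_for, "P1"))
print(detect_deadlock(wait_for, "P4"))
```

In a real distributed system each process (or, in the paper's proposal, a daemon acting for a blocked process) would handle incoming probes itself; the queue here merely stands in for message passing.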
Downloads 1936
7835 Distributed Estimation Using an Improved Incremental Distributed LMS Algorithm
Authors: Amir Rastegarnia, Mohammad Ali Tinati, Azam Khalili
Abstract:
In this paper we consider the problem of distributed adaptive estimation in wireless sensor networks for two different observation noise conditions. In the first case, we assume that there are some sensors with high observation noise variance (noisy sensors) in the network. In the second case, a different observation noise variance is assumed for each sensor, which is closer to a real scenario. In both cases, an initial estimate of each sensor's observation noise is obtained. For the first case, we show that when there are such sensors in the network, the performance of conventional distributed adaptive estimation algorithms such as the incremental distributed least mean square (IDLMS) algorithm decreases drastically, and that detecting and ignoring these sensors leads to better estimation performance. In the next step, we propose a simple algorithm to detect these noisy sensors and modify the IDLMS algorithm to deal with them. For the second case, we propose a new algorithm in which the step-size parameter is adjusted for each sensor according to its observation noise variance. As the simulation results show, the proposed methods outperform the IDLMS algorithm under the same conditions.
Keywords: Distributed estimation, sensor networks, adaptive filter, IDLMS.
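To make the second case more concrete, here is a minimal sketch of an incremental LMS loop in which the estimate is passed from sensor to sensor and each sensor applies its own step size. The abstract does not give the adjustment rule, so the scaling used below (base step size multiplied by the smallest noise variance divided by the sensor's own variance) is an assumption for illustration, as are the noise levels, the true parameter vector, and all names.

```python
import random


def incremental_lms(true_w, noise_std, step_sizes, cycles, seed=0):
    """Run incremental LMS around a ring of sensors.

    Returns the final estimate and the squared deviation from `true_w`
    averaged over the second half of the cycles.
    """
    rng = random.Random(seed)
    w = [0.0] * len(true_w)
    tail = []
    for cycle in range(cycles):
        for std, mu in zip(noise_std, step_sizes):  # one pass around the ring
            u = [rng.gauss(0.0, 1.0) for _ in true_w]  # regressor at this sensor
            d = sum(ui * wi for ui, wi in zip(u, true_w)) + rng.gauss(0.0, std)
            err = d - sum(ui * wi for ui, wi in zip(u, w))
            w = [wi + mu * err * ui for wi, ui in zip(w, u)]  # LMS update
        if cycle >= cycles // 2:
            tail.append(sum((wi - ti) ** 2 for wi, ti in zip(w, true_w)))
    return w, sum(tail) / len(tail)


# Hypothetical setup: five sensors, the last one much noisier than the rest.
true_w = [1.0, -0.5]
noise_std = [0.1, 0.1, 0.1, 0.1, 2.0]
base_mu = 0.05

plain_steps = [base_mu] * len(noise_std)
min_var = min(s * s for s in noise_std)
# Assumed rule: shrink the step size of sensors with larger noise variance.
scaled_steps = [base_mu * min_var / (s * s) for s in noise_std]

_, msd_plain = incremental_lms(true_w, noise_std, plain_steps, cycles=400)
w_scaled, msd_scaled = incremental_lms(true_w, noise_std, scaled_steps, cycles=400)

print(f"mean squared deviation, equal step sizes:  {msd_plain:.5f}")
print(f"mean squared deviation, scaled step sizes: {msd_scaled:.5f}")
```

With this scaling the noisy sensor contributes almost nothing to the update, which is close in effect to the detect-and-ignore strategy described for the first case.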
Downloads 1445
7834 Distributed Multi-Agent Based Approach on an Intelligent Transportation Network
Authors: Xiao Yihong, Yu Kexin, Burra Venkata Durga Kumar
Abstract:
With the accelerating process of urbanization, the problem of urban road congestion is becoming more and more serious. Intelligent transportation systems combining distributed systems and artificial intelligence have become a research hotspot. As the core development direction of the intelligent transportation system, the Cooperative Intelligent Transportation System (C-ITS) integrates advanced information technology and communication methods and realizes the integration of humans, vehicles, roadside infrastructure, and other elements through a multi-agent distributed system. By analyzing the system architecture and technical characteristics of C-ITS, the paper proposes a distributed multi-agent C-ITS. The system consists of a Roadside Subsystem, a Vehicle Subsystem, and a Personal Subsystem. We also explore the scalability of the C-ITS and propose incorporating local rewards into the centralized-training, decentralized-execution paradigm, with the aim of adding a scalable value decomposition method. In addition, we suggest introducing blockchain to improve the safety of the traffic information transmission process. The system is expected to improve vehicle capacity and traffic safety.
Keywords: Distributed system, artificial intelligence, multi-agent, Cooperative Intelligent Transportation System.
Downloads 572
7833 Affine Radial Basis Function Neural Networks for the Robust Control of Hyperbolic Distributed Parameter Systems
Authors: Eleni Aggelogiannaki, Haralambos Sarimveis
Abstract:
In this work, a radial basis function (RBF) neural network is developed for the identification of hyperbolic distributed parameter systems (DPSs). This empirical model is based only on process input-output data and used for the estimation of the controlled variables at specific locations, without the need of online solution of partial differential equations (PDEs). The nonlinear model that is obtained is suitably transformed to a nonlinear state space formulation that also takes into account the model mismatch. A stable robust control law is implemented for the attenuation of external disturbances. The proposed identification and control methodology is applied on a long duct, a common component of thermal systems, for a flow based control of temperature distribution. The closed loop performance is significantly improved in comparison to existing control methodologies.
Keywords: Hyperbolic Distributed Parameter Systems, Radial Basis Function Neural Networks, H∞ control, Thermal systems.
Downloads 1420
7832 CoP-Networks: Virtual Spaces for New Faculty’s Professional Development in the 21st Higher Education
Authors: Eman AbuKhousa, Marwan Z. Bataineh
Abstract:
Higher education in the 21st century and globalization challenge new faculty members to build effective professional networks and partnerships with industry in order to accelerate their growth and success. This creates the need for community of practice (CoP)-oriented development approaches that focus on cognitive apprenticeship while considering individual predisposition and future career needs. This work adopts data mining, clustering analysis, and social networking technologies to present the CoP-Network as a virtual space that connects individuals with similar career aspirations who are socially influenced to join and engage in a process of acquiring domain-related knowledge and practice. The CoP-Network model can be integrated into higher education to extend traditional graduate and professional development programs.
Keywords: Clustering analysis, community of practice, data mining, higher education, new faculty challenges, social networks, social influence, professional development.
Downloads 972
7831 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy
Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie
Abstract:
In recent years, there has been an explosion in the use of technology that helps in discovering diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. They have great potential to provide accurate medical diagnosis and to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied to such microarray datasets to devise methods that can predict the occurrence of leukemia. In this study, we compared the classification accuracy and response time of eleven decision tree methods and six rule classifier methods using five performance criteria. The experimental results show that Random Tree produces the best result among the tree classifiers and also takes the lowest time to build a model. Among the classification rule algorithms, the nearest-neighbor-like algorithm (NNge) is the best, owing to its high accuracy and the lowest model-building time.
Keywords: Data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data.
Downloads 2557
7830 Determination of the Bank's Customer Risk Profile: Data Mining Applications
Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge
Abstract:
In this study, clients who applied to a bank branch for a loan were analyzed through data mining. The data comprised information such as the amounts of loans received by personal and SME clients working with the bank branch, the number of installments, the number of delays in loan installments, payments available in other banks, and the number of banks to which the clients were in debt between 2010 and 2013. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, five different types of customers were identified on the decision tree. These customer types were rated according to the risk they pose to the bank branch, and the customers were classified according to these risk ratings.
Keywords: Client classification, loan suitability, risk rating, CART analysis, decision tree.
Downloads 1072
7829 Ontology-Based Backpropagation Neural Network Classification and Reasoning Strategy for NoSQL and SQL Databases
Authors: Hao-Hsiang Ku, Ching-Ho Chi
Abstract:
Big data applications have become imperative in many fields. Many researchers have devoted themselves to increasing correct rates and reducing time complexity. Hence, this study designs and proposes an ontology-based backpropagation neural network classification and reasoning strategy for NoSQL big data applications, called ON4NoSQL. ON4NoSQL is responsible for enhancing classification performance in NoSQL and SQL databases in order to build mass behavior models. Mass behavior models are built with MapReduce techniques and the Hadoop distributed file system on the Hadoop service platform. The reference engine of ON4NoSQL is the ontology-based backpropagation neural network classification and reasoning strategy. Simulation results indicate that ON4NoSQL can efficiently construct a high-performance environment for data storing, searching, and retrieving.
Keywords: Hadoop, NoSQL, ontology, backpropagation neural network, and high distributed file system.
Downloads 999
7828 IMDC: An Image-Mapped Data Clustering Technique for Large Datasets
Authors: Faruq A. Al-Omari, Nabeel I. Al-Fayoumi
Abstract:
In this paper, we present a new algorithm for clustering data in large datasets using image processing approaches. First, the dataset is mapped into a binary image plane. The synthesized image is then processed using efficient image processing techniques to cluster the data in the dataset. Hence, the algorithm avoids an exhaustive search to identify clusters. The algorithm considers only a small subset of the data that contains critical boundary information sufficient to identify the contained clusters. Compared to available data clustering techniques, the proposed algorithm produces results of similar quality and outperforms them in execution time and storage requirements.
Keywords: Data clustering, Data mining, Image-mapping, Pattern discovery, Predictive analysis.
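The abstract above does not spell out the algorithm, so the following is only a rough sketch of the general idea of mapping data onto a binary image and clustering by connected regions. Everything here is an assumption made for illustration: the function name `cluster_by_image_mapping`, the `cell_size` parameter, the restriction to 2-D points, and the use of an 8-connected flood fill. It does not reproduce the authors' boundary-based processing.

```python
from collections import deque


def cluster_by_image_mapping(points, cell_size=1.0):
    """Label 2-D points by the connected region of occupied grid cells they fall in.

    Hypothetical illustration: each occupied cell acts as a "1" pixel of a
    binary image, and clusters are the 8-connected components of that image.
    """
    # Map every point to its pixel (grid cell); remember which points landed there.
    cells = {}
    for idx, (x, y) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append(idx)

    labels = [-1] * len(points)
    seen = set()
    next_label = 0

    # Flood-fill over occupied pixels; each fill yields one cluster.
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = next_label
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in cells and neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        next_label += 1
    return labels


# Made-up sample points: three close together, two far away.
sample_points = [(0.2, 0.3), (1.1, 0.9), (1.8, 1.5), (10.0, 10.0), (10.5, 11.2)]
sample_labels = cluster_by_image_mapping(sample_points, cell_size=1.0)
print(sample_labels)
```

The work per point is constant once the grid is built, which is one way a pixel-based view can avoid pairwise distance comparisons; the choice of `cell_size` controls how close two points must be to end up in the same cluster.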
Downloads 1500
7827 An Enhanced Distributed System to improve the Time Complexity of Binary Indexed Trees
Authors: Ahmed M. Elhabashy, A. Baes Mohamed, Abou El Nasr Mohamad
Abstract:
Distributed Computing Systems are usually considered the most suitable model for practical solutions of many parallel algorithms. In this paper, an enhanced distributed system is presented to improve the time complexity of Binary Indexed Trees (BIT). The proposed system uses multi-uniform processors with identical architectures and a specially designed distributed memory system. The analysis of this system shows that it reduces the time complexity of the read query to O(Log(Log(N))) and of the update query to constant complexity, while the naive solution has a time complexity of O(Log(N)) for both queries. The system was implemented and simulated using the VHDL and Verilog hardware description languages, with Xilinx ISE 10.1 as the development environment and ModelSim 6.1c as the simulation tool. The simulation shows that the overhead resulting from the wiring and communication between the system fragments is negligible, which makes it practical to approach the maximum speedup offered by the proposed model.
Keywords: Binary Index Tree (BIT), Least Significant Bit (LSB), Parallel Adder (PA), Very High Speed Integrated Circuits Hardware Description Language (VHDL), Distributed Parallel Computing System (DPCS).
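For readers unfamiliar with the baseline mentioned in the abstract, a minimal sketch of a standard sequential Binary Indexed Tree follows. It shows only the conventional software structure, in which both the read (prefix-sum) query and the update take O(Log(N)) steps; it is not the distributed hardware design proposed in the paper. The class and method names are chosen here for illustration.

```python
class BinaryIndexedTree:
    """Standard sequential Binary Indexed Tree (Fenwick tree), 1-based indices."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (size + 1)  # slot 0 is unused

    def update(self, index, delta):
        """Add delta to the element at index; visits O(log N) nodes."""
        while index <= self.size:
            self.tree[index] += delta
            index += index & -index  # step by the least significant set bit

    def prefix_sum(self, index):
        """Return the sum of elements 1..index; visits O(log N) nodes."""
        total = 0
        while index > 0:
            total += self.tree[index]
            index -= index & -index  # strip the least significant set bit
        return total


# Made-up sample values for demonstration.
values = [3, 2, -1, 6, 5, 4, -3, 3]
bit = BinaryIndexedTree(len(values))
for position, value in enumerate(values, start=1):
    bit.update(position, value)

print(bit.prefix_sum(4))
print(bit.prefix_sum(8))
```

Both loops move through the index by its least significant set bit, which is why that operation (LSB) appears among the paper's keywords.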
Downloads 1770
7826 Clinical Decision Support for Disease Classification based on the Tests Association
Authors: Sung Ho Ha, Seong Hyeon Joo, Eun Kyung Kwon
Abstract:
Researchers have developed various tools and methodologies for effective clinical decision-making. Among these decisions, the diagnosis of chest pain diseases has been one of the important issues, especially in emergency departments. To improve physicians' diagnostic ability, many researchers have developed diagnosis intelligence by using machine learning and data mining. However, most conventional methodologies have been based on a single classifier for disease classification and prediction, which shows moderate performance. This study uses an ensemble strategy that combines multiple different classifiers to help physicians diagnose chest pain diseases more accurately. Specifically, the ensemble strategy integrates decision trees, neural networks, and support vector machines. The ensemble models are applied to real-world emergency data. This study shows that the performance of the ensemble models is superior to that of each single classifier.
Keywords: Diagnosis intelligence, ensemble approach, data mining, emergency department
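The abstract does not state how the outputs of the three classifiers are combined. As a sketch of one common option, and purely as an assumption, the snippet below combines per-case predictions by majority vote. The function name `majority_vote` and the placeholder labels "A" and "B" are hypothetical and carry no clinical meaning.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers.

    Assumption for illustration: ties go to the label that appears first
    in the input, which is how Counter.most_common orders equal counts.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs for three cases from a decision tree, a neural
# network, and a support vector machine, in that order.
per_case_predictions = [
    ["A", "A", "B"],
    ["B", "A", "B"],
    ["B", "B", "B"],
]
ensemble_labels = [majority_vote(case) for case in per_case_predictions]
print(ensemble_labels)
```

With an odd number of classifiers and two classes, a tie cannot occur, so the tie rule only matters for multi-class problems.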
Downloads 1634
7825 Optimal Placement of DG in Distribution System to Mitigate Power Quality Disturbances
Authors: G.V.K Murthy, S. Sivanagaraju, S. Satyanarayana, B. Hanumantha Rao
Abstract:
Distributed Generation (DG) systems are considered an integral part of future distribution system planning. The appropriate size and location of distributed generation play a significant role in minimizing power losses in distribution systems. Among the benefits of distributed generation is the reduction in active power losses, which can improve system performance, reliability, and power quality. In this paper, the Artificial Bee Colony (ABC) algorithm is proposed to determine the optimal DG-unit size and location using a loss sensitivity index, in order to minimize real power loss and total harmonic distortion (THD) and to improve the voltage sag index. A simulation study is conducted on a 69-bus radial test system to verify the efficacy of the proposed method.
Keywords: Distributed generation, artificial bee colony method, loss reduction, radial distribution network.
Downloads 2859
7824 Conceptualization of the Attractive Work Environment and Organizational Activity for Humans in Future Deep Mines
Authors: M. A. Sanda, B. Johansson, J. Johansson
Abstract:
The purpose of this paper is to conceptualize a future-oriented human work environment and organizational activity in deep mines that entails a vision of a good and safe workplace. Future-oriented technological challenges and the mental images required for modern work organization design were appraised. It is argued that an intelligent deep mine could be designed that covers the entire value chain, including environmental issues, with a work organization that supports good working and social conditions and thereby increased human productivity. With such an intelligent system and work organization in place, the mining industry could be seen as a place where cooperation, skills development, and gender equality are key components. From this perspective, both young people and women might view mining as an attractive job and the work environment as safe, and this could go a long way toward correcting the unequal gender balance that exists in most mines today.
Downloads 1653