Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 101

Search results for: decision tree

101 Machine Learning for Aiding Meningitis Diagnosis in Pediatric Patients

Authors: Karina Zaccari, Ernesto Cordeiro Marujo

Abstract:

This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of a blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms, including: Adaptative Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy for training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in preventing expensive and painful procedures on some children.

Keywords: Machine learning, medical diagnosis, meningitis detection, gradient boosting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 69
100 Development of Requirements Analysis Tool for Medical Autonomy in Long-Duration Space Exploration Missions

Authors: Lara Dutil-Fafard, Caroline Rhéaume, Patrick Archambault, Daniel Lafond, Neal W. Pollock

Abstract:

Improving resources for medical autonomy of astronauts in prolonged space missions, such as a Mars mission, requires not only technology development, but also decision-making support systems. The Advanced Crew Medical System - Medical Condition Requirements study, funded by the Canadian Space Agency, aimed to create knowledge content and a scenario-based query capability to support medical autonomy of astronauts. The key objective of this study was to create a prototype tool for identifying medical infrastructure requirements in terms of medical knowledge, skills and materials. A multicriteria decision-making method was used to prioritize the highest risk medical events anticipated in a long-term space mission. Starting with those medical conditions, event sequence diagrams (ESDs) were created in the form of decision trees where the entry point is the diagnosis and the end points are the predicted outcomes (full recovery, partial recovery, or death/severe incapacitation). The ESD formalism was adapted to characterize and compare possible outcomes of medical conditions as a function of available medical knowledge, skills, and supplies in a given mission scenario. An extensive literature review was performed and summarized in a medical condition database. A PostgreSQL relational database was created to allow query-based evaluation of health outcome metrics with different medical infrastructure scenarios. Critical decision points, skill and medical supply requirements, and probable health outcomes were compared across chosen scenarios. The three medical conditions with the highest risk rank were acute coronary syndrome, sepsis, and stroke. Our efforts demonstrate the utility of this approach and provide insight into the effort required to develop appropriate content for the range of medical conditions that may arise.

Keywords: Decision support system, event sequence diagram, exploration mission, medical autonomy, scenario-based queries, space medicine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 67
99 EEG-Based Screening Tool for School Student’s Brain Disorders Using Machine Learning Algorithms

Authors: Abdelrahman A. Ramzy, Bassel S. Abdallah, Mohamed E. Bahgat, Sarah M. Abdelkader, Sherif H. ElGohary

Abstract:

Attention-Deficit/Hyperactivity Disorder (ADHD), epilepsy, and autism affect millions of children worldwide, many of which are undiagnosed despite the fact that all of these disorders are detectable in early childhood. Late diagnosis can cause severe problems due to the late treatment and to the misconceptions and lack of awareness as a whole towards these disorders. Moreover, electroencephalography (EEG) has played a vital role in the assessment of neural function in children. Therefore, quantitative EEG measurement will be utilized as a tool for use in the evaluation of patients who may have ADHD, epilepsy, and autism. We propose a screening tool that uses EEG signals and machine learning algorithms to detect these disorders at an early age in an automated manner. The proposed classifiers used with epilepsy as a step taken for the work done so far, provided an accuracy of approximately 97% using SVM, Naïve Bayes and Decision tree, while 98% using KNN, which gives hope for the work yet to be conducted.

Keywords: ADHD, autism, epilepsy, EEG, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 265
98 Performance Analysis of Search Medical Imaging Service on Cloud Storage Using Decision Trees

Authors: González A. Julio, Ramírez L. Leonardo, Puerta A. Gabriel

Abstract:

Telemedicine services use a large amount of data, most of which are diagnostic images in Digital Imaging and Communications in Medicine (DICOM) and Health Level Seven (HL7) formats. Metadata is generated from each related image to support their identification. This study presents the use of decision trees for the optimization of information search processes for diagnostic images, hosted on the cloud server. To analyze the performance in the server, the following quality of service (QoS) metrics are evaluated: delay, bandwidth, jitter, latency and throughput in five test scenarios for a total of 26 experiments during the loading and downloading of DICOM images, hosted by the telemedicine group server of the Universidad Militar Nueva Granada, Bogotá, Colombia. By applying decision trees as a data mining technique and comparing it with the sequential search, it was possible to evaluate the search times of diagnostic images in the server. The results show that by using the metadata in decision trees, the search times are substantially improved, the computational resources are optimized and the request management of the telemedicine image service is improved. Based on the experiments carried out, search efficiency increased by 45% in relation to the sequential search, given that, when downloading a diagnostic image, false positives are avoided in management and acquisition processes of said information. It is concluded that, for the diagnostic images services in telemedicine, the technique of decision trees guarantees the accessibility and robustness in the acquisition and manipulation of medical images, in improvement of the diagnoses and medical procedures in patients.

Keywords: Cloud storage, decision trees, diagnostic image, search, telemedicine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 244
97 PM10 Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria

Authors: Snezhana G. Gocheva-Ilieva, Maya P. Stoimenova

Abstract:

Ambient air pollution with fine particulate matter (PM10) is a systematic permanent problem in many countries around the world. The accumulation of a large number of measurements of both the PM10 concentrations and the accompanying atmospheric factors allow for their statistical modeling to detect dependencies and forecast future pollution. This study applies the classification and regression trees (CART) method for building and analyzing PM10 models. In the empirical study, average daily air data for the city of Pleven, Bulgaria for a period of 5 years are used. Predictors in the models are seven meteorological variables, time variables, as well as lagged PM10 variables and some lagged meteorological variables, delayed by 1 or 2 days with respect to the initial time series, respectively. The degree of influence of the predictors in the models is determined. The selected best CART models are used to forecast future PM10 concentrations for two days ahead after the last date in the modeling procedure and show very accurate results.

Keywords: Cross-validation, decision tree, lagged variables, short-term forecasting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 241
96 Hybrid Anomaly Detection Using Decision Tree and Support Vector Machine

Authors: Elham Serkani, Hossein Gharaee Garakani, Naser Mohammadzadeh, Elaheh Vaezpour

Abstract:

Intrusion detection systems (IDS) are the main components of network security. These systems analyze the network events for intrusion detection. The design of an IDS is through the training of normal traffic data or attack. The methods of machine learning are the best ways to design IDSs. In the method presented in this article, the pruning algorithm of C5.0 decision tree is being used to reduce the features of traffic data used and training IDS by the least square vector algorithm (LS-SVM). Then, the remaining features are arranged according to the predictor importance criterion. The least important features are eliminated in the order. The remaining features of this stage, which have created the highest level of accuracy in LS-SVM, are selected as the final features. The features obtained, compared to other similar articles which have examined the selected features in the least squared support vector machine model, are better in the accuracy, true positive rate, and false positive. The results are tested by the UNSW-NB15 dataset.

Keywords: Intrusion detection system, decision tree, support vector machine, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 518
95 Evaluating Machine Learning Techniques for Activity Classification in Smart Home Environments

Authors: Talal Alshammari, Nasser Alshammari, Mohamed Sedky, Chris Howard

Abstract:

With the widespread adoption of the Internet-connected devices, and with the prevalence of the Internet of Things (IoT) applications, there is an increased interest in machine learning techniques that can provide useful and interesting services in the smart home domain. The areas that machine learning techniques can help advance are varied and ever-evolving. Classifying smart home inhabitants’ Activities of Daily Living (ADLs), is one prominent example. The ability of machine learning technique to find meaningful spatio-temporal relations of high-dimensional data is an important requirement as well. This paper presents a comparative evaluation of state-of-the-art machine learning techniques to classify ADLs in the smart home domain. Forty-two synthetic datasets and two real-world datasets with multiple inhabitants are used to evaluate and compare the performance of the identified machine learning techniques. Our results show significant performance differences between the evaluated techniques. Such as AdaBoost, Cortical Learning Algorithm (CLA), Decision Trees, Hidden Markov Model (HMM), Multi-layer Perceptron (MLP), Structured Perceptron and Support Vector Machines (SVM). Overall, neural network based techniques have shown superiority over the other tested techniques.

Keywords: Activities of daily living, classification, internet of things, machine learning, smart home.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1064
94 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 791
93 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique

Authors: C. Manjula, Lilly Florence

Abstract:

Software technology is developing rapidly which leads to the growth of various industries. Now-a-days, software-based applications have been adopted widely for business purposes. For any software industry, development of reliable software is becoming a challenging task because a faulty software module may be harmful for the growth of industry and business. Hence there is a need to develop techniques which can be used for early prediction of software defects. Due to complexities in manual prediction, automated software defect prediction techniques have been introduced. These techniques are based on the pattern learning from the previous software versions and finding the defects in the current version. These techniques have attracted researchers due to their significant impact on industrial growth by identifying the bugs in software. Based on this, several researches have been carried out but achieving desirable defect prediction performance is still a challenging task. To address this issue, here we present a machine learning based hybrid technique for software defect prediction. First of all, Genetic Algorithm (GA) is presented where an improved fitness function is used for better optimization of features in data sets. Later, these features are processed through Decision Tree (DT) classification model. Finally, an experimental study is presented where results from the proposed GA-DT based hybrid approach is compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.

Keywords: Decision tree, genetic algorithm, machine learning, software defect prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 713
92 Development of the Academic Model to Predict Student Success at VUT-FSASEC Using Decision Trees

Authors: Langa Hendrick Musawenkosi, Twala Bhekisipho

Abstract:

The success or failure of students is a concern for every academic institution, college, university, governments and students themselves. Several approaches have been researched to address this concern. In this paper, a view is held that when a student enters a university or college or an academic institution, he or she enters an academic environment. The academic environment is unique concept used to develop the solution for making predictions effectively. This paper presents a model to determine the propensity of a student to succeed or fail in the French South African Schneider Electric Education Center (FSASEC) at the Vaal University of Technology (VUT). The Decision Tree algorithm is used to implement the model at FSASEC.

Keywords: Academic environment model, decision trees, FSASEC, K-nearest neighbor, machine learning, popularity index, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 536
91 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler

Abstract:

This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.

Keywords: Data mining, textile production, decision trees, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 756
90 Load Forecasting in Microgrid Systems with R and Cortana Intelligence Suite

Authors: F. Lazzeri, I. Reiter

Abstract:

Energy production optimization has been traditionally very important for utilities in order to improve resource consumption. However, load forecasting is a challenging task, as there are a large number of relevant variables that must be considered, and several strategies have been used to deal with this complex problem. This is especially true also in microgrids where many elements have to adjust their performance depending on the future generation and consumption conditions. The goal of this paper is to present a solution for short-term load forecasting in microgrids, based on three machine learning experiments developed in R and web services built and deployed with different components of Cortana Intelligence Suite: Azure Machine Learning, a fully managed cloud service that enables to easily build, deploy, and share predictive analytics solutions; SQL database, a Microsoft database service for app developers; and PowerBI, a suite of business analytics tools to analyze data and share insights. Our results show that Boosted Decision Tree and Fast Forest Quantile regression methods can be very useful to predict hourly short-term consumption in microgrids; moreover, we found that for these types of forecasting models, weather data (temperature, wind, humidity and dew point) can play a crucial role in improving the accuracy of the forecasting solution. Data cleaning and feature engineering methods performed in R and different types of machine learning algorithms (Boosted Decision Tree, Fast Forest Quantile and ARIMA) will be presented, and results and performance metrics discussed.

Keywords: Time-series, features engineering methods for forecasting, energy demand forecasting, Azure machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 772
89 An Alternative Approach for Assessing the Impact of Cutting Conditions on Surface Roughness Using Single Decision Tree

Authors: S. Ghorbani, N. I. Polushin

Abstract:

In this study, an approach to identify factors affecting on surface roughness in a machining process is presented. This study is based on 81 data about surface roughness over a wide range of cutting tools (conventional, cutting tool with holes, cutting tool with composite material), workpiece materials (AISI 1045 Steel, AA2024 aluminum alloy, A48-class30 gray cast iron), spindle speed (630-1000 rpm), feed rate (0.05-0.075 mm/rev), depth of cut (0.05-0.15 mm) and tool overhang (41-65 mm). A single decision tree (SDT) analysis was done to identify factors for predicting a model of surface roughness, and the CART algorithm was employed for building and evaluating regression tree. Results show that a single decision tree is better than traditional regression models with higher rate and forecast accuracy and strong value.

Keywords: Cutting condition, surface roughness, decision tree, CART algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 487
88 A Comparison of Single of Decision Tree, Decision Tree Forest and Group Method of Data Handling to Evaluate the Surface Roughness in Machining Process

Authors: S. Ghorbani, N. I. Polushin

Abstract:

The machinability of workpieces (AISI 1045 Steel, AA2024 aluminum alloy, A48-class30 gray cast iron) in turning operation has been carried out using different types of cutting tool (conventional, cutting tool with holes in toolholder and cutting tool filled up with composite material) under dry conditions on a turning machine at different stages of spindle speed (630-1000 rpm), feed rate (0.05-0.075 mm/rev), depth of cut (0.05-0.15 mm) and tool overhang (41-65 mm). Experimentation was performed as per Taguchi’s orthogonal array. To evaluate the relative importance of factors affecting surface roughness the single decision tree (SDT), Decision tree forest (DTF) and Group method of data handling (GMDH) were applied.

Keywords: Decision Tree Forest, GMDH, surface roughness, taguchi method, turning process.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 548
87 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.

Keywords: Classification algorithms; data mining; tourism; knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1694
86 Using Single Decision Tree to Assess the Impact of Cutting Conditions on Vibration

Authors: S. Ghorbani, N. I. Polushin

Abstract:

Vibration during machining process is crucial since it affects cutting tool, machine, and workpiece leading to a tool wear, tool breakage, and an unacceptable surface roughness. This paper applies a nonparametric statistical method, single decision tree (SDT), to identify factors affecting on vibration in machining process. Workpiece material (AISI 1045 Steel, AA2024 Aluminum alloy, A48-class30 Gray Cast Iron), cutting tool (conventional, cutting tool with holes in toolholder, cutting tool filled up with epoxy-granite), tool overhang (41-65 mm), spindle speed (630-1000 rpm), feed rate (0.05-0.075 mm/rev) and depth of cut (0.05-0.15 mm) were used as input variables, while vibration was the output parameter. It is concluded that workpiece material is the most important parameters for natural frequency followed by cutting tool and overhang.

Keywords: Cutting condition, vibration, natural frequency, decision tree, CART algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 893
85 Developing Rice Disease Analysis System on Mobile via iOS Operating System

Authors: Rujijan Vichivanives, Kittiya Poonsilp, Canasanan Wanavijit

Abstract:

This research aims to create mobile tools to analyze rice disease quickly and easily. The principle of object-oriented software engineering and objective-C language were used for software development methodology and the principle of decision tree technique was used for analysis method. Application users can select the features of rice disease or the color appears on the rice leaves for recognition analysis results on iOS mobile screen. After completing the software development, unit testing and integrating testing method were used to check for program validity. In addition, three plant experts and forty farmers have been assessed for usability and benefit of this system. The overall of users’ satisfaction was found in a good level, 57%. The plant experts give a comment on the addition of various disease symptoms in the database for more precise results of the analysis. For further research, it is suggested that image processing system should be developed as a tool that allows users search and analyze for rice diseases more convenient with great accuracy.

Keywords: Rice disease, analysis system, mobile application, iOS operating system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 896
84 Determination of the Bank's Customer Risk Profile: Data Mining Applications

Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge

Abstract:

In this study, the clients who applied to a bank branch for loan were analyzed through data mining. The study was composed of the information such as amounts of loans received by personal and SME clients working with the bank branch, installment numbers, number of delays in loan installments, payments available in other banks and number of banks to which they are in debt between 2010 and 2013. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, 5 different types of customers have been determined on the decision tree. The classification of these types of customers has been created with the rating of those posing a risk for the bank branch and the customers have been classified according to the risk ratings.

Keywords: Client classification, loan suitability, risk rating, CART analysis, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 624
83 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection

Authors: Yaojun Wang, Yaoqing Wang

Abstract:

Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection system. A profitable stock-selection system should consider the stock’s investment value and the market timing. In this paper, we present a hybrid system including both engage for stock selection. This system uses a case-based reasoning (CBR) model to execute the stock classification, uses a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques regarding to the classification accuracy, the average return and the Sharpe ratio.

Keywords: Case-based reasoning, decision tree, stock selection, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1009
82 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.

Keywords: Classifier ensemble, breast cancer survivability, data mining, SEER.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1206
81 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy

Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie

Abstract:

In recent years, there has been an explosion in the rate of using technology that help discovering the diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. It has great potential to provide accurate medical diagnosis, to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied on such micro-array datasets to devise methods that can predict the occurrence of Leukemia disease. In this study, we compared the classification accuracy and response time among eleven decision tree methods and six rule classifier methods using five performance criteria. The experiment results show that the performance of Random Tree is producing better result. Also it takes lowest time to build model in tree classifier. The classification rules algorithms such as nearest- neighbor-like algorithm (NNge) is the best algorithm due to the high accuracy and it takes lowest time to build model in classification.

Keywords: Data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892
80 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1030
79 Performance Comparison of Situation-Aware Models for Activating Robot Vacuum Cleaner in a Smart Home

Authors: Seongcheol Kwon, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

We assume an IoT-based smart-home environment where the on-off status of each of the electrical appliances including the room lights can be recognized in a real time by monitoring and analyzing the smart meter data. At any moment in such an environment, we can recognize what the household or the user is doing by referring to the status data of the appliances. In this paper, we focus on a smart-home service that is to activate a robot vacuum cleaner at right time by recognizing the user situation, which requires a situation-aware model that can distinguish the situations that allow vacuum cleaning (Yes) from those that do not (No). We learn as our candidate models a few classifiers such as naïve Bayes, decision tree, and logistic regression that can map the appliance-status data into Yes and No situations. Our training and test data are obtained from simulations of user behaviors, in which a sequence of user situations such as cooking, eating, dish washing, and so on is generated with the status of the relevant appliances changed in accordance with the situation changes. During the simulation, both the situation transition and the resulting appliance status are determined stochastically. To compare the performances of the aforementioned classifiers we obtain their learning curves for different types of users through simulations. The result of our empirical study reveals that naïve Bayes achieves a slightly better classification accuracy than the other compared classifiers.

Keywords: Situation-awareness, Smart home, IoT, Machine learning, Classifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1263
78 Patient-Specific Modeling Algorithm for Medical Data Based on AUC

Authors: Guilherme Ribeiro, Alexandre Oliveira, Antonio Ferreira, Shyam Visweswaran, Gregory Cooper

Abstract:

Patient-specific models are instance-based learning algorithms that take advantage of the particular features of the patient case at hand to predict an outcome. We introduce two patient-specific algorithms based on decision tree paradigm that use AUC as a metric to select an attribute. We apply the patient specific algorithms to predict outcomes in several datasets, including medical datasets. Compared to the patient-specific decision path (PSDP) entropy-based and CART methods, the AUC-based patient-specific decision path models performed equivalently on area under the ROC curve (AUC). Our results provide support for patient-specific methods being a promising approach for making clinical predictions.

Keywords: Approach instance-based, area Under the ROC Curve, Patient-specific Decision Path, clinical predictions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1138
77 Performance Analysis of Artificial Neural Network with Decision Tree in Prediction of Diabetes Mellitus

Authors: J. K. Alhassan, B. Attah, S. Misra

Abstract:

Human beings have the ability to make logical decisions. Although human decision - making is often optimal, it is insufficient when huge amount of data is to be classified. Medical dataset is a vital ingredient used in predicting patient’s health condition. In other to have the best prediction, there calls for most suitable machine learning algorithms. This work compared the performance of Artificial Neural Network (ANN) and Decision Tree Algorithms (DTA) as regards to some performance metrics using diabetes data. WEKA software was used for the implementation of the algorithms. Multilayer Perceptron (MLP) and Radial Basis Function (RBF) were the two algorithms used for ANN, while RegTree and LADTree algorithms were the DTA models used. From the results obtained, DTA performed better than ANN. The Root Mean Squared Error (RMSE) of MLP is 0.3913 that of RBF is 0.3625, that of RepTree is 0.3174 and that of LADTree is 0.3206 respectively.

Keywords: Artificial neural network, classification, decision tree, diabetes mellitus.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2052
76 Calcification Classification in Mammograms Using Decision Trees

Authors: S. Usha, S. Arumugam

Abstract:

Cancer affects people globally with breast cancer being a leading killer. Breast cancer is due to the uncontrollable multiplication of cells resulting in a tumour or neoplasm. Tumours are called ‘benign’ when cancerous cells do not ravage other body tissues and ‘malignant’ if they do so. As mammography is an effective breast cancer detection tool at an early stage which is the most treatable stage it is the primary imaging modality for screening and diagnosis of this cancer type. This paper presents an automatic mammogram classification technique using wavelet and Gabor filter. Correlation feature selection is used to reduce the feature set and selected features are classified using different decision trees.

Keywords: Breast Cancer, Mammogram, Symlet Wavelets, Gabor Filters, Decision Trees

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1115
75 A New DIDS Design Based on a Combination Feature Selection Approach

Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman

Abstract:

Feature selection has been used in many fields such as classification, data mining and object recognition and proven to be effective for removing irrelevant and redundant features from the original dataset. In this paper, a new design of distributed intrusion detection system using a combination feature selection model based on bees and decision tree. Bees algorithm is used as the search strategy to find the optimal subset of features, whereas decision tree is used as a judgment for the selected features. Both the produced features and the generated rules are used by Decision Making Mobile Agent to decide whether there is an attack or not in the networks. Decision Making Mobile Agent will migrate through the networks, moving from node to another, if it found that there is an attack on one of the nodes, it then alerts the user through User Interface Agent or takes some action through Action Mobile Agent. The KDD Cup 99 dataset is used to test the effectiveness of the proposed system. The results show that even if only four features are used, the proposed system gives a better performance when it is compared with the obtained results using all 41 features.

Keywords: Distributed intrusion detection system, mobile agent, feature selection, Bees Algorithm, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1562
74 Using Data Mining Technique for Scholarship Disbursement

Authors: J. K. Alhassan, S. A. Lawal

Abstract:

This work is on decision tree-based classification for the disbursement of scholarship. Tree-based data mining classification technique is used in other to determine the generic rule to be used to disburse the scholarship. The system based on the defined rules from the tree is able to determine the class (status) to which an applicant shall belong whether Granted or Not Granted. The applicants that fall to the class of granted denote a successful acquirement of scholarship while those in not granted class are unsuccessful in the scheme. An algorithm that can be used to classify the applicants based on the rules from tree-based classification was also developed. The tree-based classification is adopted because of its efficiency, effectiveness, and easy to comprehend features. The system was tested with the data of National Information Technology Development Agency (NITDA) Abuja, a Parastatal of Federal Ministry of Communication Technology that is mandated to develop and regulate information technology in Nigeria. The system was found working according to the specification. It is therefore recommended for all scholarship disbursement organizations.

Keywords: Decision tree, classification, data mining, scholarship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1626
73 Classification of Political Affiliations by Reduced Number of Features

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

By the evolvement in technology, the way of expressing opinions switched direction to the digital world. The domain of politics, as one of the hottest topics of opinion mining research, merged together with the behavior analysis for affiliation determination in texts, which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 were constituted by Linguistic Inquiry and Word Count (LIWC) features were tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that the “Decision Tree”, “Rule Induction” and “M5 Rule” classifiers when used with “SVM” and “IGR” feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “Function”, as an aggregate feature of the linguistic category, was found as the most differentiating feature among the 68 features with the accuracy of 81% in classifying articles either as Republican or Democrat.

Keywords: Politics, machine learning, feature selection, LIWC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1542
72 A Comparative Study of Malware Detection Techniques Using Machine Learning Methods

Authors: Cristina Vatamanu, Doina Cosovan, Dragoş Gavriluţ, Henri Luchian

Abstract:

In the past few years, the amount of malicious software increased exponentially and, therefore, machine learning algorithms became instrumental in identifying clean and malware files through (semi)-automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time needed for the machine learning algorithm to do so. This paper presents a comparative study between different machine learning techniques such as linear classifiers, ensembles, decision trees or various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200.000 infected files, which is a realistic quantitative mixture. The paper investigates the above mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.

Keywords: Detection Rate, False Positives, Perceptron, One Side Class, Ensembles, Decision Tree, Hybrid methods, Feature Selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2668