Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4777

Search results for: algorithms decision tree

4777 Performance Analysis of Artificial Neural Network with Decision Tree in Prediction of Diabetes Mellitus

Authors: J. K. Alhassan, B. Attah, S. Misra

Abstract:

Human beings have the ability to make logical decisions. Although human decision - making is often optimal, it is insufficient when huge amount of data is to be classified. medical dataset is a vital ingredient used in predicting patients health condition. In other to have the best prediction, there calls for most suitable machine learning algorithms. This work compared the performance of Artificial Neural Network (ANN) and Decision Tree Algorithms (DTA) as regards to some performance metrics using diabetes data. The evaluations was done using weka software and found out that DTA performed better than ANN. Multilayer Perceptron (MLP) and Radial Basis Function (RBF) were the two algorithms used for ANN, while RegTree and LADTree algorithms were the DTA models used. The Root Mean Squared Error (RMSE) of MLP is 0.3913,that of RBF is 0.3625, that of RepTree is 0.3174 and that of LADTree is 0.3206 respectively.

Keywords: artificial neural network, classification, decision tree algorithms, diabetes mellitus

Procedia PDF Downloads 325
4776 Understanding Farmers’ Perceptions Towards Agrivoltaics Using Decision Tree Algorithms

Authors: Mayuri Roy Choudhury

Abstract:

In recent times the concept of agrivoltaics has gained popularity due to the dual use of land and the added value provided by photovoltaics in terms of renewable energy and crop production on farms. However, the transition towards agrivoltaics has been slow, and our research tries to investigate the obstacles leading towards the slow progress of agrivoltaics. We applied data science decision tree algorithms to quantify qualitative perceptions of farmers in the United States for agrivoltaics. To date, there has not been much research that mentions farmers' perceptions, as most of the research focuses on the benefits of agrivoltaics. Our study adds value by putting forward the voices of farmers, which play a crucial towards the transition to agrivoltaics in the future. Our results show a mixture of responses in favor of agrivoltaics. Furthermore, it also portrays significant concerns of farmers, which is useful for decision-makers when it comes to formulating policies for agrivoltaics.

Keywords: agrivoltaics, decision-tree algorithms, farmers perception, transition

Procedia PDF Downloads 113
4775 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 199
4774 Complex Decision Rules in the Form of Decision Trees

Authors: Avinash S. Jagtap, Sharad D. Gore, Rajendra G. Gurao

Abstract:

Decision rules become more and more complex as the number of conditions increase. As a consequence, the complexity of the decision rule also influences the time complexity of computer implementation of such a rule. Consider, for example, a decision that depends on four conditions A, B, C and D. For simplicity, suppose each of these four conditions is binary. Even then the decision rule will consist of 16 lines, where each line will be of the form: If A and B and C and D, then action 1. If A and B and C but not D, then action 2 and so on. While executing this decision rule, each of the four conditions will be checked every time until all the four conditions in a line are satisfied. The minimum number of logical comparisons is 4 whereas the maximum number is 64. This paper proposes to present a complex decision rule in the form of a decision tree. A decision tree divides the cases into branches every time a condition is checked. In the form of a decision tree, every branching eliminates half of the cases that do not satisfy the related conditions. As a result, every branch of the decision tree involves only four logical comparisons and hence is significantly simpler than the corresponding complex decision rule. The conclusion of this paper is that every complex decision rule can be represented as a decision tree and the decision tree is mathematically equivalent but computationally much simpler than the original complex decision rule

Keywords: strategic, tactical, operational, adaptive, innovative

Procedia PDF Downloads 159
4773 Amharic Text News Classification Using Supervised Learning

Authors: Misrak Assefa

Abstract:

The Amharic language is the second most widely spoken Semitic language in the world. There are several new overloaded on the web. Searching some useful documents from the web on a specific topic, which is written in the Amharic language, is a challenging task. Hence, document categorization is required for managing and filtering important information. In the classification of Amharic text news, there is still a gap in the domain of information that needs to be launch. This study attempts to design an automatic Amharic news classification using a supervised learning mechanism on four un-touch classes. To achieve this research, 4,182 news articles were used. Naive Bayes (NB) and Decision tree (j48) algorithms were used to classify the given Amharic dataset. In this paper, k-fold cross-validation is used to estimate the accuracy of the classifier. As a result, it shows those algorithms can be applicable in Amharic news categorization. The best average accuracy result is achieved by j48 decision tree and naïve Bayes is 95.2345 %, and 94.6245 % respectively using three categories. This research indicated that a typical decision tree algorithm is more applicable to Amharic news categorization.

Keywords: text categorization, supervised machine learning, naive Bayes, decision tree

Procedia PDF Downloads 78
4772 A Novel PSO Based Decision Tree Classification

Authors: Ali Farzan

Abstract:

Classification of data objects or patterns is a major part in most of Decision making systems. One of the popular and commonly used classification methods is Decision Tree (DT). It is a hierarchical decision making system by which a binary tree is constructed and starting from root, at each node some of the classes is rejected until reaching the leaf nods. Each leaf node is a representative of one specific class. Finding the splitting criteria in each node for constructing or training the tree is a major problem. Particle Swarm Optimization (PSO) has been adopted as a metaheuristic searching method for finding the best splitting criteria. Result of evaluating the proposed method over benchmark datasets indicates the higher accuracy of the new PSO based decision tree.

Keywords: decision tree, particle swarm optimization, splitting criteria, metaheuristic

Procedia PDF Downloads 325
4771 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy

Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie

Abstract:

In recent years, there has been an explosion in the rate of using technology that help discovering the diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. It has great potential to provide accurate medical diagnosis, to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied on such micro-array datasets to devise methods that can predict the occurrence of Leukemia disease. In this study, we compared the classification accuracy and response time among eleven decision tree methods and six rule classifier methods using five performance criteria. The experiment results show that the performance of Random Tree is producing better result. Also it takes lowest time to build model in tree classifier. The classification rules algorithms such as nearest- neighbor-like algorithm (NNge) is the best algorithm due to the high accuracy and it takes lowest time to build model in classification.

Keywords: data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data

Procedia PDF Downloads 237
4770 Decision Tree Based Scheduling for Flexible Job Shops with Multiple Process Plans

Authors: H.-H. Doh, J.-M. Yu, Y.-J. Kwon, J.-H. Shin, H.-W. Kim, S.-H. Nam, D.-H. Lee

Abstract:

This paper suggests a decision tree based approach for flexible job shop scheduling with multiple process plans, i. e. each job can be processed through alternative operations, each of which can be processed on alternative machines. The main decision variables are: (a) selecting operation/machine pair; and (b) sequencing the jobs assigned to each machine. As an extension of the priority scheduling approach that selects the best priority rule combination after many simulation runs, this study suggests a decision tree based approach in which a decision tree is used to select a priority rule combination adequate for a specific system state and hence the burdens required for developing simulation models and carrying out simulation runs can be eliminated. The decision tree based scheduling approach consists of construction and scheduling modules. In the construction module, a decision tree is constructed using a four-stage algorithm, and in the scheduling module, a priority rule combination is selected using the decision tree. To show the performance of the decision tree based approach suggested in this study, a case study was done on a flexible job shop with reconfigurable manufacturing cells and a conventional job shop, and the results are reported by comparing it with individual priority rule combinations for the objectives of minimizing total flow time and total tardiness.

Keywords: flexible job shop scheduling, decision tree, priority rules, case study

Procedia PDF Downloads 220
4769 Model for Introducing Products to New Customers through Decision Tree Using Algorithm C4.5 (J-48)

Authors: Komol Phaisarn, Anuphan Suttimarn, Vitchanan Keawtong, Kittisak Thongyoun, Chaiyos Jamsawang

Abstract:

This article is intended to analyze insurance information which contains information on the customer decision when purchasing life insurance pay package. The data were analyzed in order to present new customers with Life Insurance Perfect Pay package to meet new customers’ needs as much as possible. The basic data of insurance pay package were collect to get data mining; thus, reducing the scattering of information. The data were then classified in order to get decision model or decision tree using Algorithm C4.5 (J-48). In the classification, WEKA tools are used to form the model and testing datasets are used to test the decision tree for the accurate decision. The validation of this model in classifying showed that the accurate prediction was 68.43% while 31.25% were errors. The same set of data were then tested with other models, i.e. Naive Bayes and Zero R. The results showed that J-48 method could predict more accurately. So, the researcher applied the decision tree in writing the program used to introduce the product to new customers to persuade customers’ decision making in purchasing the insurance package that meets the new customers’ needs as much as possible.

Keywords: decision tree, data mining, customers, life insurance pay package

Procedia PDF Downloads 310
4768 The Best Prediction Data Mining Model for Breast Cancer Probability in Women Residents in Kabul

Authors: Mina Jafari, Kobra Hamraee, Saied Hossein Hosseini

Abstract:

The prediction of breast cancer disease is one of the challenges in medicine. In this paper we collected 528 records of women’s information who live in Kabul including demographic, life style, diet and pregnancy data. There are many classification algorithm in breast cancer prediction and tried to find the best model with most accurate result and lowest error rate. We evaluated some other common supervised algorithms in data mining to find the best model in prediction of breast cancer disease among afghan women living in Kabul regarding to momography result as target variable. For evaluating these algorithms we used Cross Validation which is an assured method for measuring the performance of models. After comparing error rate and accuracy of three models: Decision Tree, Naive Bays and Rule Induction, Decision Tree with accuracy of 94.06% and error rate of %15 is found the best model to predicting breast cancer disease based on the health care records.

Keywords: decision tree, breast cancer, probability, data mining

Procedia PDF Downloads 48
4767 Enunciation on Complexities of Selected Tree Searching Algorithms

Authors: Parag Bhalchandra, S. D. Khamitkar

Abstract:

Searching trees is a most interesting application of Artificial Intelligence. Over the period of time, many innovative methods have been evolved to better search trees with respect to computational complexities. Tree searches are difficult to understand due to the exponential growth of possibilities when increasing the number of nodes or levels in the tree. Usually it is understood when we traverse down in the tree, traverse down to greater depth, in the search of a solution or a goal. However, this does not happen in reality as explicit enumeration is not a very efficient method and there are many algorithmic speedups that will find the optimal solution without the burden of evaluating all possible trees. It was a common question before all researchers where they often wonder what algorithms will yield the best and fastest result The intention of this paper is two folds, one to review selected tree search algorithms and search strategies that can be applied to a problem space and the second objective is to stimulate to implement recent developments in the complexity behavior of search strategies. The algorithms discussed here apply in general to both brute force and heuristic searches.

Keywords: trees search, asymptotic complexity, brute force, heuristics algorithms

Procedia PDF Downloads 234
4766 Artificial Neural Networks with Decision Trees for Diagnosis Issues

Authors: Y. Kourd, D. Lefebvre, N. Guersi

Abstract:

This paper presents a new idea for fault detection and isolation (FDI) technique which is applied to industrial system. This technique is based on Neural Networks fault-free and Faulty behaviors Models (NNFM's). NNFM's are used for residual generation, while decision tree architecture is used for residual evaluation. The decision tree is realized with data collected from the NNFM’s outputs and is used to isolate detectable faults depending on computed threshold. Each part of the tree corresponds to specific residual. With the decision tree, it becomes possible to take the appropriate decision regarding the actual process behavior by evaluating few numbers of residuals. In comparison to usual systematic evaluation of all residuals, the proposed technique requires less computational effort and can be used for on line diagnosis. An application example is presented to illustrate and confirm the effectiveness and the accuracy of the proposed approach.

Keywords: neural networks, decision trees, diagnosis, behaviors

Procedia PDF Downloads 389
4765 A Comparison of Single of Decision Tree, Decision Tree Forest and Group Method of Data Handling to Evaluate the Surface Roughness in Machining Process

Authors: S. Ghorbani, N. I. Polushin

Abstract:

The machinability of workpieces (AISI 1045 Steel, AA2024 aluminum alloy, A48-class30 gray cast iron) in turning operation has been carried out using different types of cutting tool (conventional, cutting tool with holes in toolholder and cutting tool filled up with composite material) under dry conditions on a turning machine at different stages of spindle speed (630-1000 rpm), feed rate (0.05-0.075 mm/rev), depth of cut (0.05-0.15 mm) and tool overhang (41-65 mm). Experimentation was performed as per Taguchi’s orthogonal array. To evaluate the relative importance of factors affecting surface roughness the single decision tree (SDT), Decision tree forest (DTF) and Group method of data handling (GMDH) were applied.

Keywords: decision tree forest, GMDH, surface roughness, Taguchi method, turning process

Procedia PDF Downloads 359
4764 Patient-Specific Modeling Algorithm for Medical Data Based on AUC

Authors: Guilherme Ribeiro, Alexandre Oliveira, Antonio Ferreira, Shyam Visweswaran, Gregory Cooper

Abstract:

Patient-specific models are instance-based learning algorithms that take advantage of the particular features of the patient case at hand to predict an outcome. We introduce two patient-specific algorithms based on decision tree paradigm that use AUC as a metric to select an attribute. We apply the patient specific algorithms to predict outcomes in several datasets, including medical datasets. Compared to the patient-specific decision path (PSDP) entropy-based and CART methods, the AUC-based patient-specific decision path models performed equivalently on area under the ROC curve (AUC). Our results provide support for patient-specific methods being a promising approach for making clinical predictions.

Keywords: approach instance-based, area under the ROC curve, patient-specific decision path, clinical predictions

Procedia PDF Downloads 393
4763 Semi-Supervised Hierarchical Clustering Given a Reference Tree of Labeled Documents

Authors: Ying Zhao, Xingyan Bin

Abstract:

Semi-supervised clustering algorithms have been shown effective to improve clustering process with even limited supervision. However, semi-supervised hierarchical clustering remains challenging due to the complexities of expressing constraints for agglomerative clustering algorithms. This paper proposes novel semi-supervised agglomerative clustering algorithms to build a hierarchy based on a known reference tree. We prove that by enforcing distance constraints defined by a reference tree during the process of hierarchical clustering, the resultant tree is guaranteed to be consistent with the reference tree. We also propose a framework that allows the hierarchical tree generation be aware of levels of levels of the agglomerative tree under creation, so that metric weights can be learned and adopted at each level in a recursive fashion. The experimental evaluation shows that the additional cost of our contraint-based semi-supervised hierarchical clustering algorithm (HAC) is negligible, and our combined semi-supervised HAC algorithm outperforms the state-of-the-art algorithms on real-world datasets. The experiments also show that our proposed methods can improve clustering performance even with a small number of unevenly distributed labeled data.

Keywords: semi-supervised clustering, hierarchical agglomerative clustering, reference trees, distance constraints

Procedia PDF Downloads 431
4762 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 446
4761 An Alternative Approach for Assessing the Impact of Cutting Conditions on Surface Roughness Using Single Decision Tree

Authors: S. Ghorbani, N. I. Polushin

Abstract:

In this study, an approach to identify factors affecting on surface roughness in a machining process is presented. This study is based on 81 data about surface roughness over a wide range of cutting tools (conventional, cutting tool with holes, cutting tool with composite material), workpiece materials (AISI 1045 Steel, AA2024 aluminum alloy, A48-class30 gray cast iron), spindle speed (630-1000 rpm), feed rate (0.05-0.075 mm/rev), depth of cut (0.05-0.15 mm) and tool overhang (41-65 mm). A single decision tree (SDT) analysis was done to identify factors for predicting a model of surface roughness, and the CART algorithm was employed for building and evaluating regression tree. Results show that a single decision tree is better than traditional regression models with higher rate and forecast accuracy and strong value.

Keywords: cutting condition, surface roughness, decision tree, CART algorithm

Procedia PDF Downloads 300
4760 Machine Learning Predictive Models for Hydroponic Systems: A Case Study Nutrient Film Technique and Deep Flow Technique

Authors: Kritiyaporn Kunsook

Abstract:

Machine learning algorithms (MLAs) such us artificial neural networks (ANNs), decision tree, support vector machines (SVMs), Naïve Bayes, and ensemble classifier by voting are powerful data driven methods that are relatively less widely used in the mapping of technique of system, and thus have not been comparatively evaluated together thoroughly in this field. The performances of a series of MLAs, ANNs, decision tree, SVMs, Naïve Bayes, and ensemble classifier by voting in technique of hydroponic systems prospectively modeling are compared based on the accuracy of each model. Classification of hydroponic systems only covers the test samples from vegetables grown with Nutrient film technique (NFT) and Deep flow technique (DFT). The feature, which are the characteristics of vegetables compose harvesting height width, temperature, require light and color. The results indicate that the classification performance of the ANNs is 98%, decision tree is 98%, SVMs is 97.33%, Naïve Bayes is 96.67%, and ensemble classifier by voting is 98.96% algorithm respectively.

Keywords: artificial neural networks, decision tree, support vector machines, naïve Bayes, ensemble classifier by voting

Procedia PDF Downloads 219
4759 Determining of the Performance of Data Mining Algorithm Determining the Influential Factors and Prediction of Ischemic Stroke: A Comparative Study in the Southeast of Iran

Authors: Y. Mehdipour, S. Ebrahimi, A. Jahanpour, F. Seyedzaei, B. Sabayan, A. Karimi, H. Amirifard

Abstract:

Ischemic stroke is one of the common reasons for disability and mortality. The fourth leading cause of death in the world and the third in some other sources. Only 1/3 of the patients with ischemic stroke fully recover, 1/3 of them end in permanent disability and 1/3 face death. Thus, the use of predictive models to predict stroke has a vital role in reducing the complications and costs related to this disease. Thus, the aim of this study was to specify the effective factors and predict ischemic stroke with the help of DM methods. The present study was a descriptive-analytic study. The population was 213 cases from among patients referring to Ali ibn Abi Talib (AS) Hospital in Zahedan. Data collection tool was a checklist with the validity and reliability confirmed. This study used DM algorithms of decision tree for modeling. Data analysis was performed using SPSS-19 and SPSS Modeler 14.2. The results of the comparison of algorithms showed that CHAID algorithm with 95.7% accuracy has the best performance. Moreover, based on the model created, factors such as anemia, diabetes mellitus, hyperlipidemia, transient ischemic attacks, coronary artery disease, and atherosclerosis are the most effective factors in stroke. Decision tree algorithms, especially CHAID algorithm, have acceptable precision and predictive ability to determine the factors affecting ischemic stroke. Thus, by creating predictive models through this algorithm, will play a significant role in decreasing the mortality and disability caused by ischemic stroke.

Keywords: data mining, ischemic stroke, decision tree, Bayesian network

Procedia PDF Downloads 98
4758 Heart Failure Identification and Progression by Classifying Cardiac Patients

Authors: Muhammad Saqlain, Nazar Abbas Saqib, Muazzam A. Khan

Abstract:

Heart Failure (HF) has become the major health problem in our society. The prevalence of HF has increased as the patient’s ages and it is the major cause of the high mortality rate in adults. A successful identification and progression of HF can be helpful to reduce the individual and social burden from this syndrome. In this study, we use a real data set of cardiac patients to propose a classification model for the identification and progression of HF. The data set has divided into three age groups, namely young, adult, and old and then each age group have further classified into four classes according to patient’s current physical condition. Contemporary Data Mining classification algorithms have been applied to each individual class of every age group to identify the HF. Decision Tree (DT) gives the highest accuracy of 90% and outperform all other algorithms. Our model accurately diagnoses different stages of HF for each age group and it can be very useful for the early prediction of HF.

Keywords: decision tree, heart failure, data mining, classification model

Procedia PDF Downloads 279
4757 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify the diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed them into the classifier algorithms. The dataset used in this study has been taken from University of California, Irvine (UCI) machine learning repository. Feature weighting methods including the fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compered with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k- nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on combination of subtractive clustering based feature weighting and decision tree classifier has been obtained the classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in the medical data set classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 249
4756 Angular-Coordinate Driven Radial Tree Drawing

Authors: Farshad Ghassemi Toosi, Nikola S. Nikolov

Abstract:

We present a visualization technique for radial drawing of trees consisting of two slightly different algorithms. Both of them make use of node-link diagrams for visual encoding. This visualization creates clear drawings without edge crossing. One of the algorithms is suitable for real-time visualization of large trees, as it requires minimal recalculation of the layout if leaves are inserted or removed from the tree; while the other algorithm makes better utilization of the drawing space. The algorithms are very similar and follow almost the same procedure but with different parameters. Both algorithms assign angular coordinates for all nodes which are then converted into 2D Cartesian coordinates for visualization. We present both algorithms and discuss how they compare to each other.

Keywords: Radial drawing, Visualization, Algorithm, Use of node-link diagrams

Procedia PDF Downloads 251
4755 Experimental Evaluation of Succinct Ternary Tree

Authors: Dmitriy Kuptsov

Abstract:

Tree data structures, such as binary or in general k-ary trees, are essential in computer science. The applications of these data structures can range from data search and retrieval to sorting and ranking algorithms. Naive implementations of these data structures can consume prohibitively large volumes of random access memory limiting their applicability in certain solutions. Thus, in these cases, more advanced representation of these data structures is essential. In this paper we present the design of the compact version of ternary tree data structure and demonstrate the results for the experimental evaluation using static dictionary problem. We compare these results with the results for binary and regular ternary trees. The conducted evaluation study shows that our design, in the best case, consumes up to 12 times less memory (for the dictionary used in our experimental evaluation) than a regular ternary tree and in certain configuration shows performance comparable to regular ternary trees. We have evaluated the performance of the algorithms using both 32 and 64 bit operating systems.

Keywords: algorithms, data structures, succinct ternary tree, per- formance evaluation

Procedia PDF Downloads 64
4754 Real-Time Classification of Marbles with Decision-Tree Method

Authors: K. S. Parlak, E. Turan

Abstract:

The separation of marbles according to the pattern quality is a process made according to expert decision. The classification phase is the most critical part in terms of economic value. In this study, a self-learning system is proposed which performs the classification of marbles quickly and with high success. This system performs ten feature extraction by taking ten marble images from the camera. The marbles are classified by decision tree method using the obtained properties. The user forms the training set by training the system at the marble classification stage. The system evolves itself in every marble image that is classified. The aim of the proposed system is to minimize the error caused by the person performing the classification and achieve it quickly.

Keywords: decision tree, feature extraction, k-means clustering, marble classification

Procedia PDF Downloads 287
4753 Analysis on Thermococcus achaeans with Frequent Pattern Mining

Authors: Jeongyeob Hong, Myeonghoon Park, Taeson Yoon

Abstract:

After the advent of Achaeans which utilize different metabolism pathway and contain conspicuously different cellular structure, they have been recognized as possible materials for developing quality of human beings. Among diverse Achaeans, in this paper, we compared 16s RNA Sequences of four different species of Thermococcus: Achaeans genus specialized in sulfur-dealing metabolism. Four Species, Barophilus, Kodakarensis, Hydrothermalis, and Onnurineus, live near the hydrothermal vent that emits extreme amount of sulfur and heat. By comparing ribosomal sequences of aforementioned four species, we found similarities in their sequences and expressed protein, enabling us to expect that certain ribosomal sequence or proteins are vital for their survival. Apriori algorithms and Decision Tree were used. for comparison.

Keywords: Achaeans, Thermococcus, apriori algorithm, decision tree

Procedia PDF Downloads 220
4752 Survey on Big Data Stream Classification by Decision Tree

Authors: Mansoureh Ghiasabadi Farahani, Samira Kalantary, Sara Taghi-Pour, Mahboubeh Shamsi

Abstract:

Nowadays, the development of computers technology and its recent applications provide access to new types of data, which have not been considered by the traditional data analysts. Two particularly interesting characteristics of such data sets include their huge size and streaming nature .Incremental learning techniques have been used extensively to address the data stream classification problem. This paper presents a concise survey on the obstacles and the requirements issues classifying data streams with using decision tree. The most important issue is to maintain a balance between accuracy and efficiency, the algorithm should provide good classification performance with a reasonable time response.

Keywords: big data, data streams, classification, decision tree

Procedia PDF Downloads 411
4751 Scalable Learning of Tree-Based Models on Sparsely Representable Data

Authors: Fares Hedayatit, Arnauld Joly, Panagiotis Papadimitriou

Abstract:

Many machine learning tasks such as text annotation usually require training over very big datasets, e.g., millions of web documents, that can be represented in a sparse input space. State-of the-art tree-based ensemble algorithms cannot scale to such datasets, since they include operations whose running time is a function of the input space size rather than a function of the non-zero input elements. In this paper, we propose an efficient splitting algorithm to leverage input sparsity within decision tree methods. Our algorithm improves training time over sparse datasets by more than two orders of magnitude and it has been incorporated in the current version of scikit-learn.org, the most popular open source Python machine learning library.

Keywords: big data, sparsely representable data, tree-based models, scalable learning

Procedia PDF Downloads 182
4750 Sexual Crime Prediction in an African Context

Authors: Olayemi Success Falope, Surendra Thakur

Abstract:

The significant rise of sexual crime around the world and the inability to control this violation has resulted in serious attacks, murders, and injuries to sexual crime victims in many situations. Crime in general, of which sexual crime is a major contributor, has brought about a downgrade in South Africa’s economy. The growing need to mitigate sexual crime across the country prompted this study. Data mining techniques were applied to a sexual crime dataset extracted from the South African crime statistics database on the Kaggle website to visualize sexual crime trends and build a model to predict future sexual crime occurrences, thereby assisting the government and law enforcement agencies with gaining insight into the most common sexual crime hotspots across all nine provinces of South Africa. The model could enable law officials to take more timeous action to curb sexual crimes in the country. This paper focuses on the identification of suitable data analytics algorithms available for sexual crime prediction and then determining the most suitable algorithm for the study. The linear regression and decision tree classifier algorithms were applied to the extracted sexual crime dataset to predict the features responsible for causing sexual crime in South Africa. Of these, linear regression was the most effective algorithm. The researcher found that a linear relationship exists between the dependent variable (sexual crime) and the independent variables (population and density) of this study. Accuracy, precision, recall, and f1 score were used to measure the performance of the decision tree algorithm, while linear regression was measured using the coefficient of determination measured by the R-squared score, which is a key output of regression analysis. The 91% accuracy achieved is an indication of how effective the model will predict sexual crime occurrences.

Keywords: algorithms, data analytics, data mining, decision tree classifier, linear regression, sexual crime prediction, south africa

Procedia PDF Downloads 103
4749 Decision Tree Modeling in Emergency Logistics Planning

Authors: Yousef Abu Nahleh, Arun Kumar, Fugen Daver, Reham Al-Hindawi

Abstract:

Despite the availability of natural disaster related time series data for last 110 years, there is no forecasting tool available to humanitarian relief organizations to determine forecasts for emergency logistics planning. This study develops a forecasting tool based on identifying probability of disaster for each country in the world by using decision tree modeling. Further, the determination of aggregate forecasts leads to efficient pre-disaster planning. Based on the research findings, the relief agencies can optimize the various resources allocation in emergency logistics planning.

Keywords: decision tree modeling, forecasting, humanitarian relief, emergency supply chain

Procedia PDF Downloads 374
4748 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection

Authors: Yaojun Wang, Yaoqing Wang

Abstract:

Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection system. A profitable stock-selection system should consider the stock’s investment value and the market timing. In this paper, we present a hybrid system including both engage for stock selection. This system uses a case-based reasoning (CBR) model to execute the stock classification, uses a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques regarding to the classification accuracy, the average return and the Sharpe ratio.

Keywords: case-based reasoning, decision tree, stock selection, machine learning

Procedia PDF Downloads 293