Search results for: decision tree classifier
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4758

Search results for: decision tree classifier

4698 Optimization Model for Support Decision for Maximizing Production of Mixed Fruit Tree Farms

Authors: Andrés I. Ávila, Patricia Aros, César San Martín, Elizabeth Kehr, Yovana Leal

Abstract:

We consider a linear programming model to help farmers to decide if it is convinient to choose among three kinds of export fruits for their future investment. We consider area, investment, water, productivitiy minimal unit, and harvest restrictions and a monthly based model to compute the average income in five years. Also, conditions on the field as area, water availability and initia investment are required. Using the Chilean costs and dollar-peso exchange rate, we can simulate several scenarios to understand the possible risks associated to this market.

Keywords: mixed integer problem, fruit production, support decision model, fruit tree farms

Procedia PDF Downloads 428
4697 Discerning Divergent Nodes in Social Networks

Authors: Mehran Asadi, Afrand Agah

Abstract:

In data mining, partitioning is used as a fundamental tool for classification. With the help of partitioning, we study the structure of data, which allows us to envision decision rules, which can be applied to classification trees. In this research, we used online social network dataset and all of its attributes (e.g., Node features, labels, etc.) to determine what constitutes an above average chance of being a divergent node. We used the R statistical computing language to conduct the analyses in this report. The data were found on the UC Irvine Machine Learning Repository. This research introduces the basic concepts of classification in online social networks. In this work, we utilize overfitting and describe different approaches for evaluation and performance comparison of different classification methods. In classification, the main objective is to categorize different items and assign them into different groups based on their properties and similarities. In data mining, recursive partitioning is being utilized to probe the structure of a data set, which allow us to envision decision rules and apply them to classify data into several groups. Estimating densities is hard, especially in high dimensions, with limited data. Of course, we do not know the densities, but we could estimate them using classical techniques. First, we calculated the correlation matrix of the dataset to see if any predictors are highly correlated with one another. By calculating the correlation coefficients for the predictor variables, we see that density is strongly correlated with transitivity. We initialized a data frame to easily compare the quality of the result classification methods and utilized decision trees (with k-fold cross validation to prune the tree). The method performed on this dataset is decision trees. Decision tree is a non-parametric classification method, which uses a set of rules to predict that each observation belongs to the most commonly occurring class label of the training data. Our method aggregates many decision trees to create an optimized model that is not susceptible to overfitting. When using a decision tree, however, it is important to use cross-validation to prune the tree in order to narrow it down to the most important variables.

Keywords: online social networks, data mining, social cloud computing, interaction and collaboration

Procedia PDF Downloads 116
4696 Hyperspectral Image Classification Using Tree Search Algorithm

Authors: Shreya Pare, Parvin Akhter

Abstract:

Remotely sensing image classification becomes a very challenging task owing to the high dimensionality of hyperspectral images. The pixel-wise classification methods fail to take the spatial structure information of an image. Therefore, to improve the performance of classification, spatial information can be integrated into the classification process. In this paper, the multilevel thresholding algorithm based on a modified fuzzy entropy function is used to perform the segmentation of hyperspectral images. The fuzzy parameters of the MFE function have been optimized by using a new meta-heuristic algorithm based on the Tree-Search algorithm. The segmented image is classified by a large distribution machine (LDM) classifier. Experimental results are shown on a hyperspectral image dataset. The experimental outputs indicate that the proposed technique (MFE-TSA-LDM) achieves much higher classification accuracy for hyperspectral images when compared to state-of-art classification techniques. The proposed algorithm provides accurate segmentation and classification maps, thus becoming more suitable for image classification with large spatial structures.

Keywords: classification, hyperspectral images, large distribution margin, modified fuzzy entropy function, multilevel thresholding, tree search algorithm, hyperspectral image classification using tree search algorithm

Procedia PDF Downloads 137
4695 Analysis on Thermococcus achaeans with Frequent Pattern Mining

Authors: Jeongyeob Hong, Myeonghoon Park, Taeson Yoon

Abstract:

After the advent of Achaeans which utilize different metabolism pathway and contain conspicuously different cellular structure, they have been recognized as possible materials for developing quality of human beings. Among diverse Achaeans, in this paper, we compared 16s RNA Sequences of four different species of Thermococcus: Achaeans genus specialized in sulfur-dealing metabolism. Four Species, Barophilus, Kodakarensis, Hydrothermalis, and Onnurineus, live near the hydrothermal vent that emits extreme amount of sulfur and heat. By comparing ribosomal sequences of aforementioned four species, we found similarities in their sequences and expressed protein, enabling us to expect that certain ribosomal sequence or proteins are vital for their survival. Apriori algorithms and Decision Tree were used. for comparison.

Keywords: Achaeans, Thermococcus, apriori algorithm, decision tree

Procedia PDF Downloads 265
4694 Scalable Learning of Tree-Based Models on Sparsely Representable Data

Authors: Fares Hedayatit, Arnauld Joly, Panagiotis Papadimitriou

Abstract:

Many machine learning tasks such as text annotation usually require training over very big datasets, e.g., millions of web documents, that can be represented in a sparse input space. State-of the-art tree-based ensemble algorithms cannot scale to such datasets, since they include operations whose running time is a function of the input space size rather than a function of the non-zero input elements. In this paper, we propose an efficient splitting algorithm to leverage input sparsity within decision tree methods. Our algorithm improves training time over sparse datasets by more than two orders of magnitude and it has been incorporated in the current version of scikit-learn.org, the most popular open source Python machine learning library.

Keywords: big data, sparsely representable data, tree-based models, scalable learning

Procedia PDF Downloads 236
4693 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining

Authors: Hina Kausher, Sangita Srivastava

Abstract:

In India, there is a lack of standard, systematic sizing approach for producing readymade garments. Garments manufacturing companies use their own created size tables by modifying international sizing charts of ready-made garments. The purpose of this study is to tabulate the anthropometric data which covers the variety of figure proportions in both height and girth. 3,000 data has been collected by an anthropometric survey undertaken over females between the ages of 16 to 80 years from some states of India to produce the sizing system suitable for clothing manufacture and retailing. This data is used for the statistical analysis of body measurements, the formulation of sizing systems and body measurements tables. Factor analysis technique is used to filter the control body dimensions from a large number of variables. Decision tree-based data mining is used to cluster the data. The standard and structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.

Keywords: anthropometric data, data mining, decision tree, garments manufacturing, sizing systems, ready-made garments

Procedia PDF Downloads 110
4692 The Use of Boosted Multivariate Trees in Medical Decision-Making for Repeated Measurements

Authors: Ebru Turgal, Beyza Doganay Erdogan

Abstract:

Machine learning aims to model the relationship between the response and features. Medical decision-making researchers would like to make decisions about patients’ course and treatment, by examining the repeated measurements over time. Boosting approach is now being used in machine learning area for these aims as an influential tool. The aim of this study is to show the usage of multivariate tree boosting in this field. The main reason for utilizing this approach in the field of decision-making is the ease solutions of complex relationships. To show how multivariate tree boosting method can be used to identify important features and feature-time interaction, we used the data, which was collected retrospectively from Ankara University Chest Diseases Department records. Dataset includes repeated PF ratio measurements. The follow-up time is planned for 120 hours. A set of different models is tested. In conclusion, main idea of classification with weighed combination of classifiers is a reliable method which was shown with simulations several times. Furthermore, time varying variables will be taken into consideration within this concept and it could be possible to make accurate decisions about regression and survival problems.

Keywords: boosted multivariate trees, longitudinal data, multivariate regression tree, panel data

Procedia PDF Downloads 173
4691 Measuring Multi-Class Linear Classifier for Image Classification

Authors: Fatma Susilawati Mohamad, Azizah Abdul Manaf, Fadhillah Ahmad, Zarina Mohamad, Wan Suryani Wan Awang

Abstract:

A simple and robust multi-class linear classifier is proposed and implemented. For a pair of classes of the linear boundary, a collection of segments of hyper planes created as perpendicular bisectors of line segments linking centroids of the classes or part of classes. Nearest Neighbor and Linear Discriminant Analysis are compared in the experiments to see the performances of each classifier in discriminating ripeness of oil palm. This paper proposes a multi-class linear classifier using Linear Discriminant Analysis (LDA) for image identification. Result proves that LDA is well capable in separating multi-class features for ripeness identification.

Keywords: multi-class, linear classifier, nearest neighbor, linear discriminant analysis

Procedia PDF Downloads 501
4690 A Comparative Study on Automatic Feature Classification Methods of Remote Sensing Images

Authors: Lee Jeong Min, Lee Mi Hee, Eo Yang Dam

Abstract:

Geospatial feature extraction is a very important issue in the remote sensing research. In the meantime, the image classification based on statistical techniques, but, in recent years, data mining and machine learning techniques for automated image processing technology is being applied to remote sensing it has focused on improved results generated possibility. In this study, artificial neural network and decision tree technique is applied to classify the high-resolution satellite images, as compared to the MLC processing result is a statistical technique and an analysis of the pros and cons between each of the techniques.

Keywords: remote sensing, artificial neural network, decision tree, maximum likelihood classification

Procedia PDF Downloads 320
4689 Diabetes Diagnosis Model Using Rough Set and K- Nearest Neighbor Classifier

Authors: Usiobaifo Agharese Rosemary, Osaseri Roseline Oghogho

Abstract:

Diabetes is a complex group of disease with a variety of causes; it is a disorder of the body metabolism in the digestion of carbohydrates food. The application of machine learning in the field of medical diagnosis has been the focus of many researchers and the use of recognition and classification model as a decision support tools has help the medical expert in diagnosis of diseases. Considering the large volume of medical data which require special techniques, experience, and high diagnostic skill in the diagnosis of diseases, the application of an artificial intelligent system to assist medical personnel in order to enhance their efficiency and accuracy in diagnosis will be an invaluable tool. In this study will propose a diabetes diagnosis model using rough set and K-nearest Neighbor classifier algorithm. The system consists of two modules: the feature extraction module and predictor module, rough data set is used to preprocess the attributes while K-nearest neighbor classifier is used to classify the given data. The dataset used for this model was taken for University of Benin Teaching Hospital (UBTH) database. Half of the data was used in the training while the other half was used in testing the system. The proposed model was able to achieve over 80% accuracy.

Keywords: classifier algorithm, diabetes, diagnostic model, machine learning

Procedia PDF Downloads 304
4688 A Kruskal Based Heuxistic for the Application of Spanning Tree

Authors: Anjan Naidu

Abstract:

In this paper we first discuss the minimum spanning tree, then we use the Kruskal algorithm to obtain minimum spanning tree. Based on Kruskal algorithm we propose Kruskal algorithm to apply an application to find minimum cost applying the concept of spanning tree.

Keywords: Minimum Spanning tree, algorithm, Heuxistic, application, classification of Sub 97K90

Procedia PDF Downloads 417
4687 Comparison of Deep Learning and Machine Learning Algorithms to Diagnose and Predict Breast Cancer

Authors: F. Ghazalnaz Sharifonnasabi, Iman Makhdoom

Abstract:

Breast cancer is a serious health concern that affects many people around the world. According to a study published in the Breast journal, the global burden of breast cancer is expected to increase significantly over the next few decades. The number of deaths from breast cancer has been increasing over the years, but the age-standardized mortality rate has decreased in some countries. It’s important to be aware of the risk factors for breast cancer and to get regular check- ups to catch it early if it does occur. Machin learning techniques have been used to aid in the early detection and diagnosis of breast cancer. These techniques, that have been shown to be effective in predicting and diagnosing the disease, have become a research hotspot. In this study, we consider two deep learning approaches including: Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN). We also considered the five-machine learning algorithm titled: Decision Tree (C4.5), Naïve Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) Algorithm and XGBoost (eXtreme Gradient Boosting) on the Breast Cancer Wisconsin Diagnostic dataset. We have carried out the process of evaluating and comparing classifiers involving selecting appropriate metrics to evaluate classifier performance and selecting an appropriate tool to quantify this performance. The main purpose of the study is predicting and diagnosis breast cancer, applying the mentioned algorithms and also discovering of the most effective with respect to confusion matrix, accuracy and precision. It is realized that CNN outperformed all other classifiers and achieved the highest accuracy (0.982456). The work is implemented in the Anaconda environment based on Python programing language.

Keywords: breast cancer, multi-layer perceptron, Naïve Bayesian, SVM, decision tree, convolutional neural network, XGBoost, KNN

Procedia PDF Downloads 43
4686 The Use of AI to Measure Gross National Happiness

Authors: Riona Dighe

Abstract:

This research attempts to identify an alternative approach to the measurement of Gross National Happiness (GNH). It uses artificial intelligence (AI), incorporating natural language processing (NLP) and sentiment analysis to measure GNH. We use ‘off the shelf’ NLP models responsible for the sentiment analysis of a sentence as a building block for this research. We constructed an algorithm using NLP models to derive a sentiment analysis score against sentences. This was then tested against a sample of 20 respondents to derive a sentiment analysis score. The scores generated resembled human responses. By utilising the MLP classifier, decision tree, linear model, and K-nearest neighbors, we were able to obtain a test accuracy of 89.97%, 54.63%, 52.13%, and 47.9%, respectively. This gave us the confidence to use the NLP models against sentences in websites to measure the GNH of a country.

Keywords: artificial intelligence, NLP, sentiment analysis, gross national happiness

Procedia PDF Downloads 70
4685 Enhanced Extra Trees Classifier for Epileptic Seizure Prediction

Authors: Maurice Ntahobari, Levin Kuhlmann, Mario Boley, Zhinoos Razavi Hesabi

Abstract:

For machine learning based epileptic seizure prediction, it is important for the model to be implemented in small implantable or wearable devices that can be used to monitor epilepsy patients; however, current state-of-the-art methods are complex and computationally intensive. We use Shapley Additive Explanation (SHAP) to find relevant intracranial electroencephalogram (iEEG) features and improve the computational efficiency of a state-of-the-art seizure prediction method based on the extra trees classifier while maintaining prediction performance. Results for a small contest dataset and a much larger dataset with continuous recordings of up to 3 years per patient from 15 patients yield better than chance prediction performance (p < 0.004). Moreover, while the performance of the SHAP-based model is comparable to that of the benchmark, the overall training and prediction time of the model has been reduced by a factor of 1.83. It can also be noted that the feature called zero crossing value is the best EEG feature for seizure prediction. These results suggest state-of-the-art seizure prediction performance can be achieved using efficient methods based on optimal feature selection.

Keywords: machine learning, seizure prediction, extra tree classifier, SHAP, epilepsy

Procedia PDF Downloads 79
4684 Using Time Series NDVI to Model Land Cover Change: A Case Study in the Berg River Catchment Area, Western Cape, South Africa

Authors: Adesuyi Ayodeji Steve, Zahn Munch

Abstract:

This study investigates the use of MODIS NDVI to identify agricultural land cover change areas on an annual time step (2007 - 2012) and characterize the trend in the study area. An ISODATA classification was performed on the MODIS imagery to select only the agricultural class producing 3 class groups namely: agriculture, agriculture/semi-natural, and semi-natural. NDVI signatures were created for the time series to identify areas dominated by cereals and vineyards with the aid of ancillary, pictometry and field sample data. The NDVI signature curve and training samples aided in creating a decision tree model in WEKA 3.6.9. From the training samples two classification models were built in WEKA using decision tree classifier (J48) algorithm; Model 1 included ISODATA classification and Model 2 without, both having accuracies of 90.7% and 88.3% respectively. The two models were used to classify the whole study area, thus producing two land cover maps with Model 1 and 2 having classification accuracies of 77% and 80% respectively. Model 2 was used to create change detection maps for all the other years. Subtle changes and areas of consistency (unchanged) were observed in the agricultural classes and crop practices over the years as predicted by the land cover classification. 41% of the catchment comprises of cereals with 35% possibly following a crop rotation system. Vineyard largely remained constant over the years, with some conversion to vineyard (1%) from other land cover classes. Some of the changes might be as a result of misclassification and crop rotation system.

Keywords: change detection, land cover, modis, NDVI

Procedia PDF Downloads 372
4683 Nearest Neighbor Investigate Using R+ Tree

Authors: Rutuja Desai

Abstract:

Search engine is fundamentally a framework used to search the data which is pertinent to the client via WWW. Looking close-by spot identified with the keywords is an imperative concept in developing web advances. For such kind of searching, extent pursuit or closest neighbor is utilized. In range search the forecast is made whether the objects meet to query object. Nearest neighbor is the forecast of the focuses close to the query set by the client. Here, the nearest neighbor methodology is utilized where Data recovery R+ tree is utilized rather than IR2 tree. The disadvantages of IR2 tree is: The false hit number can surpass the limit and the mark in Information Retrieval R-tree must have Voice over IP bit for each one of a kind word in W set is recouped by Data recovery R+ tree. The inquiry is fundamentally subordinate upon the key words and the geometric directions.

Keywords: information retrieval, nearest neighbor search, keyword search, R+ tree

Procedia PDF Downloads 261
4682 Classification of Potential Biomarkers in Breast Cancer Using Artificial Intelligence Algorithms and Anthropometric Datasets

Authors: Aref Aasi, Sahar Ebrahimi Bajgani, Erfan Aasi

Abstract:

Breast cancer (BC) continues to be the most frequent cancer in females and causes the highest number of cancer-related deaths in women worldwide. Inspired by recent advances in studying the relationship between different patient attributes and features and the disease, in this paper, we have tried to investigate the different classification methods for better diagnosis of BC in the early stages. In this regard, datasets from the University Hospital Centre of Coimbra were chosen, and different machine learning (ML)-based and neural network (NN) classifiers have been studied. For this purpose, we have selected favorable features among the nine provided attributes from the clinical dataset by using a random forest algorithm. This dataset consists of both healthy controls and BC patients, and it was noted that glucose, BMI, resistin, and age have the most importance, respectively. Moreover, we have analyzed these features with various ML-based classifier methods, including Decision Tree (DT), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) along with NN-based Multi-Layer Perceptron (MLP) classifier. The results revealed that among different techniques, the SVM and MLP classifiers have the most accuracy, with amounts of 96% and 92%, respectively. These results divulged that the adopted procedure could be used effectively for the classification of cancer cells, and also it encourages further experimental investigations with more collected data for other types of cancers.

Keywords: breast cancer, diagnosis, machine learning, biomarker classification, neural network

Procedia PDF Downloads 97
4681 Expert Supporting System for Diagnosing Lymphoid Neoplasms Using Probabilistic Decision Tree Algorithm and Immunohistochemistry Profile Database

Authors: Yosep Chong, Yejin Kim, Jingyun Choi, Hwanjo Yu, Eun Jung Lee, Chang Suk Kang

Abstract:

For the past decades, immunohistochemistry (IHC) has been playing an important role in the diagnosis of human neoplasms, by helping pathologists to make a clearer decision on differential diagnosis, subtyping, personalized treatment plan, and finally prognosis prediction. However, the IHC performed in various tumors of daily practice often shows conflicting and very challenging results to interpret. Even comprehensive diagnosis synthesizing clinical, histologic and immunohistochemical findings can be helpless in some twisted cases. Another important issue is that the IHC data is increasing exponentially and more and more information have to be taken into account. For this reason, we reached an idea to develop an expert supporting system to help pathologists to make a better decision in diagnosing human neoplasms with IHC results. We gave probabilistic decision tree algorithm and tested the algorithm with real case data of lymphoid neoplasms, in which the IHC profile is more important to make a proper diagnosis than other human neoplasms. We designed probabilistic decision tree based on Bayesian theorem, program computational process using MATLAB (The MathWorks, Inc., USA) and prepared IHC profile database (about 104 disease category and 88 IHC antibodies) based on WHO classification by reviewing the literature. The initial probability of each neoplasm was set with the epidemiologic data of lymphoid neoplasm in Korea. With the IHC results of 131 patients sequentially selected, top three presumptive diagnoses for each case were made and compared with the original diagnoses. After the review of the data, 124 out of 131 were used for final analysis. As a result, the presumptive diagnoses were concordant with the original diagnoses in 118 cases (93.7%). The major reason of discordant cases was that the similarity of the IHC profile between two or three different neoplasms. The expert supporting system algorithm presented in this study is in its elementary stage and need more optimization using more advanced technology such as deep-learning with data of real cases, especially in differentiating T-cell lymphomas. Although it needs more refinement, it may be used to aid pathological decision making in future. A further application to determine IHC antibodies for a certain subset of differential diagnoses might be possible in near future.

Keywords: database, expert supporting system, immunohistochemistry, probabilistic decision tree

Procedia PDF Downloads 204
4680 Segmentation of Liver Using Random Forest Classifier

Authors: Gajendra Kumar Mourya, Dinesh Bhatia, Akash Handique, Sunita Warjri, Syed Achaab Amir

Abstract:

Nowadays, Medical imaging has become an integral part of modern healthcare. Abdominal CT images are an invaluable mean for abdominal organ investigation and have been widely studied in the recent years. Diagnosis of liver pathologies is one of the major areas of current interests in the field of medical image processing and is still an open problem. To deeply study and diagnose the liver, segmentation of liver is done to identify which part of the liver is mostly affected. Manual segmentation of the liver in CT images is time-consuming and suffers from inter- and intra-observer differences. However, automatic or semi-automatic computer aided segmentation of the Liver is a challenging task due to inter-patient Liver shape and size variability. In this paper, we present a technique for automatic segmenting the liver from CT images using Random Forest Classifier. Random forests or random decision forests are an ensemble learning method for classification that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes of the individual trees. After comparing with various other techniques, it was found that Random Forest Classifier provide a better segmentation results with respect to accuracy and speed. We have done the validation of our results using various techniques and it shows above 89% accuracy in all the cases.

Keywords: CT images, image validation, random forest, segmentation

Procedia PDF Downloads 281
4679 Performance Analysis with the Combination of Visualization and Classification Technique for Medical Chatbot

Authors: Shajida M., Sakthiyadharshini N. P., Kamalesh S., Aswitha B.

Abstract:

Natural Language Processing (NLP) continues to play a strategic part in complaint discovery and medicine discovery during the current epidemic. This abstract provides an overview of performance analysis with a combination of visualization and classification techniques of NLP for a medical chatbot. Sentiment analysis is an important aspect of NLP that is used to determine the emotional tone behind a piece of text. This technique has been applied to various domains, including medical chatbots. In this, we have compared the combination of the decision tree with heatmap and Naïve Bayes with Word Cloud. The performance of the chatbot was evaluated using accuracy, and the results indicate that the combination of visualization and classification techniques significantly improves the chatbot's performance.

Keywords: sentimental analysis, NLP, medical chatbot, decision tree, heatmap, naïve bayes, word cloud

Procedia PDF Downloads 47
4678 Spatial Relationship of Drug Smuggling Based on Geographic Information System Knowledge Discovery Using Decision Tree Algorithm

Authors: S. Niamkaeo, O. Robert, O. Chaowalit

Abstract:

In this investigation, we focus on discovering spatial relationship of drug smuggling along the northern border of Thailand. Thailand is no longer a drug production site, but Thailand is still one of the major drug trafficking hubs due to its topographic characteristics facilitating drug smuggling from neighboring countries. Our study areas cover three districts (Mae-jan, Mae-fahluang, and Mae-sai) in Chiangrai city and four districts (Chiangdao, Mae-eye, Chaiprakarn, and Wienghang) in Chiangmai city where drug smuggling of methamphetamine crystal and amphetamine occurs mostly. The data on drug smuggling incidents from 2011 to 2017 was collected from several national and local published news. Geo-spatial drug smuggling database was prepared. Decision tree algorithm was applied in order to discover the spatial relationship of factors related to drug smuggling, which was converted into rules using rule-based system. The factors including land use type, smuggling route, season and distance within 500 meters from check points were found that they were related to drug smuggling in terms of rules-based relationship. It was illustrated that drug smuggling was occurred mostly in forest area in winter. Drug smuggling exhibited was discovered mainly along topographic road where check points were not reachable. This spatial relationship of drug smuggling could support the Thai Office of Narcotics Control Board in surveillance drug smuggling.

Keywords: decision tree, drug smuggling, Geographic Information System, GIS knowledge discovery, rule-based system

Procedia PDF Downloads 143
4677 Efficient Credit Card Fraud Detection Based on Multiple ML Algorithms

Authors: Neha Ahirwar

Abstract:

In the contemporary digital era, the rise of credit card fraud poses a significant threat to both financial institutions and consumers. As fraudulent activities become more sophisticated, there is an escalating demand for robust and effective fraud detection mechanisms. Advanced machine learning algorithms have become crucial tools in addressing this challenge. This paper conducts a thorough examination of the design and evaluation of a credit card fraud detection system, utilizing four prominent machine learning algorithms: random forest, logistic regression, decision tree, and XGBoost. The surge in digital transactions has opened avenues for fraudsters to exploit vulnerabilities within payment systems. Consequently, there is an urgent need for proactive and adaptable fraud detection systems. This study addresses this imperative by exploring the efficacy of machine learning algorithms in identifying fraudulent credit card transactions. The selection of random forest, logistic regression, decision tree, and XGBoost for scrutiny in this study is based on their documented effectiveness in diverse domains, particularly in credit card fraud detection. These algorithms are renowned for their capability to model intricate patterns and provide accurate predictions. Each algorithm is implemented and evaluated for its performance in a controlled environment, utilizing a diverse dataset comprising both genuine and fraudulent credit card transactions.

Keywords: efficient credit card fraud detection, random forest, logistic regression, XGBoost, decision tree

Procedia PDF Downloads 25
4676 A Dynamic Solution Approach for Heart Disease Prediction

Authors: Walid Moudani

Abstract:

The healthcare environment is generally perceived as being information rich yet knowledge poor. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered from application of data mining techniques in healthcare system. In this study, a proficient methodology for the extraction of significant patterns from the coronary heart disease warehouses for heart attack prediction, which unfortunately continues to be a leading cause of mortality in the whole world, has been presented. For this purpose, we propose to enumerate dynamically the optimal subsets of the reduced features of high interest by using rough sets technique associated to dynamic programming. Therefore, we propose to validate the classification using Random Forest (RF) decision tree to identify the risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profile of patient. Moreover, the experts’ knowledge in this field has been taken into consideration in order to define the disease, its risk factors, and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated based on set of benchmark techniques applied in this classification problem.

Keywords: multi-classifier decisions tree, features reduction, dynamic programming, rough sets

Procedia PDF Downloads 381
4675 Monitoring Three-Dimensional Models of Tree and Forest by Using Digital Close-Range Photogrammetry

Authors: S. Y. Cicekli

Abstract:

In this study, tree-dimensional model of tree was created by using terrestrial close range photogrammetry. For this close range photos were taken. Photomodeler Pro 5 software was used for camera calibration and create three-dimensional model of trees. In first test, three-dimensional model of a tree was created, in the second test three-dimensional model of three trees were created. This study aim is creating three-dimensional model of trees and indicate the use of close-range photogrammetry in forestry. At the end of the study, three-dimensional model of tree and three trees were created. This study showed that usability of close-range photogrammetry for monitoring tree and forests three-dimensional model.

Keywords: close- range photogrammetry, forest, tree, three-dimensional model

Procedia PDF Downloads 363
4674 The Effect of Feature Selection on Pattern Classification

Authors: Chih-Fong Tsai, Ya-Han Hu

Abstract:

The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.

Keywords: data mining, feature selection, pattern classification, dimensionality reduction

Procedia PDF Downloads 635
4673 Machine Learning for Aiding Meningitis Diagnosis in Pediatric Patients

Authors: Karina Zaccari, Ernesto Cordeiro Marujo

Abstract:

This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of a blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms, including: Adaptative Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy for training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in preventing expensive and painful procedures on some children.

Keywords: machine learning, medical diagnosis, meningitis detection, pediatric research

Procedia PDF Downloads 122
4672 Determining of the Performance of Data Mining Algorithm Determining the Influential Factors and Prediction of Ischemic Stroke: A Comparative Study in the Southeast of Iran

Authors: Y. Mehdipour, S. Ebrahimi, A. Jahanpour, F. Seyedzaei, B. Sabayan, A. Karimi, H. Amirifard

Abstract:

Ischemic stroke is one of the common reasons for disability and mortality. The fourth leading cause of death in the world and the third in some other sources. Only 1/3 of the patients with ischemic stroke fully recover, 1/3 of them end in permanent disability and 1/3 face death. Thus, the use of predictive models to predict stroke has a vital role in reducing the complications and costs related to this disease. Thus, the aim of this study was to specify the effective factors and predict ischemic stroke with the help of DM methods. The present study was a descriptive-analytic study. The population was 213 cases from among patients referring to Ali ibn Abi Talib (AS) Hospital in Zahedan. Data collection tool was a checklist with the validity and reliability confirmed. This study used DM algorithms of decision tree for modeling. Data analysis was performed using SPSS-19 and SPSS Modeler 14.2. The results of the comparison of algorithms showed that CHAID algorithm with 95.7% accuracy has the best performance. Moreover, based on the model created, factors such as anemia, diabetes mellitus, hyperlipidemia, transient ischemic attacks, coronary artery disease, and atherosclerosis are the most effective factors in stroke. Decision tree algorithms, especially CHAID algorithm, have acceptable precision and predictive ability to determine the factors affecting ischemic stroke. Thus, by creating predictive models through this algorithm, will play a significant role in decreasing the mortality and disability caused by ischemic stroke.

Keywords: data mining, ischemic stroke, decision tree, Bayesian network

Procedia PDF Downloads 147
4671 Heart Failure Identification and Progression by Classifying Cardiac Patients

Authors: Muhammad Saqlain, Nazar Abbas Saqib, Muazzam A. Khan

Abstract:

Heart Failure (HF) has become the major health problem in our society. The prevalence of HF has increased as the patient’s ages and it is the major cause of the high mortality rate in adults. A successful identification and progression of HF can be helpful to reduce the individual and social burden from this syndrome. In this study, we use a real data set of cardiac patients to propose a classification model for the identification and progression of HF. The data set has divided into three age groups, namely young, adult, and old and then each age group have further classified into four classes according to patient’s current physical condition. Contemporary Data Mining classification algorithms have been applied to each individual class of every age group to identify the HF. Decision Tree (DT) gives the highest accuracy of 90% and outperform all other algorithms. Our model accurately diagnoses different stages of HF for each age group and it can be very useful for the early prediction of HF.

Keywords: decision tree, heart failure, data mining, classification model

Procedia PDF Downloads 380
4670 Determinant Factor of Farm Household Fruit Tree Planting: The Case of Habru Woreda, North Wollo

Authors: Getamesay Kassaye Dimru

Abstract:

The cultivation of fruit tree in degraded areas has two-fold importance. Firstly, it improves food availability and income, and secondly, it promotes the conservation of soil and water improving, in turn, the productivity of the land. The main objectives of this study are to identify the determinant of farmer's fruit trees plantation decision and to major fruit production challenges and opportunities of the study area. The analysis was made using primary data collected from 60 sample household selected randomly from the study area in 2016. The primary data was supplemented by data collected from a key informant. In addition to the descriptive statistics and statistical tests (Chi-square test and t-test), a logit model was employed to identify the determinant of fruit tree plantation decision. Drought, pest incidence, land degradation, lack of input, lack of capital and irrigation schemes maintenance, lack of misuse of irrigation water and limited agricultural personnel are the major production constraints identified. The opportunities that need to further exploited are better access to irrigation, main road access, endowment of preferred guava variety, experience of farmers, and proximity of the study area to research center. The result of logit model shows that from different factors hypothesized to determine fruit tree plantation decision, age of the household head accesses to market and perception of farmers about fruits' disease and pest resistance are found to be significant. The result has revealed important implications for the promotion of fruit production for both land degradation control and rehabilitation and increasing the livelihood of farming households.

Keywords: degradation, fruit, irrigation, pest

Procedia PDF Downloads 181
4669 Predictive Analysis of the Stock Price Market Trends with Deep Learning

Authors: Suraj Mehrotra

Abstract:

The stock market is a volatile, bustling marketplace that is a cornerstone of economics. It defines whether companies are successful or in spiral. A thorough understanding of it is important - many companies have whole divisions dedicated to analysis of both their stock and of rivaling companies. Linking the world of finance and artificial intelligence (AI), especially the stock market, has been a relatively recent development. Predicting how stocks will do considering all external factors and previous data has always been a human task. With the help of AI, however, machine learning models can help us make more complete predictions in financial trends. Taking a look at the stock market specifically, predicting the open, closing, high, and low prices for the next day is very hard to do. Machine learning makes this task a lot easier. A model that builds upon itself that takes in external factors as weights can predict trends far into the future. When used effectively, new doors can be opened up in the business and finance world, and companies can make better and more complete decisions. This paper explores the various techniques used in the prediction of stock prices, from traditional statistical methods to deep learning and neural networks based approaches, among other methods. It provides a detailed analysis of the techniques and also explores the challenges in predictive analysis. For the accuracy of the testing set, taking a look at four different models - linear regression, neural network, decision tree, and naïve Bayes - on the different stocks, Apple, Google, Tesla, Amazon, United Healthcare, Exxon Mobil, J.P. Morgan & Chase, and Johnson & Johnson, the naïve Bayes model and linear regression models worked best. For the testing set, the naïve Bayes model had the highest accuracy along with the linear regression model, followed by the neural network model and then the decision tree model. The training set had similar results except for the fact that the decision tree model was perfect with complete accuracy in its predictions, which makes sense. This means that the decision tree model likely overfitted the training set when used for the testing set.

Keywords: machine learning, testing set, artificial intelligence, stock analysis

Procedia PDF Downloads 59