Search results for: decision tree classifiers
4761 Optimization Model for Support Decision for Maximizing Production of Mixed Fruit Tree Farms
Authors: Andrés I. Ávila, Patricia Aros, César San Martín, Elizabeth Kehr, Yovana Leal
Abstract:
We consider a linear programming model to help farmers to decide if it is convinient to choose among three kinds of export fruits for their future investment. We consider area, investment, water, productivitiy minimal unit, and harvest restrictions and a monthly based model to compute the average income in five years. Also, conditions on the field as area, water availability and initia investment are required. Using the Chilean costs and dollar-peso exchange rate, we can simulate several scenarios to understand the possible risks associated to this market.Keywords: mixed integer problem, fruit production, support decision model, fruit tree farms
Procedia PDF Downloads 4564760 Discerning Divergent Nodes in Social Networks
Authors: Mehran Asadi, Afrand Agah
Abstract:
In data mining, partitioning is used as a fundamental tool for classification. With the help of partitioning, we study the structure of data, which allows us to envision decision rules, which can be applied to classification trees. In this research, we used online social network dataset and all of its attributes (e.g., Node features, labels, etc.) to determine what constitutes an above average chance of being a divergent node. We used the R statistical computing language to conduct the analyses in this report. The data were found on the UC Irvine Machine Learning Repository. This research introduces the basic concepts of classification in online social networks. In this work, we utilize overfitting and describe different approaches for evaluation and performance comparison of different classification methods. In classification, the main objective is to categorize different items and assign them into different groups based on their properties and similarities. In data mining, recursive partitioning is being utilized to probe the structure of a data set, which allow us to envision decision rules and apply them to classify data into several groups. Estimating densities is hard, especially in high dimensions, with limited data. Of course, we do not know the densities, but we could estimate them using classical techniques. First, we calculated the correlation matrix of the dataset to see if any predictors are highly correlated with one another. By calculating the correlation coefficients for the predictor variables, we see that density is strongly correlated with transitivity. We initialized a data frame to easily compare the quality of the result classification methods and utilized decision trees (with k-fold cross validation to prune the tree). The method performed on this dataset is decision trees. Decision tree is a non-parametric classification method, which uses a set of rules to predict that each observation belongs to the most commonly occurring class label of the training data. Our method aggregates many decision trees to create an optimized model that is not susceptible to overfitting. When using a decision tree, however, it is important to use cross-validation to prune the tree in order to narrow it down to the most important variables.Keywords: online social networks, data mining, social cloud computing, interaction and collaboration
Procedia PDF Downloads 1574759 Scalable Learning of Tree-Based Models on Sparsely Representable Data
Authors: Fares Hedayatit, Arnauld Joly, Panagiotis Papadimitriou
Abstract:
Many machine learning tasks such as text annotation usually require training over very big datasets, e.g., millions of web documents, that can be represented in a sparse input space. State-of the-art tree-based ensemble algorithms cannot scale to such datasets, since they include operations whose running time is a function of the input space size rather than a function of the non-zero input elements. In this paper, we propose an efficient splitting algorithm to leverage input sparsity within decision tree methods. Our algorithm improves training time over sparse datasets by more than two orders of magnitude and it has been incorporated in the current version of scikit-learn.org, the most popular open source Python machine learning library.Keywords: big data, sparsely representable data, tree-based models, scalable learning
Procedia PDF Downloads 2634758 Analysis on Thermococcus achaeans with Frequent Pattern Mining
Authors: Jeongyeob Hong, Myeonghoon Park, Taeson Yoon
Abstract:
After the advent of Achaeans which utilize different metabolism pathway and contain conspicuously different cellular structure, they have been recognized as possible materials for developing quality of human beings. Among diverse Achaeans, in this paper, we compared 16s RNA Sequences of four different species of Thermococcus: Achaeans genus specialized in sulfur-dealing metabolism. Four Species, Barophilus, Kodakarensis, Hydrothermalis, and Onnurineus, live near the hydrothermal vent that emits extreme amount of sulfur and heat. By comparing ribosomal sequences of aforementioned four species, we found similarities in their sequences and expressed protein, enabling us to expect that certain ribosomal sequence or proteins are vital for their survival. Apriori algorithms and Decision Tree were used. for comparison.Keywords: Achaeans, Thermococcus, apriori algorithm, decision tree
Procedia PDF Downloads 2904757 Bilingualism: A Case Study of Assamese and Bodo Classifiers
Authors: Samhita Bharadwaj
Abstract:
This is an empirical study of classifiers in Assamese and Bodo, two genetically unrelated languages of India. The objective of the paper is to address the language contact between Assamese and Bodo as reflected in classifiers. The data has been collected through fieldwork in Bodo recording narratives and folk tales and eliciting specific data from the speakers. The data for Assamese is self-produced as native speaker of the language. Assamese is the easternmost New-Indo-Aryan (henceforth NIA) language mainly spoken in the Brahmaputra valley of Assam and some other north-eastern states of India. It is the lingua franca of Assam and is creolised in the neighbouring state of Nagaland. Bodo, on the other hand, is a Tibeto-Burman (henceforth TB) language of the Bodo-Garo group. It has the highest number of speakers among the TB languages of Assam. However, compared to Assamese, it is still a lesser documented language and due to the prestige of Assamese, all the Bodo speakers are fluent bi-lingual in Assamese, though the opposite isn’t the case. With this context, classifiers, a characteristic phenomenon of TB languages, but not so much of NIA languages, presents an interesting case study on language contact caused by bilingualism. Assamese, as a result of its language contact with the TB languages which are rich in classifiers; has developed the richest classifier system among the IA languages in India. Yet, as a part of rampant borrowing of Assamese words and patterns into Bodo; Bodo is seen to borrow even Assamese classifiers into its system. This paper analyses the borrowed classifiers of Bodo and finds the route of this borrowing phenomenon in the number system of the languages. As the Bodo speakers start replacing the higher numbers from five with Assamese ones, they also choose the Assamese classifiers to attach to these numbers. Thus, the partial loss of number in Bodo as a result of language contact and bilingualism in Assamese is found to be the reason behind the borrowing of classifiers in Bodo. The significance of the study lies in exploring an interesting aspect of language contact in Assam. It is hoped that this will attract further research on bilingualism and classifiers in Assam.Keywords: Assamese, bi-lingual, Bodo, borrowing, classifier, language contact
Procedia PDF Downloads 2224756 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining
Authors: Hina Kausher, Sangita Srivastava
Abstract:
In India, there is a lack of standard, systematic sizing approach for producing readymade garments. Garments manufacturing companies use their own created size tables by modifying international sizing charts of ready-made garments. The purpose of this study is to tabulate the anthropometric data which covers the variety of figure proportions in both height and girth. 3,000 data has been collected by an anthropometric survey undertaken over females between the ages of 16 to 80 years from some states of India to produce the sizing system suitable for clothing manufacture and retailing. This data is used for the statistical analysis of body measurements, the formulation of sizing systems and body measurements tables. Factor analysis technique is used to filter the control body dimensions from a large number of variables. Decision tree-based data mining is used to cluster the data. The standard and structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.Keywords: anthropometric data, data mining, decision tree, garments manufacturing, sizing systems, ready-made garments
Procedia PDF Downloads 1334755 A Kruskal Based Heuxistic for the Application of Spanning Tree
Authors: Anjan Naidu
Abstract:
In this paper we first discuss the minimum spanning tree, then we use the Kruskal algorithm to obtain minimum spanning tree. Based on Kruskal algorithm we propose Kruskal algorithm to apply an application to find minimum cost applying the concept of spanning tree.Keywords: Minimum Spanning tree, algorithm, Heuxistic, application, classification of Sub 97K90
Procedia PDF Downloads 4444754 A Comparative Study on Automatic Feature Classification Methods of Remote Sensing Images
Authors: Lee Jeong Min, Lee Mi Hee, Eo Yang Dam
Abstract:
Geospatial feature extraction is a very important issue in the remote sensing research. In the meantime, the image classification based on statistical techniques, but, in recent years, data mining and machine learning techniques for automated image processing technology is being applied to remote sensing it has focused on improved results generated possibility. In this study, artificial neural network and decision tree technique is applied to classify the high-resolution satellite images, as compared to the MLC processing result is a statistical technique and an analysis of the pros and cons between each of the techniques.Keywords: remote sensing, artificial neural network, decision tree, maximum likelihood classification
Procedia PDF Downloads 3474753 Nearest Neighbor Investigate Using R+ Tree
Authors: Rutuja Desai
Abstract:
Search engine is fundamentally a framework used to search the data which is pertinent to the client via WWW. Looking close-by spot identified with the keywords is an imperative concept in developing web advances. For such kind of searching, extent pursuit or closest neighbor is utilized. In range search the forecast is made whether the objects meet to query object. Nearest neighbor is the forecast of the focuses close to the query set by the client. Here, the nearest neighbor methodology is utilized where Data recovery R+ tree is utilized rather than IR2 tree. The disadvantages of IR2 tree is: The false hit number can surpass the limit and the mark in Information Retrieval R-tree must have Voice over IP bit for each one of a kind word in W set is recouped by Data recovery R+ tree. The inquiry is fundamentally subordinate upon the key words and the geometric directions.Keywords: information retrieval, nearest neighbor search, keyword search, R+ tree
Procedia PDF Downloads 2894752 Use of Fractal Geometry in Machine Learning
Authors: Fuad M. Alkoot
Abstract:
The main component of a machine learning system is the classifier. Classifiers are mathematical models that can perform classification tasks for a specific application area. Additionally, many classifiers are combined using any of the available methods to reduce the classifier error rate. The benefits gained from the combination of multiple classifier designs has motivated the development of diverse approaches to multiple classifiers. We aim to investigate using fractal geometry to develop an improved classifier combiner. Initially we experiment with measuring the fractal dimension of data and use the results in the development of a combiner strategy.Keywords: fractal geometry, machine learning, classifier, fractal dimension
Procedia PDF Downloads 2164751 Expert Supporting System for Diagnosing Lymphoid Neoplasms Using Probabilistic Decision Tree Algorithm and Immunohistochemistry Profile Database
Authors: Yosep Chong, Yejin Kim, Jingyun Choi, Hwanjo Yu, Eun Jung Lee, Chang Suk Kang
Abstract:
For the past decades, immunohistochemistry (IHC) has been playing an important role in the diagnosis of human neoplasms, by helping pathologists to make a clearer decision on differential diagnosis, subtyping, personalized treatment plan, and finally prognosis prediction. However, the IHC performed in various tumors of daily practice often shows conflicting and very challenging results to interpret. Even comprehensive diagnosis synthesizing clinical, histologic and immunohistochemical findings can be helpless in some twisted cases. Another important issue is that the IHC data is increasing exponentially and more and more information have to be taken into account. For this reason, we reached an idea to develop an expert supporting system to help pathologists to make a better decision in diagnosing human neoplasms with IHC results. We gave probabilistic decision tree algorithm and tested the algorithm with real case data of lymphoid neoplasms, in which the IHC profile is more important to make a proper diagnosis than other human neoplasms. We designed probabilistic decision tree based on Bayesian theorem, program computational process using MATLAB (The MathWorks, Inc., USA) and prepared IHC profile database (about 104 disease category and 88 IHC antibodies) based on WHO classification by reviewing the literature. The initial probability of each neoplasm was set with the epidemiologic data of lymphoid neoplasm in Korea. With the IHC results of 131 patients sequentially selected, top three presumptive diagnoses for each case were made and compared with the original diagnoses. After the review of the data, 124 out of 131 were used for final analysis. As a result, the presumptive diagnoses were concordant with the original diagnoses in 118 cases (93.7%). The major reason of discordant cases was that the similarity of the IHC profile between two or three different neoplasms. The expert supporting system algorithm presented in this study is in its elementary stage and need more optimization using more advanced technology such as deep-learning with data of real cases, especially in differentiating T-cell lymphomas. Although it needs more refinement, it may be used to aid pathological decision making in future. A further application to determine IHC antibodies for a certain subset of differential diagnoses might be possible in near future.Keywords: database, expert supporting system, immunohistochemistry, probabilistic decision tree
Procedia PDF Downloads 2244750 Sentiment Analysis of Ensemble-Based Classifiers for E-Mail Data
Authors: Muthukumarasamy Govindarajan
Abstract:
Detection of unwanted, unsolicited mails called spam from email is an interesting area of research. It is necessary to evaluate the performance of any new spam classifier using standard data sets. Recently, ensemble-based classifiers have gained popularity in this domain. In this research work, an efficient email filtering approach based on ensemble methods is addressed for developing an accurate and sensitive spam classifier. The proposed approach employs Naive Bayes (NB), Support Vector Machine (SVM) and Genetic Algorithm (GA) as base classifiers along with different ensemble methods. The experimental results show that the ensemble classifier was performing with accuracy greater than individual classifiers, and also hybrid model results are found to be better than the combined models for the e-mail dataset. The proposed ensemble-based classifiers turn out to be good in terms of classification accuracy, which is considered to be an important criterion for building a robust spam classifier.Keywords: accuracy, arcing, bagging, genetic algorithm, Naive Bayes, sentiment mining, support vector machine
Procedia PDF Downloads 1424749 Evaluation of the Effect of Learning Disabilities and Accommodations on the Prediction of the Exam Performance: Ordinal Decision-Tree Algorithm
Abstract:
Providing students with learning disabilities (LD) with extra time to grant them equal access to the exam is a necessary but insufficient condition to compensate for their LD; there should also be a clear indication that the additional time was actually used. For example, if students with LD use more time than students without LD and yet receive lower grades, this may indicate that a different accommodation is required. If they achieve higher grades but use the same amount of time, then the effectiveness of the accommodation has not been demonstrated. The main goal of this study is to evaluate the effect of including parameters related to LD and extended exam time, along with other commonly-used characteristics (e.g., student background and ability measures such as high-school grades), on the ability of ordinal decision-tree algorithms to predict exam performance. We use naturally-occurring data collected from hundreds of undergraduate engineering students. The sub-goals are i) to examine the improvement in prediction accuracy when the indicator of exam performance includes 'actual time used' in addition to the conventional indicator (exam grade) employed in most research; ii) to explore the effectiveness of extended exam time on exam performance for different courses and for LD students with different profiles (i.e., sets of characteristics). This is achieved by using the patterns (i.e., subgroups) generated by the algorithms to identify pairs of subgroups that differ in just one characteristic (e.g., course or type of LD) but have different outcomes in terms of exam performance (grade and time used). Since grade and time used to exhibit an ordering form, we propose a method based on ordinal decision-trees, which applies a weighted information-gain ratio (WIGR) measure for selecting the classifying attributes. Unlike other known ordinal algorithms, our method does not assume monotonicity in the data. The proposed WIGR is an extension of an information-theoretic measure, in the sense that it adjusts to the case of an ordinal target and takes into account the error severity between two different target classes. Specifically, we use ordinal C4.5, random-forest, and AdaBoost algorithms, as well as an ensemble technique composed of ordinal and non-ordinal classifiers. Firstly, we find that the inclusion of LD and extended exam-time parameters improves prediction of exam performance (compared to specifications of the algorithms that do not include these variables). Secondly, when the indicator of exam performance includes 'actual time used' together with grade (as opposed to grade only), the prediction accuracy improves. Thirdly, our subgroup analyses show clear differences in the effect of extended exam time on exam performance among different courses and different student profiles. From a methodological perspective, we find that the ordinal decision-tree based algorithms outperform their conventional, non-ordinal counterparts. Further, we demonstrate that the ensemble-based approach leverages the strengths of each type of classifier (ordinal and non-ordinal) and yields better performance than each classifier individually.Keywords: actual exam time usage, ensemble learning, learning disabilities, ordinal classification, time extension
Procedia PDF Downloads 1004748 Electroencephalogram Based Alzheimer Disease Classification using Machine and Deep Learning Methods
Authors: Carlos Roncero-Parra, Alfonso Parreño-Torres, Jorge Mateo Sotos, Alejandro L. Borja
Abstract:
In this research, different methods based on machine/deep learning algorithms are presented for the classification and diagnosis of patients with mental disorders such as alzheimer. For this purpose, the signals obtained from 32 unipolar electrodes identified by non-invasive EEG were examined, and their basic properties were obtained. More specifically, different well-known machine learning based classifiers have been used, i.e., support vector machine (SVM), Bayesian linear discriminant analysis (BLDA), decision tree (DT), Gaussian Naïve Bayes (GNB), K-nearest neighbor (KNN) and Convolutional Neural Network (CNN). A total of 668 patients from five different hospitals have been studied in the period from 2011 to 2021. The best accuracy is obtained was around 93 % in both ADM and ADA classifications. It can be concluded that such a classification will enable the training of algorithms that can be used to identify and classify different mental disorders with high accuracy.Keywords: alzheimer, machine learning, deep learning, EEG
Procedia PDF Downloads 1264747 EEG-Based Screening Tool for School Student’s Brain Disorders Using Machine Learning Algorithms
Authors: Abdelrahman A. Ramzy, Bassel S. Abdallah, Mohamed E. Bahgat, Sarah M. Abdelkader, Sherif H. ElGohary
Abstract:
Attention-Deficit/Hyperactivity Disorder (ADHD), epilepsy, and autism affect millions of children worldwide, many of which are undiagnosed despite the fact that all of these disorders are detectable in early childhood. Late diagnosis can cause severe problems due to the late treatment and to the misconceptions and lack of awareness as a whole towards these disorders. Moreover, electroencephalography (EEG) has played a vital role in the assessment of neural function in children. Therefore, quantitative EEG measurement will be utilized as a tool for use in the evaluation of patients who may have ADHD, epilepsy, and autism. We propose a screening tool that uses EEG signals and machine learning algorithms to detect these disorders at an early age in an automated manner. The proposed classifiers used with epilepsy as a step taken for the work done so far, provided an accuracy of approximately 97% using SVM, Naïve Bayes and Decision tree, while 98% using KNN, which gives hope for the work yet to be conducted.Keywords: ADHD, autism, epilepsy, EEG, SVM
Procedia PDF Downloads 1904746 Performance Analysis with the Combination of Visualization and Classification Technique for Medical Chatbot
Authors: Shajida M., Sakthiyadharshini N. P., Kamalesh S., Aswitha B.
Abstract:
Natural Language Processing (NLP) continues to play a strategic part in complaint discovery and medicine discovery during the current epidemic. This abstract provides an overview of performance analysis with a combination of visualization and classification techniques of NLP for a medical chatbot. Sentiment analysis is an important aspect of NLP that is used to determine the emotional tone behind a piece of text. This technique has been applied to various domains, including medical chatbots. In this, we have compared the combination of the decision tree with heatmap and Naïve Bayes with Word Cloud. The performance of the chatbot was evaluated using accuracy, and the results indicate that the combination of visualization and classification techniques significantly improves the chatbot's performance.Keywords: sentimental analysis, NLP, medical chatbot, decision tree, heatmap, naïve bayes, word cloud
Procedia PDF Downloads 734745 Spatial Relationship of Drug Smuggling Based on Geographic Information System Knowledge Discovery Using Decision Tree Algorithm
Authors: S. Niamkaeo, O. Robert, O. Chaowalit
Abstract:
In this investigation, we focus on discovering spatial relationship of drug smuggling along the northern border of Thailand. Thailand is no longer a drug production site, but Thailand is still one of the major drug trafficking hubs due to its topographic characteristics facilitating drug smuggling from neighboring countries. Our study areas cover three districts (Mae-jan, Mae-fahluang, and Mae-sai) in Chiangrai city and four districts (Chiangdao, Mae-eye, Chaiprakarn, and Wienghang) in Chiangmai city where drug smuggling of methamphetamine crystal and amphetamine occurs mostly. The data on drug smuggling incidents from 2011 to 2017 was collected from several national and local published news. Geo-spatial drug smuggling database was prepared. Decision tree algorithm was applied in order to discover the spatial relationship of factors related to drug smuggling, which was converted into rules using rule-based system. The factors including land use type, smuggling route, season and distance within 500 meters from check points were found that they were related to drug smuggling in terms of rules-based relationship. It was illustrated that drug smuggling was occurred mostly in forest area in winter. Drug smuggling exhibited was discovered mainly along topographic road where check points were not reachable. This spatial relationship of drug smuggling could support the Thai Office of Narcotics Control Board in surveillance drug smuggling.Keywords: decision tree, drug smuggling, Geographic Information System, GIS knowledge discovery, rule-based system
Procedia PDF Downloads 1694744 Classification Using Worldview-2 Imagery of Giant Panda Habitat in Wolong, Sichuan Province, China
Authors: Yunwei Tang, Linhai Jing, Hui Li, Qingjie Liu, Xiuxia Li, Qi Yan, Haifeng Ding
Abstract:
The giant panda (Ailuropoda melanoleuca) is an endangered species, mainly live in central China, where bamboos act as the main food source of wild giant pandas. Knowledge of spatial distribution of bamboos therefore becomes important for identifying the habitat of giant pandas. There have been ongoing studies for mapping bamboos and other tree species using remote sensing. WorldView-2 (WV-2) is the first high resolution commercial satellite with eight Multi-Spectral (MS) bands. Recent studies demonstrated that WV-2 imagery has a high potential in classification of tree species. The advanced classification techniques are important for utilising high spatial resolution imagery. It is generally agreed that object-based image analysis is a more desirable method than pixel-based analysis in processing high spatial resolution remotely sensed data. Classifiers that use spatial information combined with spectral information are known as contextual classifiers. It is suggested that contextual classifiers can achieve greater accuracy than non-contextual classifiers. Thus, spatial correlation can be incorporated into classifiers to improve classification results. The study area is located at Wuyipeng area in Wolong, Sichuan Province. The complex environment makes it difficult for information extraction since bamboos are sparsely distributed, mixed with brushes, and covered by other trees. Extensive fieldworks in Wuyingpeng were carried out twice. The first one was on 11th June, 2014, aiming at sampling feature locations for geometric correction and collecting training samples for classification. The second fieldwork was on 11th September, 2014, for the purposes of testing the classification results. In this study, spectral separability analysis was first performed to select appropriate MS bands for classification. Also, the reflectance analysis provided information for expanding sample points under the circumstance of knowing only a few. Then, a spatially weighted object-based k-nearest neighbour (k-NN) classifier was applied to the selected MS bands to identify seven land cover types (bamboo, conifer, broadleaf, mixed forest, brush, bare land, and shadow), accounting for spatial correlation within classes using geostatistical modelling. The spatially weighted k-NN method was compared with three alternatives: the traditional k-NN classifier, the Support Vector Machine (SVM) method and the Classification and Regression Tree (CART). Through field validation, it was proved that the classification result obtained using the spatially weighted k-NN method has the highest overall classification accuracy (77.61%) and Kappa coefficient (0.729); the producer’s accuracy and user’s accuracy achieve 81.25% and 95.12% for the bamboo class, respectively, also higher than the other methods. Photos of tree crowns were taken at sample locations using a fisheye camera, so the canopy density could be estimated. It is found that it is difficult to identify bamboo in the areas with a large canopy density (over 0.70); it is possible to extract bamboos in the areas with a median canopy density (from 0.2 to 0.7) and in a sparse forest (canopy density is less than 0.2). In summary, this study explores the ability of WV-2 imagery for bamboo extraction in a mountainous region in Sichuan. The study successfully identified the bamboo distribution, providing supporting knowledge for assessing the habitats of giant pandas.Keywords: bamboo mapping, classification, geostatistics, k-NN, worldview-2
Procedia PDF Downloads 3134743 Efficient Credit Card Fraud Detection Based on Multiple ML Algorithms
Authors: Neha Ahirwar
Abstract:
In the contemporary digital era, the rise of credit card fraud poses a significant threat to both financial institutions and consumers. As fraudulent activities become more sophisticated, there is an escalating demand for robust and effective fraud detection mechanisms. Advanced machine learning algorithms have become crucial tools in addressing this challenge. This paper conducts a thorough examination of the design and evaluation of a credit card fraud detection system, utilizing four prominent machine learning algorithms: random forest, logistic regression, decision tree, and XGBoost. The surge in digital transactions has opened avenues for fraudsters to exploit vulnerabilities within payment systems. Consequently, there is an urgent need for proactive and adaptable fraud detection systems. This study addresses this imperative by exploring the efficacy of machine learning algorithms in identifying fraudulent credit card transactions. The selection of random forest, logistic regression, decision tree, and XGBoost for scrutiny in this study is based on their documented effectiveness in diverse domains, particularly in credit card fraud detection. These algorithms are renowned for their capability to model intricate patterns and provide accurate predictions. Each algorithm is implemented and evaluated for its performance in a controlled environment, utilizing a diverse dataset comprising both genuine and fraudulent credit card transactions.Keywords: efficient credit card fraud detection, random forest, logistic regression, XGBoost, decision tree
Procedia PDF Downloads 664742 Monitoring Three-Dimensional Models of Tree and Forest by Using Digital Close-Range Photogrammetry
Authors: S. Y. Cicekli
Abstract:
In this study, tree-dimensional model of tree was created by using terrestrial close range photogrammetry. For this close range photos were taken. Photomodeler Pro 5 software was used for camera calibration and create three-dimensional model of trees. In first test, three-dimensional model of a tree was created, in the second test three-dimensional model of three trees were created. This study aim is creating three-dimensional model of trees and indicate the use of close-range photogrammetry in forestry. At the end of the study, three-dimensional model of tree and three trees were created. This study showed that usability of close-range photogrammetry for monitoring tree and forests three-dimensional model.Keywords: close- range photogrammetry, forest, tree, three-dimensional model
Procedia PDF Downloads 3894741 A Machine Learning Approach to Detecting Evasive PDF Malware
Authors: Vareesha Masood, Ammara Gul, Nabeeha Areej, Muhammad Asif Masood, Hamna Imran
Abstract:
The universal use of PDF files has prompted hackers to use them for malicious intent by hiding malicious codes in their victim’s PDF machines. Machine learning has proven to be the most efficient in identifying benign files and detecting files with PDF malware. This paper has proposed an approach using a decision tree classifier with parameters. A modern, inclusive dataset CIC-Evasive-PDFMal2022, produced by Lockheed Martin’s Cyber Security wing is used. It is one of the most reliable datasets to use in this field. We designed a PDF malware detection system that achieved 99.2%. Comparing the suggested model to other cutting-edge models in the same study field, it has a great performance in detecting PDF malware. Accordingly, we provide the fastest, most reliable, and most efficient PDF Malware detection approach in this paper.Keywords: PDF, PDF malware, decision tree classifier, random forest classifier
Procedia PDF Downloads 914740 Assessment of Planet Image for Land Cover Mapping Using Soft and Hard Classifiers
Authors: Lamyaa Gamal El-Deen Taha, Ashraf Sharawi
Abstract:
Planet image is a new data source from planet lab. This research is concerned with the assessment of Planet image for land cover mapping. Two pixel based classifiers and one subpixel based classifier were compared. Firstly, rectification of Planet image was performed. Secondly, a comparison between minimum distance, maximum likelihood and neural network classifications for classification of Planet image was performed. Thirdly, the overall accuracy of classification and kappa coefficient were calculated. Results indicate that neural network classification is best followed by maximum likelihood classifier then minimum distance classification for land cover mapping.Keywords: planet image, land cover mapping, rectification, neural network classification, multilayer perceptron, soft classifiers, hard classifiers
Procedia PDF Downloads 1874739 Machine Learning for Aiding Meningitis Diagnosis in Pediatric Patients
Authors: Karina Zaccari, Ernesto Cordeiro Marujo
Abstract:
This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of a blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms, including: Adaptative Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy for training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in preventing expensive and painful procedures on some children.Keywords: machine learning, medical diagnosis, meningitis detection, pediatric research
Procedia PDF Downloads 1504738 A Decision Support System to Detect the Lumbar Disc Disease on the Basis of Clinical MRI
Authors: Yavuz Unal, Kemal Polat, H. Erdinc Kocer
Abstract:
In this study, a decision support system comprising three stages has been proposed to detect the disc abnormalities of the lumbar region. In the first stage named the feature extraction, T2-weighted sagittal and axial Magnetic Resonance Images (MRI) were taken from 55 people and then 27 appearance and shape features were acquired from both sagittal and transverse images. In the second stage named the feature weighting process, k-means clustering based feature weighting (KMCBFW) proposed by Gunes et al. Finally, in the third stage named the classification process, the classifier algorithms including multi-layer perceptron (MLP- neural network), support vector machine (SVM), Naïve Bayes, and decision tree have been used to classify whether the subject has lumbar disc or not. In order to test the performance of the proposed method, the classification accuracy (%), sensitivity, specificity, precision, recall, f-measure, kappa value, and computation times have been used. The best hybrid model is the combination of k-means clustering based feature weighting and decision tree in the detecting of lumbar disc disease based on both sagittal and axial MR images.Keywords: lumbar disc abnormality, lumbar MRI, lumbar spine, hybrid models, hybrid features, k-means clustering based feature weighting
Procedia PDF Downloads 5204737 Determining of the Performance of Data Mining Algorithm Determining the Influential Factors and Prediction of Ischemic Stroke: A Comparative Study in the Southeast of Iran
Authors: Y. Mehdipour, S. Ebrahimi, A. Jahanpour, F. Seyedzaei, B. Sabayan, A. Karimi, H. Amirifard
Abstract:
Ischemic stroke is one of the common reasons for disability and mortality. The fourth leading cause of death in the world and the third in some other sources. Only 1/3 of the patients with ischemic stroke fully recover, 1/3 of them end in permanent disability and 1/3 face death. Thus, the use of predictive models to predict stroke has a vital role in reducing the complications and costs related to this disease. Thus, the aim of this study was to specify the effective factors and predict ischemic stroke with the help of DM methods. The present study was a descriptive-analytic study. The population was 213 cases from among patients referring to Ali ibn Abi Talib (AS) Hospital in Zahedan. Data collection tool was a checklist with the validity and reliability confirmed. This study used DM algorithms of decision tree for modeling. Data analysis was performed using SPSS-19 and SPSS Modeler 14.2. The results of the comparison of algorithms showed that CHAID algorithm with 95.7% accuracy has the best performance. Moreover, based on the model created, factors such as anemia, diabetes mellitus, hyperlipidemia, transient ischemic attacks, coronary artery disease, and atherosclerosis are the most effective factors in stroke. Decision tree algorithms, especially CHAID algorithm, have acceptable precision and predictive ability to determine the factors affecting ischemic stroke. Thus, by creating predictive models through this algorithm, will play a significant role in decreasing the mortality and disability caused by ischemic stroke.Keywords: data mining, ischemic stroke, decision tree, Bayesian network
Procedia PDF Downloads 1744736 Heart Failure Identification and Progression by Classifying Cardiac Patients
Authors: Muhammad Saqlain, Nazar Abbas Saqib, Muazzam A. Khan
Abstract:
Heart Failure (HF) has become the major health problem in our society. The prevalence of HF has increased as the patient’s ages and it is the major cause of the high mortality rate in adults. A successful identification and progression of HF can be helpful to reduce the individual and social burden from this syndrome. In this study, we use a real data set of cardiac patients to propose a classification model for the identification and progression of HF. The data set has divided into three age groups, namely young, adult, and old and then each age group have further classified into four classes according to patient’s current physical condition. Contemporary Data Mining classification algorithms have been applied to each individual class of every age group to identify the HF. Decision Tree (DT) gives the highest accuracy of 90% and outperform all other algorithms. Our model accurately diagnoses different stages of HF for each age group and it can be very useful for the early prediction of HF.Keywords: decision tree, heart failure, data mining, classification model
Procedia PDF Downloads 4024735 A Video Surveillance System Using an Ensemble of Simple Neural Network Classifiers
Authors: Rodrigo S. Moreira, Nelson F. F. Ebecken
Abstract:
This paper proposes a maritime vessel tracker composed of an ensemble of WiSARD weightless neural network classifiers. A failure detector analyzes vessel movement with a Kalman filter and corrects the tracking, if necessary, using FFT matching. The use of the WiSARD neural network to track objects is uncommon. The additional contributions of the present study include a performance comparison with four state-of-art trackers, an experimental study of the features that improve maritime vessel tracking, the first use of an ensemble of classifiers to track maritime vessels and a new quantization algorithm that compares the values of pixel pairs.Keywords: ram memory, WiSARD weightless neural network, object tracking, quantization
Procedia PDF Downloads 3104734 Determinant Factor of Farm Household Fruit Tree Planting: The Case of Habru Woreda, North Wollo
Authors: Getamesay Kassaye Dimru
Abstract:
The cultivation of fruit tree in degraded areas has two-fold importance. Firstly, it improves food availability and income, and secondly, it promotes the conservation of soil and water improving, in turn, the productivity of the land. The main objectives of this study are to identify the determinant of farmer's fruit trees plantation decision and to major fruit production challenges and opportunities of the study area. The analysis was made using primary data collected from 60 sample household selected randomly from the study area in 2016. The primary data was supplemented by data collected from a key informant. In addition to the descriptive statistics and statistical tests (Chi-square test and t-test), a logit model was employed to identify the determinant of fruit tree plantation decision. Drought, pest incidence, land degradation, lack of input, lack of capital and irrigation schemes maintenance, lack of misuse of irrigation water and limited agricultural personnel are the major production constraints identified. The opportunities that need to further exploited are better access to irrigation, main road access, endowment of preferred guava variety, experience of farmers, and proximity of the study area to research center. The result of logit model shows that from different factors hypothesized to determine fruit tree plantation decision, age of the household head accesses to market and perception of farmers about fruits' disease and pest resistance are found to be significant. The result has revealed important implications for the promotion of fruit production for both land degradation control and rehabilitation and increasing the livelihood of farming households.Keywords: degradation, fruit, irrigation, pest
Procedia PDF Downloads 2354733 A Dynamic Round Robin Routing for Z-Fat Tree
Authors: M. O. Adda
Abstract:
In this paper, we propose a topology called Zoned fat tree (Z-Fat tree) which is a further extension to the classical fat trees. The extension relates to the provision of extra degree of connectivity to maximize the number of deployed ports per routing nodes, and hence increases the bisection bandwidth especially for slimmed fat trees. The extra links, when classical routing is used, tend, in deterministic environment, to be under-utilized for some traffic patterns, hence achieving poor performance. We suggest two versions of a dynamic round robin scheme that outperforms the classical D-mod-k and S-mod-K routing and show by simulation that our proposal utilize all the extra added links to the classical fat tree, and achieve better performance for general applications.Keywords: deterministic routing, fat tree, interconnection, traffic pattern
Procedia PDF Downloads 4844732 Predictive Analysis of the Stock Price Market Trends with Deep Learning
Authors: Suraj Mehrotra
Abstract:
The stock market is a volatile, bustling marketplace that is a cornerstone of economics. It defines whether companies are successful or in spiral. A thorough understanding of it is important - many companies have whole divisions dedicated to analysis of both their stock and of rivaling companies. Linking the world of finance and artificial intelligence (AI), especially the stock market, has been a relatively recent development. Predicting how stocks will do considering all external factors and previous data has always been a human task. With the help of AI, however, machine learning models can help us make more complete predictions in financial trends. Taking a look at the stock market specifically, predicting the open, closing, high, and low prices for the next day is very hard to do. Machine learning makes this task a lot easier. A model that builds upon itself that takes in external factors as weights can predict trends far into the future. When used effectively, new doors can be opened up in the business and finance world, and companies can make better and more complete decisions. This paper explores the various techniques used in the prediction of stock prices, from traditional statistical methods to deep learning and neural networks based approaches, among other methods. It provides a detailed analysis of the techniques and also explores the challenges in predictive analysis. For the accuracy of the testing set, taking a look at four different models - linear regression, neural network, decision tree, and naïve Bayes - on the different stocks, Apple, Google, Tesla, Amazon, United Healthcare, Exxon Mobil, J.P. Morgan & Chase, and Johnson & Johnson, the naïve Bayes model and linear regression models worked best. For the testing set, the naïve Bayes model had the highest accuracy along with the linear regression model, followed by the neural network model and then the decision tree model. The training set had similar results except for the fact that the decision tree model was perfect with complete accuracy in its predictions, which makes sense. This means that the decision tree model likely overfitted the training set when used for the testing set.Keywords: machine learning, testing set, artificial intelligence, stock analysis
Procedia PDF Downloads 95