Search results for: binary classification tree
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1686

Search results for: binary classification tree

1326 Decomposition Method for Neural Multiclass Classification Problem

Authors: H. El Ayech, A. Trabelsi

Abstract:

In this article we are going to discuss the improvement of the multi classes- classification problem using multi layer Perceptron. The considered approach consists in breaking down the n-class problem into two-classes- subproblems. The training of each two-class subproblem is made independently; as for the phase of test, we are going to confront a vector that we want to classify to all two classes- models, the elected class will be the strongest one that won-t lose any competition with the other classes. Rates of recognition gotten with the multi class-s approach by two-class-s decomposition are clearly better that those gotten by the simple multi class-s approach.

Keywords: Artificial neural network, letter-recognition, Multi class Classification, Multi Layer Perceptron.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1534
1325 Pattern Classification of Back-Propagation Algorithm Using Exclusive Connecting Network

Authors: Insung Jung, Gi-Nam Wang

Abstract:

The objective of this paper is to a design of pattern classification model based on the back-propagation (BP) algorithm for decision support system. Standard BP model has done full connection of each node in the layers from input to output layers. Therefore, it takes a lot of computing time and iteration computing for good performance and less accepted error rate when we are doing some pattern generation or training the network. However, this model is using exclusive connection in between hidden layer nodes and output nodes. The advantage of this model is less number of iteration and better performance compare with standard back-propagation model. We simulated some cases of classification data and different setting of network factors (e.g. hidden layer number and nodes, number of classification and iteration). During our simulation, we found that most of simulations cases were satisfied by BP based using exclusive connection network model compared to standard BP. We expect that this algorithm can be available to identification of user face, analysis of data, mapping data in between environment data and information.

Keywords: Neural network, Back-propagation, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1625
1324 A Parallel Implementation of the Reverse Converter for the Moduli Set {2n, 2n–1, 2n–1–1}

Authors: Mehdi Hosseinzadeh, Amir Sabbagh Molahosseini, Keivan Navi

Abstract:

In this paper, a new reverse converter for the moduli set {2n, 2n–1, 2n–1–1} is presented. We improved a previously introduced conversion algorithm for deriving an efficient hardware design for reverse converter. Hardware architecture of the proposed converter is based on carry-save adders and regular binary adders, without the requirement for modular adders. The presented design is faster than the latest introduced reverse converter for moduli set {2n, 2n–1, 2n–1–1}. Also, it has better performance than the reverse converters for the recently introduced moduli set {2n+1–1, 2n, 2n–1}

Keywords: Residue arithmetic, Residue number system, Residue-to-Binary converter, Reverse converter

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1274
1323 Feature Subset Selection Using Ant Colony Optimization

Authors: Ahmed Al-Ani

Abstract:

Feature selection is an important step in many pattern classification problems. It is applied to select a subset of features, from a much larger set, such that the selected subset is sufficient to perform the classification task. Due to its importance, the problem of feature selection has been investigated by many researchers. In this paper, a novel feature subset search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.

Keywords: Ant Colony Optimization, ant systems, feature selection, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1553
1322 A Study of Classification Models to Predict Drill-Bit Breakage Using Degradation Signals

Authors: Bharatendra Rai

Abstract:

Cutting tools are widely used in manufacturing processes and drilling is the most commonly used machining process. Although drill-bits used in drilling may not be expensive, their breakage can cause damage to expensive work piece being drilled and at the same time has major impact on productivity. Predicting drill-bit breakage, therefore, is important in reducing cost and improving productivity. This study uses twenty features extracted from two degradation signals viz., thrust force and torque. The methodology used involves developing and comparing decision tree, random forest, and multinomial logistic regression models for classifying and predicting drill-bit breakage using degradation signals.

Keywords: Degradation signal, drill-bit breakage, random forest, multinomial logistic regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2205
1321 Classification and Resolving Urban Problems by Means of Fuzzy Approach

Authors: F. Habib, A. Shokoohi

Abstract:

Urban problems are problems of organized complexity. Thus, many models and scientific methods to resolve urban problems are failed. This study is concerned with proposing of a fuzzy system driven approach for classification and solving urban problems. The proposed study investigated mainly the selection of the inputs and outputs of urban systems for classification of urban problems. In this research, five categories of urban problems, respect to fuzzy system approach had been recognized: control, polytely, optimizing, open and decision making problems. Grounded Theory techniques were then applied to analyze the data and develop new solving method for each category. The findings indicate that the fuzzy system methods are powerful processes and analytic tools for helping planners to resolve urban complex problems. These tools can be successful where as others have failed because both incorporate or address uncertainty and risk; complexity and systems interacting with other systems.

Keywords: Classification, complexity, Fuzzy theory, urban problems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2078
1320 Customer Churn Prediction Using Four Machine Learning Algorithms Integrating Feature Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial part of maintaining a customer-oriented business in the telecommunications industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years, which has made it more important to understand customers’ needs in this strong market. For those who are looking to turn over their service providers, understanding their needs is especially important. Predictive churn is now a mandatory requirement for retaining customers in the telecommunications industry. Machine learning can be used to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: Machine Learning, Gradient Boosting, Logistic Regression, Churn, Random Forest, Decision Tree, ROC, AUC, F1-score.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 338
1319 Performance Analysis of Genetic Algorithm with kNN and SVM for Feature Selection in Tumor Classification

Authors: C. Gunavathi, K. Premalatha

Abstract:

Tumor classification is a key area of research in the field of bioinformatics. Microarray technology is commonly used in the study of disease diagnosis using gene expression levels. The main drawback of gene expression data is that it contains thousands of genes and a very few samples. Feature selection methods are used to select the informative genes from the microarray. These methods considerably improve the classification accuracy. In the proposed method, Genetic Algorithm (GA) is used for effective feature selection. Informative genes are identified based on the T-Statistics, Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate solutions of GA are obtained from top-m informative genes. The classification accuracy of k-Nearest Neighbor (kNN) method is used as the fitness function for GA. In this work, kNN and Support Vector Machine (SVM) are used as the classifiers. The experimental results show that the proposed work is suitable for effective feature selection. With the help of the selected genes, GA-kNN method achieves 100% accuracy in 4 datasets and GA-SVM method achieves in 5 out of 10 datasets. The GA with kNN and SVM methods are demonstrated to be an accurate method for microarray based tumor classification.

Keywords: F-Test, Gene Expression, Genetic Algorithm, k- Nearest-Neighbor, Microarray, Signal-to-Noise Ratio, Support Vector Machine, T-statistics, Tumor Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4494
1318 A Physical Theory of Information vs. a Mathematical Theory of Communication

Authors: Manouchehr Amiri

Abstract:

This article presents a general notion of physical bit information that is compatible with the basics of quantum mechanics and incorporates the Shannon entropy as a special case. This notion of physical information leads to the Binary Data Matrix model (BDM), which predicts the basic results of quantum mechanics, general relativity, and black hole thermodynamics. The compatibility of the model with holographic, information conservation, and Landauer’s principle is investigated. After deriving the “Bit Information principle” as a consequence of BDM, the fundamental equations of Planck, De Broglie, Bekenstein, and mass-energy equivalence are derived.

Keywords: Physical theory of information, binary data matrix model, Shannon information theory, bit information principle.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 60
1317 Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection

Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi

Abstract:

It is obvious in this present time, internet has become an indispensable part of human life since its inception. The Internet has provided diverse opportunities to make life so easy for human beings, through the adoption of various channels. Among these channels are email, internet banking, video conferencing, and the like. Email is one of the easiest means of communication hugely accepted among individuals and organizations globally. But over decades the security integrity of this platform has been challenged with malicious activities like Phishing. Email phishing is designed by phishers to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, social security numbers, etc. This activity has caused a lot of financial damage to email users globally which has resulted in bankruptcy, sudden death of victims, and other health-related sicknesses. Although many methods have been proposed to detect email phishing, in this research, the results of multiple machine-learning methods for predicting email phishing have been compared with the use of filter-wrapper feature selection. It is worth noting that all three models performed substantially but one outperformed the other. The dataset used for these models is obtained from Kaggle online data repository, while three classifiers: decision tree, Naïve Bayes, and Logistic regression are ensemble (Bagging) respectively. Results from the study show that the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, hybrid, filter-wrapper, phishing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 122
1316 Architectural Stratification and Woody Species Diversity of a Subtropical Forest Grown in a Limestone Habitat in Okinawa Island, Japan

Authors: S. M. Feroz, K. Yoshimura, A. Hagihara

Abstract:

The forest stand consisted of four layers. The species composition between the third and the bottom layers was almost similar, whereas it was almost exclusive between the top and the lower three layers. The values of Shannon-s index H' and Pielou-s index J ' tended to increase from the bottom layer upward, except for H' -value of the top layer. The values of H' and J ' were 4.21 bit and 0.73, respectively, for the total stand. High woody species diversity of the forest depended on large trees in the upper layers, which trend was different from a subtropical evergreen broadleaf forest grown in silicate habitat in the northern part of Okinawa Island. The spatial distribution of trees was overlapped between the third and the bottom layers, whereas it was independent or slightly exclusive between the top and the lower three layers. Mean tree weight of each layer decreased from the top toward the bottom layer, whereas the corresponding tree density increased from the top downward. This relationship was analogous to the process of self-thinning plant populations.

Keywords: Canopy multi-layering, limestone habitat, mean tree weight-density relationship, species diversity, subtropical forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1184
1315 Synthesis of Analogue to Camptothecine

Authors: Abdulkareem Hamid, Adam Daïch

Abstract:

Camptothecin (CPT) is a cytotoxic quinoline alkaloid, which inhibits the DNA enzyme topoisomerase I (topo I). It was discovered in 1966 by M. E. Wall and M. C. Wani in systematic screening of natural products for anticancer drugs. It was isolated from the bark and stem of Camptotheca acuminata (Camptotheca, Happy tree), a tree native in China. CPT showed remarkable anticancer activity in preliminary clinical trials but also low solubility and (high) adverse drug reaction. Because of these disadvantages synthetic and medicinal chemists have developed numerous syntheses of Camptothecine [1][2][3] and various derivatives to increase the benefits of the chemical, with good results. In our method CPT analogues has be six steps starting from available material DL Malic acid.

Keywords: Camptothecine, synthesis, analogue.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1575
1314 Tree Based Data Aggregation to Resolve Funneling Effect in Wireless Sensor Network

Authors: G. Rajesh, B. Vinayaga Sundaram, C. Aarthi

Abstract:

In wireless sensor network, sensor node transmits the sensed data to the sink node in multi-hop communication periodically. This high traffic induces congestion at the node which is present one-hop distance to the sink node. The packet transmission and reception rate of these nodes should be very high, when compared to other sensor nodes in the network. Therefore, the energy consumption of that node is very high and this effect is known as the “funneling effect”. The tree based-data aggregation technique (TBDA) is used to reduce the energy consumption of the node. The throughput of the overall performance shows a considerable decrease in the number of packet transmissions to the sink node. The proposed scheme, TBDA, avoids the funneling effect and extends the lifetime of the wireless sensor network. The average case time complexity for inserting the node in the tree is O(n log n) and for the worst case time complexity is O(n2).

Keywords: Data Aggregation, Funneling Effect, Traffic Congestion, Wireless Sensor Network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1276
1313 Resolving Dependency Ambiguity of Subordinate Clauses using Support Vector Machines

Authors: Sang-Soo Kim, Seong-Bae Park, Sang-Jo Lee

Abstract:

In this paper, we propose a method of resolving dependency ambiguities of Korean subordinate clauses based on Support Vector Machines (SVMs). Dependency analysis of clauses is well known to be one of the most difficult tasks in parsing sentences, especially in Korean. In order to solve this problem, we assume that the dependency relation of Korean subordinate clauses is the dependency relation among verb phrase, verb and endings in the clauses. As a result, this problem is represented as a binary classification task. In order to apply SVMs to this problem, we selected two kinds of features: static and dynamic features. The experimental results on STEP2000 corpus show that our system achieves the accuracy of 73.5%.

Keywords: Dependency analysis, subordinate clauses, binaryclassification, support vector machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1557
1312 Hybrid Anomaly Detection Using Decision Tree and Support Vector Machine

Authors: Elham Serkani, Hossein Gharaee Garakani, Naser Mohammadzadeh, Elaheh Vaezpour

Abstract:

Intrusion detection systems (IDS) are the main components of network security. These systems analyze the network events for intrusion detection. The design of an IDS is through the training of normal traffic data or attack. The methods of machine learning are the best ways to design IDSs. In the method presented in this article, the pruning algorithm of C5.0 decision tree is being used to reduce the features of traffic data used and training IDS by the least square vector algorithm (LS-SVM). Then, the remaining features are arranged according to the predictor importance criterion. The least important features are eliminated in the order. The remaining features of this stage, which have created the highest level of accuracy in LS-SVM, are selected as the final features. The features obtained, compared to other similar articles which have examined the selected features in the least squared support vector machine model, are better in the accuracy, true positive rate, and false positive. The results are tested by the UNSW-NB15 dataset.

Keywords: Intrusion detection system, decision tree, support vector machine, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1194
1311 Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System

Authors: Jorge A. Ruiz-Vanoye, Ocotlán Díaz-Parra, Alejandro Fuentes-Penna, Daniel Vélez-Díaz, Edith Olaco García

Abstract:

In this paper, we present the use of the discriminant analysis to select evolutionary algorithms that better solve instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm from the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classification was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. We obtained a classification of the discriminant analysis of 66.7%.

Keywords: Intelligent transportation systems, data-mining techniques, evolutionary algorithms, discriminant analysis, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1512
1310 Air Classification of Dust from Steel Converter Secondary De-dusting for Zinc Enrichment

Authors: C. Lanzerstorfer

Abstract:

The off-gas from the basic oxygen furnace (BOF), where pig iron is converted into steel, is treated in the primary ventilation system. This system is in full operation only during oxygen-blowing when the BOF converter vessel is in a vertical position. When pig iron and scrap are charged into the BOF and when slag or steel are tapped, the vessel is tilted. The generated emissions during charging and tapping cannot be captured by the primary off-gas system. To capture these emissions, a secondary ventilation system is usually installed. The emissions are captured by a canopy hood installed just above the converter mouth in tilted position. The aim of this study was to investigate the dependence of Zn and other components on the particle size of BOF secondary ventilation dust. Because of the high temperature of the BOF process it can be expected that Zn will be enriched in the fine dust fractions. If Zn is enriched in the fine fractions, classification could be applied to split the dust into two size fractions with a different content of Zn. For this air classification experiments with dust from the secondary ventilation system of a BOF were performed. The results show that Zn and Pb are highly enriched in the finest dust fraction. For Cd, Cu and Sb the enrichment is less. In contrast, the non-volatile metals Al, Fe, Mn and Ti were depleted in the fine fractions. Thus, air classification could be considered for the treatment of dust from secondary BOF off-gas cleaning.

Keywords: Air classification, converter dust, recycling, zinc.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1177
1309 Satellite Imagery Classification Based on Deep Convolution Network

Authors: Zhong Ma, Zhuping Wang, Congxin Liu, Xiangzeng Liu

Abstract:

Satellite imagery classification is a challenging problem with many practical applications. In this paper, we designed a deep convolution neural network (DCNN) to classify the satellite imagery. The contributions of this paper are twofold — First, to cope with the large-scale variance in the satellite image, we introduced the inception module, which has multiple filters with different size at the same level, as the building block to build our DCNN model. Second, we proposed a genetic algorithm based method to efficiently search the best hyper-parameters of the DCNN in a large search space. The proposed method is evaluated on the benchmark database. The results of the proposed hyper-parameters search method show it will guide the search towards better regions of the parameter space. Based on the found hyper-parameters, we built our DCNN models, and evaluated its performance on satellite imagery classification, the results show the classification accuracy of proposed models outperform the state of the art method.

Keywords: Satellite imagery classification, deep convolution network, genetic algorithm, hyper-parameter optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2296
1308 Markov Random Field-Based Segmentation Algorithm for Detection of Land Cover Changes Using Uninhabited Aerial Vehicle Synthetic Aperture Radar Polarimetric Images

Authors: Mehrnoosh Omati, Mahmod Reza Sahebi

Abstract:

The information on land use/land cover changing plays an essential role for environmental assessment, planning and management in regional development. Remotely sensed imagery is widely used for providing information in many change detection applications. Polarimetric Synthetic aperture radar (PolSAR) image, with the discrimination capability between different scattering mechanisms, is a powerful tool for environmental monitoring applications. This paper proposes a new boundary-based segmentation algorithm as a fundamental step for land cover change detection. In this method, first, two PolSAR images are segmented using integration of marker-controlled watershed algorithm and coupled Markov random field (MRF). Then, object-based classification is performed to determine changed/no changed image objects. Compared with pixel-based support vector machine (SVM) classifier, this novel segmentation algorithm significantly reduces the speckle effect in PolSAR images and improves the accuracy of binary classification in object-based level. The experimental results on Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) polarimetric images show a 3% and 6% improvement in overall accuracy and kappa coefficient, respectively. Also, the proposed method can correctly distinguish homogeneous image parcels.

Keywords: Coupled Markov random field, environment, object-based analysis, Polarimetric SAR images.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 824
1307 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

In this paper, the comparison between k-Nearest Neighbor (kNN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, maximum PSD values were extracted as features from the model. There are three indexes for balanced brain; index 3, index 4 and index 5. There are significant different of the EEG signals due to the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classification result is 88.46% accuracy. These results proved that k-NN can be used in order to predict the brain balancing application.

Keywords: Brain balancing, kNN, power spectral density, 3D EEG model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2570
1306 Classification of Right and Left-Hand Movement Using Multi-Resolution Analysis Method

Authors: Nebi Gedik

Abstract:

The aim of the brain-computer interface studies on electroencephalogram (EEG) signals containing motor imagery is to extract the effective features that will provide the highest possible classification accuracy for the detection of the desired motor movement. However, achieving this goal is difficult as the most suitable frequency band and time frame vary from subject to subject. In this study, the classification success of the two-feature data obtained from raw EEG signals and the coefficients of the multi-resolution analysis method applied to the EEG signals were analyzed comparatively. The method was applied to several EEG channels (C3, Cz and C4) signals obtained from the EEG data set belonging to the publicly available BCI competition III.

Keywords: Motor imagery, EEG, wave atom transform, k-NN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 541
1305 Meta Random Forests

Authors: Praveen Boinee, Alessandro De Angelis, Gian Luca Foresti

Abstract:

Leo Breimans Random Forests (RF) is a recent development in tree based classifiers and quickly proven to be one of the most important algorithms in the machine learning literature. It has shown robust and improved results of classifications on standard data sets. Ensemble learning algorithms such as AdaBoost and Bagging have been in active research and shown improvements in classification results for several benchmarking data sets with mainly decision trees as their base classifiers. In this paper we experiment to apply these Meta learning techniques to the random forests. We experiment the working of the ensembles of random forests on the standard data sets available in UCI data sets. We compare the original random forest algorithm with their ensemble counterparts and discuss the results.

Keywords: Random Forests [RF], ensembles, UCI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2657
1304 Prediction Modeling of Alzheimer’s Disease and Its Prodromal Stages from Multimodal Data with Missing Values

Authors: M. Aghili, S. Tabarestani, C. Freytes, M. Shojaie, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, M. Adjouadi

Abstract:

A major challenge in medical studies, especially those that are longitudinal, is the problem of missing measurements which hinders the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's Disease studies have focused on the delineation of Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN) which is essential for developing effective and early treatment methods. To address the aforementioned challenges, this paper explores the potential of using the eXtreme Gradient Boosting (XGBoost) algorithm in handling missing values in multiclass classification. We seek a generalized classification scheme where all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and in the presence of almost 28% missing values, we investigated the performance of XGBoost on the classification of the four classes of AD, NC, EMCI, and LMCI. Using 10-fold cross validation technique, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62% and recall of 80.51%, supporting the more natural and promising multiclass classification.

Keywords: eXtreme Gradient Boosting, missing data, Alzheimer disease, early mild cognitive impairment, late mild cognitive impairment, multiclass classification, ADNI, support vector machine, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 904
1303 A Hybrid Data Mining Method for the Medical Classification of Chest Pain

Authors: Sung Ho Ha, Seong Hyeon Joo

Abstract:

Data mining techniques have been used in medical research for many years and have been known to be effective. In order to solve such problems as long-waiting time, congestion, and delayed patient care, faced by emergency departments, this study concentrates on building a hybrid methodology, combining data mining techniques such as association rules and classification trees. The methodology is applied to real-world emergency data collected from a hospital and is evaluated by comparing with other techniques. The methodology is expected to help physicians to make a faster and more accurate classification of chest pain diseases.

Keywords: Data mining, medical decisions, medical domainknowledge, chest pain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2165
1302 On the Network Packet Loss Tolerance of SVM Based Activity Recognition

Authors: Gamze Uslu, Sebnem Baydere, Alper K. Demir

Abstract:

In this study, data loss tolerance of Support Vector Machines (SVM) based activity recognition model and multi activity classification performance when data are received over a lossy wireless sensor network is examined. Initially, the classification algorithm we use is evaluated in terms of resilience to random data loss with 3D acceleration sensor data for sitting, lying, walking and standing actions. The results show that the proposed classification method can recognize these activities successfully despite high data loss. Secondly, the effect of differentiated quality of service performance on activity recognition success is measured with activity data acquired from a multi hop wireless sensor network, which introduces  high data loss. The effect of number of nodes on the reliability and multi activity classification success is demonstrated in simulation environment. To the best of our knowledge, the effect of data loss in a wireless sensor network on activity detection success rate of an SVM based classification algorithm has not been studied before.

Keywords: Activity recognition, support vector machines, acceleration sensor, wireless sensor networks, packet loss.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2828
1301 Dynamic Attribute Dependencies in Relational Attribute Grammars

Authors: K. Barbar, M. Dehayni, A. Awada, M. Smaili

Abstract:

Considering the theory of attribute grammars, we use logical formulas instead of traditional functional semantic rules. Following the decoration of a derivation tree, a suitable algorithm should maintain the consistency of the formulas together with the evaluation of the attributes. This may be a Prolog-like resolution, but this paper examines a somewhat different strategy, based on production specialization, local consistency and propagation: given a derivation tree, it is interactively decorated, i.e. incrementally checked and evaluated. The non-directed dependencies are dynamically directed during attribute evaluation.

Keywords: Input/Output attribute grammars, local consistency, logical programming, propagation, relational attribute grammars.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1421
1300 Neural Network Based Speech to Text in Malay Language

Authors: H. F. A. Abdul Ghani, R. R. Porle

Abstract:

Speech to text in Malay language is a system that converts Malay speech into text. The Malay language recognition system is still limited, thus, this paper aims to investigate the performance of ten Malay words obtained from the online Malay news. The methodology consists of three stages, which are preprocessing, feature extraction, and speech classification. In preprocessing stage, the speech samples are filtered using pre emphasis. After that, feature extraction method is applied to the samples using Mel Frequency Cepstrum Coefficient (MFCC). Lastly, speech classification is performed using Feedforward Neural Network (FFNN). The accuracy of the classification is further investigated based on the hidden layer size. From experimentation, the classifier with 40 hidden neurons shows the highest classification rate which is 94%.  

Keywords: Feed-Forward Neural Network, FFNN, Malay speech recognition, Mel Frequency Cepstrum Coefficient, MFCC, speech-to-text.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 678
1299 Unambiguous Signal Acquisition Based On Recombination of Sub-Correlations of BOC Signals

Authors: Hongdeuk Kim, Youngpo Lee, Seokho Yoon

Abstract:

Due to side-peaks of autocorrelation function, the binary offset carrier (BOC) signal acquisition suffers from an ambiguity when one of the side-peaks is acquired. In this paper, we first analyze that the BOC autocorrelation is made up of the sum of subcorrelations, and then, remove the side-peaks causing the ambiguity by recombining the sub-correlations. The proposed scheme is shown to remove the side-peaks completely. From numerical results, it is confirmed that the proposed scheme outperforms the conventional schemes in terms of the receiver operating characteristic and mean acquisition time.

Keywords: Binary offset carrier (BOC), acquisition, ambiguity problem, side-peak.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1544
1298 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.

Keywords: Cross-language analysis, machine learning, machine translation, sentiment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1608
1297 Data Mining Using Learning Automata

Authors: M. R. Aghaebrahimi, S. H. Zahiri, M. Amiri

Abstract:

In this paper a data miner based on the learning automata is proposed and is called LA-miner. The LA-miner extracts classification rules from data sets automatically. The proposed algorithm is established based on the function optimization using learning automata. The experimental results on three benchmarks indicate that the performance of the proposed LA-miner is comparable with (sometimes better than) the Ant-miner (a data miner algorithm based on the Ant Colony optimization algorithm) and CNZ (a well-known data mining algorithm for classification).

Keywords: Data mining, Learning automata, Classification rules, Knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1901