Search results for: classifiers comparison
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2179

2149 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduce a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done in R and Greenplum in-database analytic tools.

Keywords: Sentiment Analysis, Social Media, Twitter, Amazon, Data Mining, Machine Learning, Text Mining.

Downloads: 3476
2148 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In machine learning, ensembles are commonly employed to improve on the performance of multiple base classifiers. However, true predictions are often canceled out by false ones during consensus due to a phenomenon called the “curse of correlation”, which manifests as strong interference among the predictions produced by the base classifiers. In addition, existing practices are still unable to effectively mitigate the problem of imbalanced classification. Based on an analysis of our experimental results, we conclude that the two problems are caused by inherent deficiencies in the consensus approach. Therefore, we create an enhanced ensemble algorithm that adopts a purpose-designed rank-based chain-mode consensus to overcome the two problems. To evaluate the proposed ensemble algorithm, we use the well-known benchmark data set NSL-KDD (the improved version of the KDDCup99 data set produced by the University of New Brunswick) to compare the proposed algorithm with 8 common ensemble algorithms. In particular, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in improvement in accuracy and reliability over the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is shown to be a more effective ensemble solution than the traditional consensus approach: it outperforms the 8 ensemble algorithms by 20% on almost all compared metrics, which include accuracy, precision, recall, F1-score and area under the receiver operating characteristic curve.

Keywords: Consensus, curse of correlation, imbalanced classification, rank-based chain-mode ensemble.

Downloads: 671
2147 Meta-Classification using SVM Classifiers for Text Documents

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. In this paper, we investigated three approaches to building a meta-classifier in order to increase the classification accuracy. The basic idea is to learn a meta-classifier that optimally selects the best component classifier for each data point. The experimental results show that combining classifiers can significantly improve the accuracy of classification and that our meta-classification strategy gives better results than each individual classifier. For 7083 Reuters text documents we obtained classification accuracies of up to 92.04%.

Keywords: Meta-classification, Learning with Kernels, Support Vector Machine, and Performance Evaluation.

Downloads: 1572
2146 Improved Tropical Wood Species Recognition System based on Multi-feature Extractor and Classifier

Authors: Marzuki Khalid, Rubiyah Yusof, Anis Salwa Mohd Khairuddin

Abstract:

An automated wood recognition system is designed to classify tropical wood species. The wood features are extracted using two feature extractors: the Basic Grey Level Aura Matrix (BGLAM) technique and the statistical properties of pores distribution (SPPD) technique. Due to the nonlinearity of the tropical wood species separation boundaries, a pre-classification stage is proposed which consists of K-means clustering and kernel discriminant analysis (KDA). Finally, a Linear Discriminant Analysis (LDA) classifier and a K-Nearest Neighbour (KNN) classifier are implemented for comparison purposes. The study compares the system with and without pre-classification using the KNN classifier and the LDA classifier. The results show that the inclusion of the pre-classification stage improves the accuracy of both the LDA and KNN classifiers by more than 12%.

Keywords: Tropical wood species, nonlinear data, feature extractors, classification

Downloads: 1940
2145 Improving Classification in Bayesian Networks using Structural Learning

Authors: Hong Choon Ong

Abstract:

Naïve Bayes classifiers are simple probabilistic classifiers. Classification extracts patterns from a data file containing a set of labeled training examples and is currently one of the most significant areas in data mining. However, Naïve Bayes assumes independence among the features. Structural learning among the features thus helps in the classification problem. In this study, structural learning in a Bayesian Network is proposed for cases where there are relationships between the features when using Naïve Bayes. Structural learning is shown to improve classification when relationships exist between the features, that is, when they are not independent.

Keywords: Bayesian Network, Classification, Naïve Bayes, Structural Learning.

Downloads: 2557
2144 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still at an early stage, as this is a relatively new phenomenon of interest to society. Machine learning helps to solve complex problems and to build AI systems, especially in cases where the knowledge is tacit or not known. We used machine learning algorithms for the identification of fake news, applying three classifiers: Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and then incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake news due to the unavailability of corpora. We applied three different machine learning classifiers to two publicly available datasets. Experimental analysis based on the existing datasets indicates very encouraging and improved performance.

Keywords: Fake news detection, types of fake news, machine learning, natural language processing, classification techniques.

Downloads: 1449
2143 Multi-Channel Information Fusion in C-OTDR Monitoring Systems: Various Approaches to Classify of Targeted Events

Authors: Andrey V. Timofeev

Abstract:

The paper presents new results concerning the selection of an optimal information fusion formula for ensembles of C-OTDR channels. The goal of information fusion is to create an integral classifier designed for effective classification of seismoacoustic target events. The LPBoost (LP-β and LP-B variants), the Multiple Kernel Learning, and Weighing of Inversely as Lipschitz Constants (WILC) approaches were compared. The WILC is a brand new approach to optimal fusion of Lipschitz Classifiers Ensembles. Results of practical usage are presented.

Keywords: Lipschitz Classifier, Classifiers Ensembles, LPBoost, C-OTDR systems, ν-OTDR systems.

Downloads: 1610
2142 Detecting Email Forgery using Random Forests and Naïve Bayes Classifiers

Authors: Emad E Abdallah, A.F. Otoom, Arwa Saqer, Ola Abu-Aisheh, Diana Omari, Ghadeer Salem

Abstract:

As email communications have no consistent authentication procedure to ensure authenticity, we present an investigative analysis approach for detecting forged emails based on Random Forests and Naïve Bayes classifiers. Instead of investigating the email headers, we use the body content to extract a unique writing style for each of the possible suspects. Our approach consists of four main steps: (1) the cybercrime investigator extracts different effective features, including structural, lexical, linguistic, and syntactic evidence, from previous emails of all the possible suspects; (2) the extracted feature vectors are normalized to increase the accuracy rate; (3) the normalized features are then used to train the learning engine; (4) upon receiving the anonymous email (M), we apply the feature extraction process to produce a feature vector. Finally, using the machine learning classifiers, the email is assigned to the suspect whose writing style most closely matches M. Experimental results on real data sets show the improved performance of the proposed method and its ability to identify the authors with a very limited number of features.

Keywords: Digital investigation, cybercrimes, emails forensics, anonymous emails, writing style, and authorship analysis

Downloads: 5203
2141 Ensemble Approach for Predicting Student's Academic Performance

Authors: L. A. Muhammad, M. S. Argungu

Abstract:

Educational data mining (EDM) has received substantial attention. Data mining techniques have been proposed to uncover hidden knowledge in educational data. The results of such studies assist academic institutions in further enhancing their learning processes and their methods of passing knowledge to students. Consequently, student performance improves and educational products are undoubtedly enhanced. This study adopted a student performance prediction model based on data mining techniques with Students' Essential Features (SEF). SEF are linked to the learner's interactivity with the e-learning management system. The performance of the student predictive model is assessed by a set of classifiers, viz. Bayes Network, Logistic Regression, and Reduced Error Pruning Tree (REP Tree). Subsequently, the ensemble methods of Bagging, Boosting, and Random Forest (RF) are applied to improve the performance of these single classifiers. The results show a robust affinity between learners' behaviors and their academic attainment. The REP Tree and its ensemble record the highest accuracy of 83.33% using SEF. In terms of the Receiver Operating Characteristic (ROC) curve, the boosting method with REP Tree records 0.903, which is the best. This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, bagging, Random Forest, boosting, data mining, classifiers, machine learning.

Downloads: 670
2140 Comparing SVM and Naïve Bayes Classifier for Automatic Microaneurysm Detections

Authors: A. Sopharak, B. Uyyanonvara, S. Barman

Abstract:

Diabetic retinopathy is characterized by the development of retinal microaneurysms. The damage can be prevented if disease is treated in its early stages. In this paper, we are comparing Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers for automatic microaneurysm detection in images acquired through non-dilated pupils. The Nearest Neighbor classifier is used as a baseline for comparison. Detected microaneurysms are validated with expert ophthalmologists’ hand-drawn ground-truths. The sensitivity, specificity, precision and accuracy of each method are also compared.
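The four figures compared in the abstract above can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `binary_metrics` and the toy labels are hypothetical, and the label 1 is assumed to mark the positive (microaneurysm) class.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, precision and accuracy for 0/1 labels.

    Assumption: 1 marks the positive class and 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # recall on the positive class
        "specificity": ratio(tn, tn + fp),  # recall on the negative class
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical toy labels, purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

When positives are rare, accuracy alone can look high even if most positives are missed, which is one reason to report sensitivity and specificity alongside it.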

Keywords: Diabetic retinopathy, microaneurysm, Naïve Bayes classifier, SVM classifier.

Downloads: 6054
2139 Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications

Authors: Kanthida Kusonmano, Michael Netzer, Bernhard Pfeifer, Christian Baumgartner, Klaus R. Liedl, Armin Graber

Abstract:

The availability of high dimensional biological datasets, such as those from gene expression, proteomic, and metabolic experiments, can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and separate predefined classes, such as patients with a particular disease versus healthy controls. However, most of the existing research focuses only on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, based on mock datasets. We mimic common biological scenarios by simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The results show that SVM performs quite stably and reaches a higher AUC compared to the other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better with a higher number of discriminators.

Keywords: Classification, High dimensional data, Machine learning

Downloads: 2330
2138 Meta Random Forests

Authors: Praveen Boinee, Alessandro De Angelis, Gian Luca Foresti

Abstract:

Leo Breiman's Random Forests (RF) is a recent development in tree-based classifiers and has quickly proven to be one of the most important algorithms in the machine learning literature. It has shown robust and improved classification results on standard data sets. Ensemble learning algorithms such as AdaBoost and Bagging have been under active research and have shown improvements in classification results for several benchmark data sets, mainly with decision trees as their base classifiers. In this paper we apply these meta-learning techniques to random forests. We examine how ensembles of random forests perform on the standard data sets available in the UCI repository. We compare the original random forest algorithm with its ensemble counterparts and discuss the results.

Keywords: Random Forests [RF], ensembles, UCI.

Downloads: 2649
2137 Integration of Support Vector Machine and Bayesian Neural Network for Data Mining and Classification

Authors: Essam Al-Daoud

Abstract:

Several combinations of preprocessing algorithms, feature selection techniques and classifiers can be applied to data classification tasks. This study introduces a new accurate classifier. The proposed classifier consists of four components: Signal-to-Noise as a feature selection technique, support vector machine, Bayesian neural network and AdaBoost as an ensemble algorithm. To verify the effectiveness of the proposed classifier, seven well-known classifiers are applied to four datasets. The experiments show that using the suggested classifier enhances the classification rates for all datasets.

Keywords: AdaBoost, Bayesian neural network, Signal-to-Noise, support vector machine, MCMC.

Downloads: 1977
2136 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components: a feature selection component and a classifier ensemble component. The feature selection component divides the features in the SEER database into four groups. It then tries to find the most important features among the four groups that maximize the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models a different set of features from SEER through the feature selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found uses the decision tree, Bayesian network, and Naïve Bayes algorithms for the underlying classifiers and Naïve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same SEER data (period of 1973-2002). It gives an 87.39% weighted average F-score compared to 85.82% and 81.34% for the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held-out unseen test set.
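The weighted average F-score used as the selection criterion above can be illustrated as a support-weighted mean of per-class F-scores. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes the F1 variant (the abstract does not state which F-measure is used), and the function name `weighted_f_score` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_f_score(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Assumption: F1 (equal weight on precision and recall) is the
    F-measure intended; the source abstract does not specify this.
    """
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        score += (count / total) * f1  # weight each class by its support
    return score


# Hypothetical toy labels, purely for illustration.
y_true = ["survived", "survived", "survived", "not_survived"]
y_pred = ["survived", "survived", "not_survived", "not_survived"]
print(weighted_f_score(y_true, y_pred))
```

Weighting by support means the score is dominated by the larger classes, which matters for data sets where the class sizes differ widely.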

Keywords: Classifier ensemble, breast cancer survivability, data mining, SEER.

Downloads: 1625
2135 Empirical and Indian Automotive Equity Portfolio Decision Support

Authors: P. Sankar, P. James Daniel Paul, Siddhant Sahu

Abstract:

A brief review of the empirical studies on the methodology of the stock market decision support would indicate that they are at a threshold of validating the accuracy of the traditional and the fuzzy, artificial neural network and the decision trees. Many researchers have been attempting to compare these models using various data sets worldwide. However, the research community is on the way to the conclusive confidence in the emerged models. This paper attempts to use the automotive sector stock prices from National Stock Exchange (NSE), India and analyze them for the intra-sectorial support for stock market decisions. The study identifies the significant variables and their lags which affect the price of the stocks using OLS analysis and decision tree classifiers.

Keywords: Indian Automotive Sector, Stock Market Decisions, Equity Portfolio Analysis, Decision Tree Classifiers, Statistical Data Analysis.

Downloads: 1990
2134 Diagnosis of the Heart Rhythm Disorders by Using Hybrid Classifiers

Authors: Sule Yucelbas, Gulay Tezel, Cuneyt Yucelbas, Seral Ozsen

Abstract:

This study attempts to identify several heart rhythm disorders from electrocardiography (ECG) data taken from the MIT-BIH arrhythmia database by extracting the required features and presenting them to artificial neural network (ANN), artificial immune system (AIS), artificial neural network based on artificial immune system (AIS-ANN) and particle swarm optimization based artificial neural network (PSO-ANN) classifier systems. The main purpose of this study is to evaluate the performance of the hybrid AIS-ANN and PSO-ANN classifiers relative to the ANN and AIS. For this purpose, the normal sinus rhythm (NSR), atrial premature contraction (APC), sinus arrhythmia (SA), ventricular trigeminy (VTI), ventricular tachycardia (VTK) and atrial fibrillation (AF) data for each of the RR intervals were found. These data were then combined into pairs (NSR-APC, NSR-SA, NSR-VTI, NSR-VTK and NSR-AF), the discrete wavelet transform was applied to each pair, and two different data sets with 9 and 27 features were obtained from each of them after data reduction. Afterwards, the data were randomly shuffled, and then the 4-fold cross-validation method was applied to create the training and testing data. The training and testing accuracy rates and the training times are compared with each other.

As a result, the performances of the hybrid classification systems, AIS-ANN and PSO-ANN, were seen to be close to the performance of the ANN system. The results of the hybrid systems were also much better than those of AIS. However, ANN had a much shorter training time than the other systems. In terms of training time, ANN was followed by the PSO-ANN, AIS-ANN and AIS systems, respectively. Also, the features extracted from the data affected the classification results significantly.
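The shuffle-then-split procedure described in the abstract (random mixing followed by 4-fold cross-validation) can be sketched as below. This is an illustrative sketch only, not the authors' code: the function name `k_fold_splits`, the fixed seed, and the sample count are hypothetical.

```python
import random


def k_fold_splits(n_samples, k=4, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    The indices are shuffled once, dealt into k folds, and each fold
    serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random mixing before splitting
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical usage with 10 samples and 4 folds.
for train, test in k_fold_splits(10, k=4):
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so averaging the per-fold test accuracy uses every sample for testing once and for training k - 1 times.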

Keywords: AIS, ANN, ECG, hybrid classifiers, PSO.

Downloads: 1873
2133 sEMG Interface Design for Locomotion Identification

Authors: Rohit Gupta, Ravinder Agarwal

Abstract:

The surface electromyographic (sEMG) signal has the potential to identify human activities and intention. This potential is further exploited to control artificial limbs using the sEMG signal from the residual limbs of amputees. The paper deals with the development of a multichannel, cost-efficient sEMG signal interface for research applications, along with the evaluation of a proposed class-dependent statistical approach to feature selection. The sEMG signal acquisition interface was developed using the ADS1298 from Texas Instruments, which is a front-end interface integrated circuit for ECG applications. Further, the sEMG signal is recorded from two lower limb muscles for three locomotions, namely Plane Walk (PW), Stair Ascending (SA) and Stair Descending (SD). A class-dependent statistical approach is proposed for feature selection, and its performance is compared with 12 preexisting feature vectors. To make the study more extensive, the performance of five different types of classifiers is compared. The outcome of this work supports the suitability of the proposed feature selection algorithm for locomotion recognition, as compared to other existing feature vectors. The SVM classifier outperformed the other compared classifiers, with an average recognition accuracy of 97.40%. Feature vector selection emerges as the most dominant factor affecting the classification performance, as it accounts for 51.51% of the total variance in classification accuracy. The results demonstrate the potential of the developed sEMG signal acquisition interface along with the proposed feature selection algorithm.

Keywords: Classifiers, feature selection, locomotion, sEMG.

Downloads: 1440
2132 Bayes Net Classifiers for Prediction of Renal Graft Status and Survival Period

Authors: Jiakai Li, Gursel Serpen, Steven Selman, Matt Franchetti, Mike Riesen, Cynthia Schneider

Abstract:

This paper presents the development of a Bayesian belief network classifier for prediction of graft status and survival period in renal transplantation using the patient profile information prior to the transplantation. The objective was to explore the feasibility of developing a decision-making tool for identifying the most suitable recipient among the candidate pool members. The dataset was compiled from the University of Toledo Medical Center Hospital patients as reported to the United Network for Organ Sharing (UNOS), and had 1228 patient records for the period covering 1987 through 2009. The Bayes net classifiers were developed using the Weka machine learning software workbench. Two separate classifiers were induced from the data set, one to predict the status of the graft as either failed or living, and a second classifier to predict the graft survival period. The classifier for graft status prediction performed very well with a prediction accuracy of 97.8% and true positive rates of 0.967 and 0.988 for the living and failed classes, respectively. The second classifier to predict the graft survival period yielded a prediction accuracy of 68.2% and a true positive rate of 0.85 for the class representing those instances with kidneys failing during the first year following transplantation. Simulation results indicated that it is feasible to develop a successful Bayesian belief network classifier for prediction of graft status, but not the graft survival period, using the information in the UNOS database.

Keywords: Bayesian network classifier, renal transplantation, graft survival period, United Network for Organ Sharing

Downloads: 2061
2131 Classifier Combination Approach in Motion Imagery Signals Processing for Brain Computer Interface

Authors: Homayoon Zarshenas, Mahdi Bamdad, Hadi Grailu, Akbar A. Shakoori

Abstract:

In this study we focus on improving the performance of a cue-based Motor Imagery Brain Computer Interface (BCI). For this purpose, a data fusion approach is applied to the results of different classifiers to make the best decision. In the first step, the Distinction Sensitive Learning Vector Quantization method is used as a feature selection method to determine the most informative frequencies in the recorded signals, and its performance is evaluated by a frequency search method. Informative features are then extracted by the packet wavelet transform. In the next step, 5 different types of classification methods are applied. The methodologies are tested on BCI Competition II dataset III; the best accuracy obtained is 85% and the best kappa value is 0.8. In the final step, the ordered weighted averaging (OWA) method is used to aggregate the classifier outputs. Using OWA raised the system accuracy to 95% and the kappa value to 0.9. Applying OWA takes just 50 milliseconds of computation.
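The ordered weighted averaging (OWA) step mentioned in the abstract can be sketched as follows: the scores are sorted first, and the weights attach to rank positions rather than to particular classifiers. This is a minimal sketch of the generic OWA operator, not the paper's fusion pipeline; the function name `owa`, the example scores, and the weight vector are hypothetical.

```python
def owa(scores, weights):
    """Ordered weighted average of a list of scores.

    The weights must be non-negative and sum to 1. They are applied to the
    scores after sorting in descending order, so weights[0] always
    multiplies the largest score regardless of which source produced it.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


# Hypothetical confidence scores for one class from five classifiers.
scores = [0.9, 0.4, 0.7, 0.6, 0.8]
weights = [0.3, 0.25, 0.2, 0.15, 0.1]  # illustrative weight vector
print(owa(scores, weights))
```

The choice of weights moves the operator between familiar aggregators: all weight on the first position gives the maximum, all weight on the last gives the minimum, and uniform weights give the arithmetic mean.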

Keywords: BCI, EEG, Classifier, Fuzzy operator, OWA.

Downloads: 1822
2130 An SVM based Classification Method for Cancer Data using Minimum Microarray Gene Expressions

Authors: R. Mallika, V. Saravanan

Abstract:

This paper gives a novel method for improving classification performance for cancer classification with very few microarray Gene expression data. The method employs classification with individual gene ranking and gene subset ranking. For selection and classification, the proposed method uses the same classifier. The method is applied to three publicly available cancer gene expression datasets from Lymphoma, Liver and Leukaemia datasets. Three different classifiers namely Support vector machines-one against all (SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant analysis (LDA) were tested and the results indicate the improvement in performance of SVM-OAA classifier with satisfactory results on all the three datasets when compared with the other two classifiers.

Keywords: Support vector machines-one against all, cancer classification, Linear Discriminant analysis, K nearest neighbour, microarray gene expression, gene pair ranking.

Downloads: 2507
2129 Wavelet - Based Classification of Outdoor Natural Scenes by Resilient Neural Network

Authors: Amitabh Wahi, Sundaramurthy S.

Abstract:

Natural outdoor scene classification is an active and promising research area around the globe. In this study, the classification is carried out in two phases. In the first phase, the features are extracted from the images by the wavelet decomposition method and stored in a database as feature vectors. In the second phase, neural classifiers such as the back-propagation neural network (BPNN) and the resilient back-propagation neural network (RPNN) are employed for the classification of scenes. Four hundred color images from the MIT database are considered, in two classes: forest and street. A comparative study has been carried out on the performance of the two neural classifiers, BPNN and RPNN, with an increasing number of test samples. RPNN showed better classification results than BPNN on the large test samples.

Keywords: BPNN, Classification, Feature extraction, RPNN, Wavelet.

Downloads: 1909
2128 Comparison of Reliability Systems Based Uncertainty

Authors: A. Aissani, H. Benaoudia

Abstract:

Stochastic comparison has been an important direction of research in various areas. It can be carried out using the notion of stochastic ordering, which gives a qualitative rather than purely quantitative estimation of the system under study. In this paper we present applications of uncertainty-based comparison related to entropy in reliability analysis, for example to design better systems. These results can be used as a priori information in simulation studies.

Keywords: Uncertainty, Stochastic comparison, Reliability, series system, imperfect repair.

Downloads: 1212
2127 Analysis of a Population of Diabetic Patients Databases with Classifiers

Authors: Murat Koklu, Yavuz Unal

Abstract:

Data mining can be described as a technique for extracting information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence techniques. One of its application areas is medicine, where it is used to form decision support systems for diagnosis by deriving meaningful information from given medical data. In this study, a decision support system for the diagnosis of illness was developed that makes use of data mining and three different artificial intelligence classifier algorithms, namely Multilayer Perceptron, Naive Bayes Classifier and J48. The Pima Indian dataset of the UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different features. The classification results obtained were compared with previous studies. Suggestions for future studies were presented.

Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.

Downloads: 5388
2126 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years, but most of the work is focused on data in the English language. Comprehensive research and analysis that considers multiple languages, machine translation techniques, and different classifiers is essential. This paper presents a comparative analysis of different approaches to multilingual sentiment analysis. These approaches are divided into two parts: one classifies text without language translation, and the second translates the testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or whether building language-specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and the accuracy of various classifiers for multilingual sentiment analysis are also discussed in this study.

Keywords: Cross-language analysis, machine learning, machine translation, sentiment analysis.

Downloads 1606
2125 Margin-Based Feed-Forward Neural Network Classifiers

Authors: Han Xiao, Xiaoyan Zhu

Abstract:

The margin-based principle was proposed long ago, and it has been shown, in both theory and practice, to reduce structural risk and improve performance. Meanwhile, the feed-forward neural network is a traditional classifier that is currently very popular in its deeper architectures. However, the training algorithm of the feed-forward neural network is derived from the Widrow-Hoff principle, which minimizes the squared error. In this paper, we propose a new training algorithm for feed-forward neural networks based on the margin-based principle, which can effectively improve the accuracy and generalization ability of neural network classifiers with fewer labelled samples and a flexible network. We have conducted experiments on four open UCI datasets and achieved good results, as expected. In conclusion, our model can handle more sparsely labelled and higher-dimensional datasets with high accuracy, while switching from the old ANN method to our method is easy and requires almost no work.

Keywords: Max-Margin Principle, Feed-Forward Neural Network, Classifier.

Downloads 1694
2124 Evaluation of Robust Feature Descriptors for Texture Classification

Authors: Jia-Hong Lee, Mei-Yi Wu, Hsien-Tsung Kuo

Abstract:

Texture is an important characteristic in real and synthetic scenes. Texture analysis plays a critical role in inspecting surfaces and provides important techniques in a variety of applications. Although several descriptors have been presented to extract texture features, the development of object recognition is still a difficult task due to the complex aspects of texture. Recently, many robust and scale-invariant image features such as SIFT, SURF and ORB have been successfully used in image retrieval and object recognition. In this paper, we compare the performance of texture classification using these feature descriptors with k-means clustering. Different classifiers, including K-NN, Naive Bayes, Back Propagation Neural Network, Decision Tree and Kstar, were applied to three texture image sets: UIUCTex, KTH-TIPS and Brodatz. Experimental results reveal that SIFT achieves the best average accuracy rate on UIUCTex and KTH-TIPS, while SURF performs best on the Brodatz texture set. The BP neural network works best in test set classification among all the classifiers used.

Keywords: Texture classification, texture descriptor, SIFT, SURF, ORB.

Downloads 1552
2123 Semi-Automatic Method to Assist Expert for Association Rules Validation

Authors: Amdouni Hamida, Gammoudi Mohamed Mohsen

Abstract:

In order to help the expert validate association rules extracted from data, some quality measures have been proposed in the literature. We distinguish two categories: objective and subjective measures. The first depends on a fixed threshold and on the quality of the data from which the rules are extracted. The second consists of providing the expert with tools to explore and visualize rules during the evaluation step. However, the number of extracted rules to validate remains high, so the task of manually mining rules is very hard. To solve this problem, we propose in this paper a semi-automatic method to assist the expert during the validation of association rules. Our method uses rule-based classification as follows: (i) we transform association rules into classification rules (classifiers); (ii) we use the generated classifiers for data classification; (iii) we visualize association rules with their classification quality to give the expert an idea and to assist him during the validation process.

Keywords: Association rules, Rule-based classification, Classification quality, Validation.

Downloads 1752
2122 Discrimination of Alcoholic Subjects using Second Order Autoregressive Modelling of Brain Signals Evoked during Visual Stimulus Perception

Authors: Ramaswamy Palaniappan

Abstract:

In this paper, a second order autoregressive (AR) model is proposed to discriminate alcoholics from single trial gamma band Visual Evoked Potential (VEP) signals using 3 different classifiers: Simplified Fuzzy ARTMAP (SFA) neural network (NN), Multilayer-perceptron-backpropagation (MLP-BP) NN and Linear Discriminant (LD). Electroencephalogram (EEG) signals were recorded from alcoholic and control subjects during the presentation of visuals from the Snodgrass and Vanderwart picture set. Single trial VEP signals were extracted from the EEG signals using elliptic filtering in the gamma band spectral range. A second order AR model was used because gamma band VEP exhibits pseudo-periodic behaviour and second order AR is optimal to represent this behaviour. This circumvents the requirement of having to use some criterion to choose the correct order. Averaged discrimination errors of 2.6%, 2.8% and 11.9% were given by the LD, MLP-BP and SFA classifiers, respectively. The high LD discrimination results show the validity of the proposed method for discriminating alcoholic subjects.
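
As a rough illustration of the second order AR model mentioned in the abstract above, the sketch below fits x[t] = a1*x[t-1] + a2*x[t-2] + e[t] by ordinary least squares. The abstract does not say how the coefficients were estimated, so the least-squares fit, the function names `fit_ar2` and `synth_ar2`, and the synthetic test signal are all assumptions made for illustration only.

```python
import math


def fit_ar2(x):
    """Least-squares estimate of (a1, a2) in x[t] = a1*x[t-1] + a2*x[t-2] + e[t].

    Assumption: ordinary least squares is used here for illustration; the
    estimator used in the paper is not stated in the abstract.
    """
    if len(x) < 4:
        raise ValueError("need at least 4 samples to fit an AR(2) model")
    # Accumulate the 2x2 normal equations S * [a1, a2]^T = b.
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        p1, p2 = x[t - 1], x[t - 2]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        b1 += p1 * x[t]
        b2 += p2 * x[t]
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("signal is degenerate; normal equations are singular")
    # Solve the 2x2 system with Cramer's rule.
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2


def synth_ar2(a1, a2, n, x0=1.0, x1=0.5):
    """Generate a noise-free AR(2) sequence (hypothetical test signal)."""
    x = [x0, x1]
    for _ in range(2, n):
        x.append(a1 * x[-1] + a2 * x[-2])
    return x


# A damped oscillation (pseudo-periodic signal): poles at r*exp(+/- j*w).
r, w = 0.95, 0.3
signal = synth_ar2(2 * r * math.cos(w), -r * r, 60)
coeffs = fit_ar2(signal)
```

In a pipeline like the one described, the two fitted coefficients of each single-trial signal would presumably serve as the feature vector passed to a classifier.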

Keywords: Linear Discriminant, Neural Network, Visual Evoked Potential.

Downloads 1575
2121 Enhanced Performance for Support Vector Machines as Multiclass Classifiers in Steel Surface Defect Detection

Authors: Ehsan Amid, Sina Rezaei Aghdam, Hamidreza Amindavar

Abstract:

Steel surface defect detection is essentially a pattern recognition problem. Support Vector Machines (SVMs) are known to be among the most suitable classifiers for this application. In this paper, we introduce a more accurate classification method by using SVMs as the final classifier of the inspection system. In this scheme, the multiclass classification task is performed based on the "one-against-one" method, and different kernels are utilized for each pair of classes in the multiclass classification of the different defects. In the proposed system, a decision tree is employed in the first stage for two-class classification of the steel surfaces into "defect" and "non-defect", in order to decrease the time complexity. Based on the experimental results, generated from over one thousand images, the proposed multiclass classification scheme is more accurate than the conventional methods, and the overall system yields performance sufficient to meet the requirements of steel manufacturing.
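
The "one-against-one" scheme named in the abstract above can be sketched as a majority vote over one binary classifier per pair of classes. This is only a minimal sketch: the pairwise deciders below are hypothetical threshold rules standing in for the per-pair SVMs (each of which, per the abstract, could use its own kernel), and the class names and the single numeric feature are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(sample, pairwise, classes):
    """Majority vote over all pairwise binary classifiers.

    pairwise maps each class pair (a, b), in the order produced by
    itertools.combinations(classes, 2), to a callable that returns
    either a or b for the given sample.
    Ties are broken in favour of the class listed first in `classes`.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](sample)] += 1
    return max(classes, key=lambda c: votes[c])


# Hypothetical defect classes and stand-in pairwise deciders.
classes = ["scratch", "pit", "stain"]
pairwise = {
    ("scratch", "pit"): lambda x: "scratch" if x < 1.0 else "pit",
    ("scratch", "stain"): lambda x: "scratch" if x < 1.5 else "stain",
    ("pit", "stain"): lambda x: "pit" if x < 2.0 else "stain",
}

labels = [one_against_one_predict(x, pairwise, classes) for x in (0.5, 1.7, 3.0)]
```

With k classes the scheme needs k(k-1)/2 binary classifiers, which is what makes it possible to choose a separate kernel for each pair.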

Keywords: Steel Surface Defect Detection, Support Vector Machines, Kernel Methods.

Downloads 1877
2120 Diagnosis of Diabetes Using Computer Methods: Soft Computing Methods for Diabetes Detection Using Iris

Authors: Piyush Samant, Ravinder Agarwal

Abstract:

Complementary and Alternative Medicine (CAM) techniques are quite popular and effective for chronic diseases. Iridology is a CAM technique more than 150 years old that analyzes the patterns, tissue weakness, color, shape, structure, etc. of the iris for disease diagnosis. The objective of this paper is to validate the use of iridology for the diagnosis of diabetes. The suggested model was applied to a systemic disease with ocular effects. Data from 200 subjects, 100 diabetic and 100 non-diabetic, were evaluated. The complete procedure was kept very simple and free from the involvement of any iridologist. From the normalized iris, the region of interest was cropped. A total of 63 features were extracted using statistical analysis, texture analysis, and the two-dimensional discrete wavelet transform. A comparison of the accuracies of six different classifiers is presented. The results show an accuracy of 89.66% for the random forest classifier.

Keywords: Complementary and alternative medicine, Iridology, iris, feature extraction, classification, disease prediction.

Downloads 1779