Search results for: Multinomial dirichlet classification model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8248

Search results for: Multinomial dirichlet classification model

8068 Gene Expression Signature for Classification of Metastasis Positive and Negative Oral Cancer in Homosapiens

Authors: A. Shukla, A. Tarsauliya, R. Tiwari, S. Sharma

Abstract:

Cancer classification to their corresponding cohorts has been key area of research in bioinformatics aiming better prognosis of the disease. High dimensionality of gene data has been makes it a complex task and requires significance data identification technique in order to reducing the dimensionality and identification of significant information. In this paper, we have proposed a novel approach for classification of oral cancer into metastasis positive and negative patients. We have used significance analysis of microarrays (SAM) for identifying significant genes which constitutes gene signature. 3 different gene signatures were identified using SAM from 3 different combination of training datasets and their classification accuracy was calculated on corresponding testing datasets using k-Nearest Neighbour (kNN), Fuzzy C-Means Clustering (FCM), Support Vector Machine (SVM) and Backpropagation Neural Network (BPNN). A final gene signature of only 9 genes was obtained from above 3 individual gene signatures. 9 gene signature-s classification capability was compared using same classifiers on same testing datasets. Results obtained from experimentation shows that 9 gene signature classified all samples in testing dataset accurately while individual genes could not classify all accurately.

Keywords: Cancer, Gene Signature, SAM, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2044
8067 The Labeled Classification and its Application

Authors: M. Nemissi, H. Seridi, H. Akdag

Abstract:

This paper presents and evaluates a new classification method that aims to improve classifiers performances and speed up their training process. The proposed approach, called labeled classification, seeks to improve convergence of the BP (Back propagation) algorithm through the addition of an extra feature (labels) to all training examples. To classify every new example, tests will be carried out each label. The simplicity of implementation is the main advantage of this approach because no modifications are required in the training algorithms. Therefore, it can be used with others techniques of acceleration and stabilization. In this work, two models of the labeled classification are proposed: the LMLP (Labeled Multi Layered Perceptron) and the LNFC (Labeled Neuro Fuzzy Classifier). These models are tested using Iris, wine, texture and human thigh databases to evaluate their performances.

Keywords: Artificial neural networks, Fusion of neural networkfuzzysystems, Learning theory, Pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1385
8066 A Kernel Based Rejection Method for Supervised Classification

Authors: Abdenour Bounsiar, Edith Grall, Pierre Beauseroy

Abstract:

In this paper we are interested in classification problems with a performance constraint on error probability. In such problems if the constraint cannot be satisfied, then a rejection option is introduced. For binary labelled classification, a number of SVM based methods with rejection option have been proposed over the past few years. All of these methods use two thresholds on the SVM output. However, in previous works, we have shown on synthetic data that using thresholds on the output of the optimal SVM may lead to poor results for classification tasks with performance constraint. In this paper a new method for supervised classification with rejection option is proposed. It consists in two different classifiers jointly optimized to minimize the rejection probability subject to a given constraint on error rate. This method uses a new kernel based linear learning machine that we have recently presented. This learning machine is characterized by its simplicity and high training speed which makes the simultaneous optimization of the two classifiers computationally reasonable. The proposed classification method with rejection option is compared to a SVM based rejection method proposed in recent literature. Experiments show the superiority of the proposed method.

Keywords: rejection, Chow's rule, error-reject tradeoff, SupportVector Machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1421
8065 Classifying Students for E-Learning in Information Technology Course Using ANN

Authors: S. Areerachakul, N. Ployong, S. Na Songkla

Abstract:

This research’s objective is to select the model with most accurate value by using Neural Network Technique as a way to filter potential students who enroll in IT course by Electronic learning at Suan Suanadha Rajabhat University. It is designed to help students selecting the appropriate courses by themselves. The result showed that the most accurate model was 100 Folds Cross-validation which had 73.58% points of accuracy.

Keywords: Artificial neural network, classification, students.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1476
8064 Research on Reservoir Lithology Prediction Based on Residual Neural Network and Squeeze-and- Excitation Neural Network

Authors: Li Kewen, Su Zhaoxin, Wang Xingmou, Zhu Jian Bing

Abstract:

Conventional reservoir prediction methods ar not sufficient to explore the implicit relation between seismic attributes, and thus data utilization is low. In order to improve the predictive classification accuracy of reservoir lithology, this paper proposes a deep learning lithology prediction method based on ResNet (Residual Neural Network) and SENet (Squeeze-and-Excitation Neural Network). The neural network model is built and trained by using seismic attribute data and lithology data of Shengli oilfield, and the nonlinear mapping relationship between seismic attribute and lithology marker is established. The experimental results show that this method can significantly improve the classification effect of reservoir lithology, and the classification accuracy is close to 70%. This study can effectively predict the lithology of undrilled area and provide support for exploration and development.

Keywords: Convolutional neural network, lithology, prediction of reservoir lithology, seismic attributes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 609
8063 Investigation of Wave Atom Sub-Bands via Breast Cancer Classification

Authors: Nebi Gedik, Ayten Atasoy

Abstract:

This paper investigates successful sub-bands of wave atom transform via classification of mammograms, when the coefficients of sub-bands are used as features. A computer-aided diagnosis system is constructed by using wave atom transform, support vector machine and k-nearest neighbor classifiers. Two-class classification is studied in detail using two data sets, separately. The successful sub-bands are determined according to the accuracy rates, coefficient numbers, and sensitivity rates.

Keywords: Breast cancer, wave atom transform, SVM, k-NN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1041
8062 Optimal Multilayer Perceptron Structure For Classification of HIV Sub-Type Viruses

Authors: Zeyneb Kurt, Oguzhan Yavuz

Abstract:

The feature of HIV genome is in a wide range because of it is highly heterogeneous. Hence, the infection ability of the virus changes related with different chemokine receptors. From this point, R5 and X4 HIV viruses use CCR5 and CXCR5 coreceptors respectively while R5X4 viruses can utilize both coreceptors. Recently, in Bioinformatics, R5X4 viruses have been studied to classify by using the coreceptors of HIV genome. The aim of this study is to develop the optimal Multilayer Perceptron (MLP) for high classification accuracy of HIV sub-type viruses. To accomplish this purpose, the unit number in hidden layer was incremented one by one, from one to a particular number. The statistical data of R5X4, R5 and X4 viruses was preprocessed by the signal processing methods. Accessible residues of these virus sequences were extracted and modeled by Auto-Regressive Model (AR) due to the dimension of residues is large and different from each other. Finally the pre-processed dataset was used to evolve MLP with various number of hidden units to determine R5X4 viruses. Furthermore, ROC analysis was used to figure out the optimal MLP structure.

Keywords: Multilayer Perceptron, Auto-Regressive Model, HIV, ROC Analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1413
8061 Using Data Mining Technique for Scholarship Disbursement

Authors: J. K. Alhassan, S. A. Lawal

Abstract:

This work is on decision tree-based classification for the disbursement of scholarship. Tree-based data mining classification technique is used in other to determine the generic rule to be used to disburse the scholarship. The system based on the defined rules from the tree is able to determine the class (status) to which an applicant shall belong whether Granted or Not Granted. The applicants that fall to the class of granted denote a successful acquirement of scholarship while those in not granted class are unsuccessful in the scheme. An algorithm that can be used to classify the applicants based on the rules from tree-based classification was also developed. The tree-based classification is adopted because of its efficiency, effectiveness, and easy to comprehend features. The system was tested with the data of National Information Technology Development Agency (NITDA) Abuja, a Parastatal of Federal Ministry of Communication Technology that is mandated to develop and regulate information technology in Nigeria. The system was found working according to the specification. It is therefore recommended for all scholarship disbursement organizations.

Keywords: Decision tree, classification, data mining, scholarship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2116
8060 A Constrained Clustering Algorithm for the Classification of Industrial Ores

Authors: Luciano Nieddu, Giuseppe Manfredi

Abstract:

In this paper a Pattern Recognition algorithm based on a constrained version of the k-means clustering algorithm will be presented. The proposed algorithm is a non parametric supervised statistical pattern recognition algorithm, i.e. it works under very mild assumptions on the dataset. The performance of the algorithm will be tested, togheter with a feature extraction technique that captures the information on the closed two-dimensional contour of an image, on images of industrial mineral ores.

Keywords: K-means, Industrial ores classification, Invariant Features, Supervised Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1348
8059 Use of Gaussian-Euclidean Hybrid Function Based Artificial Immune System for Breast Cancer Diagnosis

Authors: Cuneyt Yucelbas, Seral Ozsen, Sule Yucelbas, Gulay Tezel

Abstract:

Due to the fact that there exist only a small number of complex systems in artificial immune system (AIS) that work out nonlinear problems, nonlinear AIS approaches, among the well-known solution techniques, need to be developed. Gaussian function is usually used as similarity estimation in classification problems and pattern recognition. In this study, diagnosis of breast cancer, the second type of the most widespread cancer in women, was performed with different distance calculation functions that euclidean, gaussian and gaussian-euclidean hybrid function in the clonal selection model of classical AIS on Wisconsin Breast Cancer Dataset (WBCD), which was taken from the University of California, Irvine Machine-Learning Repository. We used 3-fold cross validation method to train and test the dataset. According to the results, the maximum test classification accuracy was reported as 97.35% by using of gaussian-euclidean hybrid function for fold-3. Also, mean of test classification accuracies for all of functions were obtained as 94.78%, 94.45% and 95.31% with use of euclidean, gaussian and gaussian-euclidean, respectively. With these results, gaussian-euclidean hybrid function seems to be a potential distance calculation method, and it may be considered as an alternative distance calculation method for hard nonlinear classification problems.

Keywords: Artificial Immune System, Breast Cancer Diagnosis, Euclidean Function, Gaussian Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2091
8058 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing domain presents interesting challenges as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable as it can be used to mark progress while training, and it can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The dataset available was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and it was temporally heterogeneous. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross validation strategies. The investigated solutions to the classification problem included light weight machine classifiers KNN and SVM as well as the deep learning with CNN. The best performing model had an 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: Classification, climbing, data imbalance, data scarcity, machine learning, time sequence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 510
8057 Using Data Mining Techniques for Finding Cardiac Outlier Patients

Authors: Farhan Ismaeel Dakheel, Raoof Smko, K. Negrat, Abdelsalam Almarimi

Abstract:

In this paper we used data mining techniques to identify outlier patients who are using large amount of drugs over a long period of time. Any healthcare or health insurance system should deal with the quantities of drugs utilized by chronic diseases patients. In Kingdom of Bahrain, about 20% of health budget is spent on medications. For the managers of healthcare systems, there is no enough information about the ways of drug utilization by chronic diseases patients, is there any misuse or is there outliers patients. In this work, which has been done in cooperation with information department in the Bahrain Defence Force hospital; we select the data for Cardiac patients in the period starting from 1/1/2008 to December 31/12/2008 to be the data for the model in this paper. We used three techniques for finding the drug utilization for cardiac patients. First we applied a clustering technique, followed by measuring of clustering validity, and finally we applied a decision tree as classification algorithm. The clustering results is divided into three clusters according to the drug utilization, for 1603 patients, who received 15,806 prescriptions during this period can be partitioned into three groups, where 23 patients (2.59%) who received 1316 prescriptions (8.32%) are classified to be outliers. The classification algorithm shows that the use of average drug utilization and the age, and the gender of the patient can be considered to be the main predictive factors in the induced model.

Keywords: Data Mining, Clustering, Classification, Drug Utilization..

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1874
8056 Auto Classification for Search Intelligence

Authors: Lilac A. E. Al-Safadi

Abstract:

This paper proposes an auto-classification algorithm of Web pages using Data mining techniques. We consider the problem of discovering association rules between terms in a set of Web pages belonging to a category in a search engine database, and present an auto-classification algorithm for solving this problem that are fundamentally based on Apriori algorithm. The proposed technique has two phases. The first phase is a training phase where human experts determines the categories of different Web pages, and the supervised Data mining algorithm will combine these categories with appropriate weighted index terms according to the highest supported rules among the most frequent words. The second phase is the categorization phase where a web crawler will crawl through the World Wide Web to build a database categorized according to the result of the data mining approach. This database contains URLs and their categories.

Keywords: Information Processing on the Web, Data Mining, Document Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1595
8055 Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance

Authors: Ekachai Phaisangittisagul, Rapeepol Chongprachawat

Abstract:

Obtaining labeled data in supervised learning is often difficult and expensive, and thus the trained learning algorithm tends to be overfitting due to small number of training data. As a result, some researchers have focused on using unlabeled data which may not necessary to follow the same generative distribution as the labeled data to construct a high-level feature for improving performance on supervised learning tasks. In this paper, we investigate the impact of the relationship between unlabeled and labeled data for classification performance. Specifically, we will apply difference unlabeled data which have different degrees of relation to the labeled data for handwritten digit classification task based on MNIST dataset. Our experimental results show that the higher the degree of relation between unlabeled and labeled data, the better the classification performance. Although the unlabeled data that is completely from different generative distribution to the labeled data provides the lowest classification performance, we still achieve high classification performance. This leads to expanding the applicability of the supervised learning algorithms using unsupervised learning.

Keywords: Autoencoder, high-level feature, MNIST dataset, selftaught learning, supervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1799
8054 A New Model for Question Answering Systems

Authors: Mohammad Reza Kangavari, Samira Ghandchi, Manak Golpour

Abstract:

Most of the Question Answering systems composed of three main modules: question processing, document processing and answer processing. Question processing module plays an important role in QA systems. If this module doesn't work properly, it will make problems for other sections. Moreover answer processing module is an emerging topic in Question Answering, where these systems are often required to rank and validate candidate answers. These techniques aiming at finding short and precise answers are often based on the semantic classification. This paper discussed about a new model for question answering which improved two main modules, question processing and answer processing. There are two important components which are the bases of the question processing. First component is question classification that specifies types of question and answer. Second one is reformulation which converts the user's question into an understandable question by QA system in a specific domain. Answer processing module, consists of candidate answer filtering, candidate answer ordering components and also it has a validation section for interacting with user. This module makes it more suitable to find exact answer. In this paper we have described question and answer processing modules with modeling, implementing and evaluating the system. System implemented in two versions. Results show that 'Version No.1' gave correct answer to 70% of questions (30 correct answers to 50 asked questions) and 'version No.2' gave correct answers to 94% of questions (47 correct answers to 50 asked questions).

Keywords: Answer Processing, Classification, QuestionAnswering and Query Reformulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2095
8053 Curvelet Transform Based Two Class Motor Imagery Classification

Authors: Nebi Gedik

Abstract:

One of the important parts of the brain-computer interface (BCI) studies is the classification of motor imagery (MI) obtained by electroencephalography (EEG). The major goal is to provide non-muscular communication and control via assistive technologies to people with severe motor disorders so that they can communicate with the outside world. In this study, an EEG signal classification approach based on multiscale and multi-resolution transform method is presented. The proposed approach is used to decompose the EEG signal containing motor image information (right- and left-hand movement imagery). The decomposition process is performed using curvelet transform which is a multiscale and multiresolution analysis method, and the transform output was evaluated as feature data. The obtained feature set is subjected to feature selection process to obtain the most effective ones using t-test methods. SVM and k-NN algorithms are assigned for classification.

Keywords: motor imagery, EEG, curvelet transform, SVM, k-NN

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 570
8052 A New Hybrid RMN Image Segmentation Algorithm

Authors: Abdelouahab Moussaoui, Nabila Ferahta, Victor Chen

Abstract:

The development of aid's systems for the medical diagnosis is not easy thing because of presence of inhomogeneities in the MRI, the variability of the data from a sequence to the other as well as of other different source distortions that accentuate this difficulty. A new automatic, contextual, adaptive and robust segmentation procedure by MRI brain tissue classification is described in this article. A first phase consists in estimating the density of probability of the data by the Parzen-Rozenblatt method. The classification procedure is completely automatic and doesn't make any assumptions nor on the clusters number nor on the prototypes of these clusters since these last are detected in an automatic manner by an operator of mathematical morphology called skeleton by influence zones detection (SKIZ). The problem of initialization of the prototypes as well as their number is transformed in an optimization problem; in more the procedure is adaptive since it takes in consideration the contextual information presents in every voxel by an adaptive and robust non parametric model by the Markov fields (MF). The number of bad classifications is reduced by the use of the criteria of MPM minimization (Maximum Posterior Marginal).

Keywords: Clustering, Automatic Classification, SKIZ, MarkovFields, Image segmentation, Maximum Posterior Marginal (MPM).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1383
8051 Identification of Arousal and Relaxation by using SVM-Based Fusion of PPG Features

Authors: Chi Jung Kim, Mincheol Whang, Eui Chul Lee

Abstract:

In this paper, we propose a new method to distinguish between arousal and relaxation states by using multiple features acquired from a photoplethysmogram (PPG) and support vector machine (SVM). To induce arousal and relaxation states in subjects, 2 kinds of sound stimuli are used, and their corresponding biosignals are obtained using the PPG sensor. Two features–pulse to pulse interval (PPI) and pulse amplitude (PA)–are extracted from acquired PPG data, and a nonlinear classification between arousal and relaxation is performed using SVM. This methodology has several advantages when compared with previous similar studies. Firstly, we extracted 2 separate features from PPG, i.e., PPI and PA. Secondly, in order to improve the classification accuracy, SVM-based nonlinear classification was performed. Thirdly, to solve classification problems caused by generalized features of whole subjects, we defined each threshold according to individual features. Experimental results showed that the average classification accuracy was 74.67%. Also, the proposed method showed the better identification performance than the single feature based methods. From this result, we confirmed that arousal and relaxation can be classified using SVM and PPG features.

Keywords: Support Vector Machine, PPG, Emotion Recognition, Arousal, Relaxation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2453
8050 A Spatial Hypergraph Based Semi-Supervised Band Selection Method for Hyperspectral Imagery Semantic Interpretation

Authors: Akrem Sellami, Imed Riadh Farah

Abstract:

Hyperspectral imagery (HSI) typically provides a wealth of information captured in a wide range of the electromagnetic spectrum for each pixel in the image. Hence, a pixel in HSI is a high-dimensional vector of intensities with a large spectral range and a high spectral resolution. Therefore, the semantic interpretation is a challenging task of HSI analysis. We focused in this paper on object classification as HSI semantic interpretation. However, HSI classification still faces some issues, among which are the following: The spatial variability of spectral signatures, the high number of spectral bands, and the high cost of true sample labeling. Therefore, the high number of spectral bands and the low number of training samples pose the problem of the curse of dimensionality. In order to resolve this problem, we propose to introduce the process of dimensionality reduction trying to improve the classification of HSI. The presented approach is a semi-supervised band selection method based on spatial hypergraph embedding model to represent higher order relationships with different weights of the spatial neighbors corresponding to the centroid of pixel. This semi-supervised band selection has been developed to select useful bands for object classification. The presented approach is evaluated on AVIRIS and ROSIS HSIs and compared to other dimensionality reduction methods. The experimental results demonstrate the efficacy of our approach compared to many existing dimensionality reduction methods for HSI classification.

Keywords: Hyperspectral image, spatial hypergraph, dimensionality reduction, semantic interpretation, band selection, feature extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1192
8049 Enhanced Performance for Support Vector Machines as Multiclass Classifiers in Steel Surface Defect Detection

Authors: Ehsan Amid, Sina Rezaei Aghdam, Hamidreza Amindavar

Abstract:

Steel surface defect detection is essentially one of pattern recognition problems. Support Vector Machines (SVMs) are known as one of the most proper classifiers in this application. In this paper, we introduce a more accurate classification method by using SVMs as our final classifier of the inspection system. In this scheme, multiclass classification task is performed based on the "one-againstone" method and different kernels are utilized for each pair of the classes in multiclass classification of the different defects. In the proposed system, a decision tree is employed in the first stage for two-class classification of the steel surfaces to "defect" and "non-defect", in order to decrease the time complexity. Based on the experimental results, generated from over one thousand images, the proposed multiclass classification scheme is more accurate than the conventional methods and the overall system yields a sufficient performance which can meet the requirements in steel manufacturing.

Keywords: Steel Surface Defect Detection, Support Vector Machines, Kernel Methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1889
8048 Investigation of the Possibility to Prepare Supervised Classification Map of Gully Erosion by RS and GIS

Authors: Ali Mohammadi Torkashvand, Hamid Reza Alipour

Abstract:

This study investigates the possibility providing gully erosion map by the supervised classification of satellite images (ETM+) in two mountainous and plain land types. These land types were the part of Varamin plain, Tehran province, and Roodbar subbasin, Guilan province, as plain and mountain land types, respectively. The position of 652 and 124 ground control points were recorded by GPS respectively in mountain and plain land types. Soil gully erosion, land uses or plant covers were investigated in these points. Regarding ground control points and auxiliary points, training points of gully erosion and other surface features were introduced to software (Ilwis 3.3 Academic). The supervised classified map of gully erosion was prepared by maximum likelihood method and then, overall accuracy of this map was computed. Results showed that the possibility supervised classification of gully erosion isn-t possible, although it need more studies for results generalization to other mountainous regions. Also, with increasing land uses and other surface features in plain physiography, it decreases the classification of accuracy.

Keywords: Supervised classification, Gully erosion, Map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1795
8047 Movie Genre Preference Prediction Using Machine Learning for Customer-Based Information

Authors: Haifeng Wang, Haili Zhang

Abstract:

Most movie recommendation systems have been developed for customers to find items of interest. This work introduces a predictive model usable by small and medium-sized enterprises (SMEs) who are in need of a data-based and analytical approach to stock proper movies for local audiences and retain more customers. We used classification models to extract features from thousands of customers’ demographic, behavioral and social information to predict their movie genre preference. In the implementation, a Gaussian kernel support vector machine (SVM) classification model and a logistic regression model were established to extract features from sample data and their test error-in-sample were compared. Comparison of error-out-sample was also made under different Vapnik–Chervonenkis (VC) dimensions in the machine learning algorithm to find and prevent overfitting. Gaussian kernel SVM prediction model can correctly predict movie genre preferences in 85% of positive cases. The accuracy of the algorithm increased to 93% with a smaller VC dimension and less overfitting. These findings advance our understanding of how to use machine learning approach to predict customers’ preferences with a small data set and design prediction tools for these enterprises.

Keywords: Computational social science, movie preference, machine learning, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1620
8046 Indonesian News Classification using Support Vector Machine

Authors: Dewi Y. Liliana, Agung Hardianto, M. Ridok

Abstract:

Digital news with a variety topics is abundant on the internet. The problem is to classify news based on its appropriate category to facilitate user to find relevant news rapidly. Classifier engine is used to split any news automatically into the respective category. This research employs Support Vector Machine (SVM) to classify Indonesian news. SVM is a robust method to classify binary classes. The core processing of SVM is in the formation of an optimum separating plane to separate the different classes. For multiclass problem, a mechanism called one against one is used to combine the binary classification result. Documents were taken from the Indonesian digital news site, www.kompas.com. The experiment showed a promising result with the accuracy rate of 85%. This system is feasible to be implemented on Indonesian news classification.

Keywords: classification, Indonesian news, text processing, support vector machine

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3462
8045 Hybrid Color-Texture Space for Image Classification

Authors: Hassan El Maia, Ahmed Hammouch, Driss Aboutajdine

Abstract:

This work presents an approach for the construction of a hybrid color-texture space by using mutual information. Feature extraction is done by the Laws filter with SVM (Support Vectors Machine) as a classifier. The classification is applied on the VisTex database and a SPOT HRV (XS) image representing two forest areas in the region of Rabat in Morocco. The result of classification obtained in the hybrid space is compared with the one obtained in the RGB color space.

Keywords: Color, texture, laws filter, mutual information, SVM, hybrid space.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1797
8044 Balancing Neural Trees to Improve Classification Performance

Authors: Asha Rani, Christian Micheloni, Gian Luca Foresti

Abstract:

In this paper, a neural tree (NT) classifier having a simple perceptron at each node is considered. A new concept for making a balanced tree is applied in the learning algorithm of the tree. At each node, if the perceptron classification is not accurate and unbalanced, then it is replaced by a new perceptron. This separates the training set in such a way that almost the equal number of patterns fall into each of the classes. Moreover, each perceptron is trained only for the classes which are present at respective node and ignore other classes. Splitting nodes are employed into the neural tree architecture to divide the training set when the current perceptron node repeats the same classification of the parent node. A new error function based on the depth of the tree is introduced to reduce the computational time for the training of a perceptron. Experiments are performed to check the efficiency and encouraging results are obtained in terms of accuracy and computational costs.

Keywords: Neural Tree, Pattern Classification, Perceptron, Splitting Nodes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1199
8043 Analysis of Medical Data using Data Mining and Formal Concept Analysis

Authors: Anamika Gupta, Naveen Kumar, Vasudha Bhatnagar

Abstract:

This paper focuses on analyzing medical diagnostic data using classification rules in data mining and context reduction in formal concept analysis. It helps in finding redundancies among the various medical examination tests used in diagnosis of a disease. Classification rules have been derived from positive and negative association rules using the Concept lattice structure of the Formal Concept Analysis. Context reduction technique given in Formal Concept Analysis along with classification rules has been used to find redundancies among the various medical examination tests. Also it finds out whether expensive medical tests can be replaced by some cheaper tests.

Keywords: Data Mining, Formal Concept Analysis, Medical Data, Negative Classification Rules.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1698
8042 Multiclass Support Vector Machines for Environmental Sounds Classification Using log-Gabor Filters

Authors: S. Souli, Z. Lachiri

Abstract:

In this paper we propose a robust environmental sound classification approach, based on spectrograms features driven from log-Gabor filters. This approach includes two methods. In the first methods, the spectrograms are passed through an appropriate log-Gabor filter banks and the outputs are averaged and underwent an optimal feature selection procedure based on a mutual information criteria. The second method uses the same steps but applied only to three patches extracted from each spectrogram.

To investigate the accuracy of the proposed methods, we conduct experiments using a large database containing 10 environmental sound classes. The classification results based on Multiclass Support Vector Machines show that the second method is the most efficient with an average classification accuracy of 89.62 %.

Keywords: Environmental sounds, Log-Gabor filters, Spectrogram, SVM Multiclass, Visual features.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1720
8041 Optimized Facial Features-based Age Classification

Authors: Md. Zahangir Alom, Mei-Lan Piao, Md. Shariful Islam, Nam Kim, Jae-Hyeung Park

Abstract:

The evaluation and measurement of human body dimensions are achieved by physical anthropometry. This research was conducted in view of the importance of anthropometric indices of the face in forensic medicine, surgery, and medical imaging. The main goal of this research is to optimization of facial feature point by establishing a mathematical relationship among facial features and used optimize feature points for age classification. Since selected facial feature points are located to the area of mouth, nose, eyes and eyebrow on facial images, all desire facial feature points are extracted accurately. According this proposes method; sixteen Euclidean distances are calculated from the eighteen selected facial feature points vertically as well as horizontally. The mathematical relationships among horizontal and vertical distances are established. Moreover, it is also discovered that distances of the facial feature follows a constant ratio due to age progression. The distances between the specified features points increase with respect the age progression of a human from his or her childhood but the ratio of the distances does not change (d = 1 .618 ) . Finally, according to the proposed mathematical relationship four independent feature distances related to eight feature points are selected from sixteen distances and eighteen feature point-s respectively. These four feature distances are used for classification of age using Support Vector Machine (SVM)-Sequential Minimal Optimization (SMO) algorithm and shown around 96 % accuracy. Experiment result shows the proposed system is effective and accurate for age classification.

Keywords: 3D Face Model, Face Anthropometrics, Facial Features Extraction, Feature distances, SVM-SMO

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2021
8040 Persian Printed Numerals Classification Using Extended Moment Invariants

Authors: Hamid Reza Boveiri

Abstract:

Classification of Persian printed numeral characters has been considered and a proposed system has been introduced. In representation stage, for the first time in Persian optical character recognition, extended moment invariants has been utilized as characters image descriptor. In classification stage, four different classifiers namely minimum mean distance, nearest neighbor rule, multi layer perceptron, and fuzzy min-max neural network has been used, which first and second are traditional nonparametric statistical classifier. Third is a well-known neural network and forth is a kind of fuzzy neural network that is based on utilizing hyperbox fuzzy sets. Set of different experiments has been done and variety of results has been presented. The results showed that extended moment invariants are qualified as features to classify Persian printed numeral characters.

Keywords: Extended moment invariants, optical characterrecognition, Persian numerals classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1891
8039 Automatic Classification of Initial Categories of Alzheimer's Disease from Structural MRI Phase Images: A Comparison of PSVM, KNN and ANN Methods

Authors: Ahsan Bin Tufail, Ali Abidi, Adil Masood Siddiqui, Muhammad Shahzad Younis

Abstract:

An early and accurate detection of Alzheimer's disease (AD) is an important stage in the treatment of individuals suffering from AD. We present an approach based on the use of structural magnetic resonance imaging (sMRI) phase images to distinguish between normal controls (NC), mild cognitive impairment (MCI) and AD patients with clinical dementia rating (CDR) of 1. Independent component analysis (ICA) technique is used for extracting useful features which form the inputs to the support vector machines (SVM), K nearest neighbour (kNN) and multilayer artificial neural network (ANN) classifiers to discriminate between the three classes. The obtained results are encouraging in terms of classification accuracy and effectively ascertain the usefulness of phase images for the classification of different stages of Alzheimer-s disease.

Keywords: Biomedical image processing, classification algorithms, feature extraction, statistical learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2732