Search results for: classification accuracy
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4995

Search results for: classification accuracy

4845 A Nonlinear Feature Selection Method for Hyperspectral Image Classification

Authors: Pei-Jyun Hsieh, Cheng-Hsuan Li, Bor-Chen Kuo

Abstract:

For hyperspectral image classification, feature reduction is an important pre-processing for avoiding the Hughes phenomena due to the difficulty for collecting training samples. Hence, lots of researches developed feature selection methods such as F-score, HSIC (Hilbert-Schmidt Independence Criterion), and etc., to improve hyperspectral image classification. However, most of them only consider the class separability in the original space, i.e., a linear class separability. In this study, we proposed a nonlinear class separability measure based on kernel trick for selecting an appropriate feature subset. The proposed nonlinear class separability was formed by a generalized RBF kernel with different bandwidths with respect to different features. Moreover, it considered the within-class separability and the between-class separability. A genetic algorithm was applied to tune these bandwidths such that the smallest with-class separability and the largest between-class separability simultaneously. This indicates the corresponding feature space is more suitable for classification. In addition, the corresponding nonlinear classification boundary can separate classes very well. These optimal bandwidths also show the importance of bands for hyperspectral image classification. The reciprocals of these bandwidths can be viewed as weights of bands. The smaller bandwidth, the larger weight of the band, and the more importance for classification. Hence, the descending order of the reciprocals of the bands gives an order for selecting the appropriate feature subsets. In the experiments, three hyperspectral image data sets, the Indian Pine Site data set, the PAVIA data set, and the Salinas A data set, were used to demonstrate the selected feature subsets by the proposed nonlinear feature selection method are more appropriate for hyperspectral image classification. Only ten percent of samples were randomly selected to form the training dataset. All non-background samples were used to form the testing dataset. The support vector machine was applied to classify these testing samples based on selected feature subsets. According to the experiments on the Indian Pine Site data set with 220 bands, the highest accuracies by applying the proposed method, F-score, and HSIC are 0.8795, 0.8795, and 0.87404, respectively. However, the proposed method selects 158 features. F-score and HSIC select 168 features and 217 features, respectively. Moreover, the classification accuracies increase dramatically only using first few features. The classification accuracies with respect to feature subsets of 10 features, 20 features, 50 features, and 110 features are 0.69587, 0.7348, 0.79217, and 0.84164, respectively. Furthermore, only using half selected features (110 features) of the proposed method, the corresponding classification accuracy (0.84168) is approximate to the highest classification accuracy, 0.8795. For other two hyperspectral image data sets, the PAVIA data set and Salinas A data set, we can obtain the similar results. These results illustrate our proposed method can efficiently find feature subsets to improve hyperspectral image classification. One can apply the proposed method to determine the suitable feature subset first according to specific purposes. Then researchers can only use the corresponding sensors to obtain the hyperspectral image and classify the samples. This can not only improve the classification performance but also reduce the cost for obtaining hyperspectral images.

Keywords: hyperspectral image classification, nonlinear feature selection, kernel trick, support vector machine

Procedia PDF Downloads 240
4844 Pose Normalization Network for Object Classification

Authors: Bingquan Shen

Abstract:

Convolutional Neural Networks (CNN) have demonstrated their effectiveness in synthesizing 3D views of object instances at various viewpoints. Given the problem where one have limited viewpoints of a particular object for classification, we present a pose normalization architecture to transform the object to existing viewpoints in the training dataset before classification to yield better classification performance. We have demonstrated that this Pose Normalization Network (PNN) can capture the style of the target object and is able to re-render it to a desired viewpoint. Moreover, we have shown that the PNN improves the classification result for the 3D chairs dataset and ShapeNet airplanes dataset when given only images at limited viewpoint, as compared to a CNN baseline.

Keywords: convolutional neural networks, object classification, pose normalization, viewpoint invariant

Procedia PDF Downloads 306
4843 Lean Models Classification: Towards a Holistic View

Authors: Y. Tiamaz, N. Souissi

Abstract:

The purpose of this paper is to present a classification of Lean models which aims to capture all the concepts related to this approach and thus facilitate its implementation. This classification allows the identification of the most relevant models according to several dimensions. From this perspective, we present a review and an analysis of Lean models literature and we propose dimensions for the classification of the current proposals while respecting among others the axes of the Lean approach, the maturity of the models as well as their application domains. This classification allowed us to conclude that researchers essentially consider the Lean approach as a toolbox also they design their models to solve problems related to a specific environment. Since Lean approach is no longer intended only for the automotive sector where it was invented, but to all fields (IT, Hospital, ...), we consider that this approach requires a generic model that is capable of being implemented in all areas.

Keywords: lean approach, lean models, classification, dimensions, holistic view

Procedia PDF Downloads 406
4842 Comparison of the Effectiveness of Tree Algorithms in Classification of Spongy Tissue Texture

Authors: Roza Dzierzak, Waldemar Wojcik, Piotr Kacejko

Abstract:

Analysis of the texture of medical images consists of determining the parameters and characteristics of the examined tissue. The main goal is to assign the analyzed area to one of two basic groups: as a healthy tissue or a tissue with pathological changes. The CT images of the thoracic lumbar spine from 15 healthy patients and 15 with confirmed osteoporosis were used for the analysis. As a result, 120 samples with dimensions of 50x50 pixels were obtained. The set of features has been obtained based on the histogram, gradient, run-length matrix, co-occurrence matrix, autoregressive model, and Haar wavelet. As a result of the image analysis, 290 descriptors of textural features were obtained. The dimension of the space of features was reduced by the use of three selection methods: Fisher coefficient (FC), mutual information (MI), minimization of the classification error probability and average correlation coefficients between the chosen features minimization of classification error probability (POE) and average correlation coefficients (ACC). Each of them returned ten features occupying the initial place in the ranking devised according to its own coefficient. As a result of the Fisher coefficient and mutual information selections, the same features arranged in a different order were obtained. In both rankings, the 50% percentile (Perc.50%) was found in the first place. The next selected features come from the co-occurrence matrix. The sets of features selected in the selection process were evaluated using six classification tree methods. These were: decision stump (DS), Hoeffding tree (HT), logistic model trees (LMT), random forest (RF), random tree (RT) and reduced error pruning tree (REPT). In order to assess the accuracy of classifiers, the following parameters were used: overall classification accuracy (ACC), true positive rate (TPR, classification sensitivity), true negative rate (TNR, classification specificity), positive predictive value (PPV) and negative predictive value (NPV). Taking into account the classification results, it should be stated that the best results were obtained for the Hoeffding tree and logistic model trees classifiers, using the set of features selected by the POE + ACC method. In the case of the Hoeffding tree classifier, the highest values of three parameters were obtained: ACC = 90%, TPR = 93.3% and PPV = 93.3%. Additionally, the values of the other two parameters, i.e., TNR = 86.7% and NPV = 86.6% were close to the maximum values obtained for the LMT classifier. In the case of logistic model trees classifier, the same ACC value was obtained ACC=90% and the highest values for TNR=88.3% and NPV= 88.3%. The values of the other two parameters remained at a level close to the highest TPR = 91.7% and PPV = 91.6%. The results obtained in the experiment show that the use of classification trees is an effective method of classification of texture features. This allows identifying the conditions of the spongy tissue for healthy cases and those with the porosis.

Keywords: classification, feature selection, texture analysis, tree algorithms

Procedia PDF Downloads 141
4841 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques

Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas

Abstract:

The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.

Keywords: Artificial Neural network, Competitive dynamics, Logistic Regression, Text classification, Text mining

Procedia PDF Downloads 93
4840 The Optimization of Decision Rules in Multimodal Decision-Level Fusion Scheme

Authors: Andrey V. Timofeev, Dmitry V. Egorov

Abstract:

This paper introduces an original method of parametric optimization of the structure for multimodal decision-level fusion scheme which combines the results of the partial solution of the classification task obtained from assembly of the mono-modal classifiers. As a result, a multimodal fusion classifier which has the minimum value of the total error rate has been obtained.

Keywords: classification accuracy, fusion solution, total error rate, multimodal fusion classifier

Procedia PDF Downloads 434
4839 Machine Learning Techniques in Bank Credit Analysis

Authors: Fernanda M. Assef, Maria Teresinha A. Steiner

Abstract:

The aim of this paper is to compare and discuss better classifier algorithm options for credit risk assessment by applying different Machine Learning techniques. Using records from a Brazilian financial institution, this study uses a database of 5,432 companies that are clients of the bank, where 2,600 clients are classified as non-defaulters, 1,551 are classified as defaulters and 1,281 are temporarily defaulters, meaning that the clients are overdue on their payments for up 180 days. For each case, a total of 15 attributes was considered for a one-against-all assessment using four different techniques: Artificial Neural Networks Multilayer Perceptron (ANN-MLP), Artificial Neural Networks Radial Basis Functions (ANN-RBF), Logistic Regression (LR) and finally Support Vector Machines (SVM). For each method, different parameters were analyzed in order to obtain different results when the best of each technique was compared. Initially the data were coded in thermometer code (numerical attributes) or dummy coding (for nominal attributes). The methods were then evaluated for each parameter and the best result of each technique was compared in terms of accuracy, false positives, false negatives, true positives and true negatives. This comparison showed that the best method, in terms of accuracy, was ANN-RBF (79.20% for non-defaulter classification, 97.74% for defaulters and 75.37% for the temporarily defaulter classification). However, the best accuracy does not always represent the best technique. For instance, on the classification of temporarily defaulters, this technique, in terms of false positives, was surpassed by SVM, which had the lowest rate (0.07%) of false positive classifications. All these intrinsic details are discussed considering the results found, and an overview of what was presented is shown in the conclusion of this study.

Keywords: artificial neural networks (ANNs), classifier algorithms, credit risk assessment, logistic regression, machine Learning, support vector machines

Procedia PDF Downloads 76
4838 Improved Accuracy of Ratio Multiple Valuation

Authors: Julianto Agung Saputro, Jogiyanto Hartono

Abstract:

Multiple valuation is widely used by investors and practitioners but its accuracy is questionable. Multiple valuation inaccuracies are due to the unreliability of information used in valuation, inaccuracies comparison group selection, and use of individual multiple values. This study investigated the accuracy of valuation to examine factors that can increase the accuracy of the valuation of multiple ratios, that are discretionary accruals, the comparison group, and the composite of multiple valuation. These results indicate that multiple value adjustment method with discretionary accruals provides better accuracy, the industry comparator group method combined with the size and growth of companies also provide better accuracy. Composite of individual multiple valuation gives the best accuracy. If all of these factors combined, the accuracy of valuation of multiple ratios will give the best results.

Keywords: multiple, valuation, composite, accuracy

Procedia PDF Downloads 252
4837 Automatic Classification for the Degree of Disc Narrowing from X-Ray Images Using CNN

Authors: Kwangmin Joo

Abstract:

Automatic detection of lumbar vertebrae and classification method is proposed for evaluating the degree of disc narrowing. Prior to classification, deep learning based segmentation is applied to detect individual lumbar vertebra. M-net is applied to segment five lumbar vertebrae and fine-tuning segmentation is employed to improve the accuracy of segmentation. Using the features extracted from previous step, clustering technique, k-means clustering, is applied to estimate the degree of disc space narrowing under four grade scoring system. As preliminary study, techniques proposed in this research could help building an automatic scoring system to diagnose the severity of disc narrowing from X-ray images.

Keywords: Disc space narrowing, Degenerative disc disorders, Deep learning based segmentation, Clustering technique

Procedia PDF Downloads 95
4836 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 133
4835 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0 %, 80.5 %, 80.5 %, 63.6 %, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 122
4834 Hyperspectral Data Classification Algorithm Based on the Deep Belief and Self-Organizing Neural Network

Authors: Li Qingjian, Li Ke, He Chun, Huang Yong

Abstract:

In this paper, the method of combining the Pohl Seidman's deep belief network with the self-organizing neural network is proposed to classify the target. This method is mainly aimed at the high nonlinearity of the hyperspectral image, the high sample dimension and the difficulty in designing the classifier. The main feature of original data is extracted by deep belief network. In the process of extracting features, adding known labels samples to fine tune the network, enriching the main characteristics. Then, the extracted feature vectors are classified into the self-organizing neural network. This method can effectively reduce the dimensions of data in the spectrum dimension in the preservation of large amounts of raw data information, to solve the traditional clustering and the long training time when labeled samples less deep learning algorithm for training problems, improve the classification accuracy and robustness. Through the data simulation, the results show that the proposed network structure can get a higher classification precision in the case of a small number of known label samples.

Keywords: DBN, SOM, pattern classification, hyperspectral, data compression

Procedia PDF Downloads 311
4833 Brain-Computer Interface Based Real-Time Control of Fixed Wing and Multi-Rotor Unmanned Aerial Vehicles

Authors: Ravi Vishwanath, Saumya Kumaar, S. N. Omkar

Abstract:

Brain-computer interfacing (BCI) is a technology that is almost four decades old, and it was developed solely for the purpose of developing and enhancing the impact of neuroprosthetics. However, in the recent times, with the commercialization of non-invasive electroencephalogram (EEG) headsets, the technology has seen a wide variety of applications like home automation, wheelchair control, vehicle steering, etc. One of the latest developed applications is the mind-controlled quadrotor unmanned aerial vehicle. These applications, however, do not require a very high-speed response and give satisfactory results when standard classification methods like Support Vector Machine (SVM) and Multi-Layer Perceptron (MLPC). Issues are faced when there is a requirement for high-speed control in the case of fixed-wing unmanned aerial vehicles where such methods are rendered unreliable due to the low speed of classification. Such an application requires the system to classify data at high speeds in order to retain the controllability of the vehicle. This paper proposes a novel method of classification which uses a combination of Common Spatial Paradigm and Linear Discriminant Analysis that provides an improved classification accuracy in real time. A non-linear SVM based classification technique has also been discussed. Further, this paper discusses the implementation of the proposed method on a fixed-wing and VTOL unmanned aerial vehicles.

Keywords: brain-computer interface, classification, machine learning, unmanned aerial vehicles

Procedia PDF Downloads 254
4832 Classification of Political Affiliations by Reduced Number of Features

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

By the evolvement in technology, the way of expressing opinions switched the direction to the digital world. The domain of politics as one of the hottest topics of opinion mining research merged together with the behavior analysis for affiliation determination in text which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 are constituted by Linguistic Inquiry and Word Count (LIWC) features are tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that Decision Tree, Rule Induction and M5 Rule classifiers when used with SVM and IGR feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “function” as an aggregate feature of the linguistic category, is obtained as the most differentiating feature among the 68 features with 81% accuracy by itself in classifying articles either as Republican or Democrat.

Keywords: feature selection, LIWC, machine learning, politics

Procedia PDF Downloads 353
4831 Heart Failure Identification and Progression by Classifying Cardiac Patients

Authors: Muhammad Saqlain, Nazar Abbas Saqib, Muazzam A. Khan

Abstract:

Heart Failure (HF) has become the major health problem in our society. The prevalence of HF has increased as the patient’s ages and it is the major cause of the high mortality rate in adults. A successful identification and progression of HF can be helpful to reduce the individual and social burden from this syndrome. In this study, we use a real data set of cardiac patients to propose a classification model for the identification and progression of HF. The data set has divided into three age groups, namely young, adult, and old and then each age group have further classified into four classes according to patient’s current physical condition. Contemporary Data Mining classification algorithms have been applied to each individual class of every age group to identify the HF. Decision Tree (DT) gives the highest accuracy of 90% and outperform all other algorithms. Our model accurately diagnoses different stages of HF for each age group and it can be very useful for the early prediction of HF.

Keywords: decision tree, heart failure, data mining, classification model

Procedia PDF Downloads 380
4830 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides

Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney

Abstract:

Morphological evaluation of Indian languages is one of the burgeoning fields in the area of Natural Language Processing (NLP). The evaluation of a language is an eminent task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study coalesce morphological evaluation and sentiment analysis for the task of classification of farmer suicide cases reported in Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens followed by the training of proposed model using deep learning classification on Punjabi language text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45% respectively. The overall accuracy of sentiment classification obtained using proposed framework on 275 Punjabi text documents is found to be 90.29%.

Keywords: deep neural network, farmer suicides, morphological processing, punjabi text, sentiment analysis

Procedia PDF Downloads 286
4829 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.

Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 293
4828 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data-mining, and web search. Measuring the similarity between the documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature the proposed measure takes the following three cases into account: (1) The feature appears in both documents; (2) The feature appears in only one document and; (3) The feature appears in none of the documents. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 480
4827 Real-Time Classification of Marbles with Decision-Tree Method

Authors: K. S. Parlak, E. Turan

Abstract:

The separation of marbles according to the pattern quality is a process made according to expert decision. The classification phase is the most critical part in terms of economic value. In this study, a self-learning system is proposed which performs the classification of marbles quickly and with high success. This system performs ten feature extraction by taking ten marble images from the camera. The marbles are classified by decision tree method using the obtained properties. The user forms the training set by training the system at the marble classification stage. The system evolves itself in every marble image that is classified. The aim of the proposed system is to minimize the error caused by the person performing the classification and achieve it quickly.

Keywords: decision tree, feature extraction, k-means clustering, marble classification

Procedia PDF Downloads 355
4826 Adversarial Attacks and Defenses on Deep Neural Networks

Authors: Jonathan Sohn

Abstract:

Deep neural networks (DNNs) have shown state-of-the-art performance for many applications, including computer vision, natural language processing, and speech recognition. Recently, adversarial attacks have been studied in the context of deep neural networks, which aim to alter the results of deep neural networks by modifying the inputs slightly. For example, an adversarial attack on a DNN used for object detection can cause the DNN to miss certain objects. As a result, the reliability of DNNs is undermined by their lack of robustness against adversarial attacks, raising concerns about their use in safety-critical applications such as autonomous driving. In this paper, we focus on studying the adversarial attacks and defenses on DNNs for image classification. There are two types of adversarial attacks studied which are fast gradient sign method (FGSM) attack and projected gradient descent (PGD) attack. A DNN forms decision boundaries that separate the input images into different categories. The adversarial attack slightly alters the image to move over the decision boundary, causing the DNN to misclassify the image. FGSM attack obtains the gradient with respect to the image and updates the image once based on the gradients to cross the decision boundary. PGD attack, instead of taking one big step, repeatedly modifies the input image with multiple small steps. There is also another type of attack called the target attack. This adversarial attack is designed to make the machine classify an image to a class chosen by the attacker. We can defend against adversarial attacks by incorporating adversarial examples in training. Specifically, instead of training the neural network with clean examples, we can explicitly let the neural network learn from the adversarial examples. In our experiments, the digit recognition accuracy on the MNIST dataset drops from 97.81% to 39.50% and 34.01% when the DNN is attacked by FGSM and PGD attacks, respectively. If we utilize FGSM training as a defense method, the classification accuracy greatly improves from 39.50% to 92.31% for FGSM attacks and from 34.01% to 75.63% for PGD attacks. To further improve the classification accuracy under adversarial attacks, we can also use a stronger PGD training method. PGD training improves the accuracy by 2.7% under FGSM attacks and 18.4% under PGD attacks over FGSM training. It is worth mentioning that both FGSM and PGD training do not affect the accuracy of clean images. In summary, we find that PGD attacks can greatly degrade the performance of DNNs, and PGD training is a very effective way to defend against such attacks. PGD attacks and defence are overall significantly more effective than FGSM methods.

Keywords: deep neural network, adversarial attack, adversarial defense, adversarial machine learning

Procedia PDF Downloads 158
4825 Enhanced Image Representation for Deep Belief Network Classification of Hyperspectral Images

Authors: Khitem Amiri, Mohamed Farah

Abstract:

Image classification is a challenging task and is gaining lots of interest since it helps us to understand the content of images. Recently Deep Learning (DL) based methods gave very interesting results on several benchmarks. For Hyperspectral images (HSI), the application of DL techniques is still challenging due to the scarcity of labeled data and to the curse of dimensionality. Among other approaches, Deep Belief Network (DBN) based approaches gave a fair classification accuracy. In this paper, we address the problem of the curse of dimensionality by reducing the number of bands and replacing the HSI channels by the channels representing radiometric indices. Therefore, instead of using all the HSI bands, we compute the radiometric indices such as NDVI (Normalized Difference Vegetation Index), NDWI (Normalized Difference Water Index), etc, and we use the combination of these indices as input for the Deep Belief Network (DBN) based classification model. Thus, we keep almost all the pertinent spectral information while reducing considerably the size of the image. In order to test our image representation, we applied our method on several HSI datasets including the Indian pines dataset, Jasper Ridge data and it gave comparable results to the state of the art methods while reducing considerably the time of training and testing.

Keywords: hyperspectral images, deep belief network, radiometric indices, image classification

Procedia PDF Downloads 244
4824 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail

Abstract:

An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires the use of automatic text classification. Text classification can be defined as assigning or structuring documents into a defined set of classes known in advance. Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information written in the Arabic language on the web. This paper presents a review on the published researches of Arabic Text Classification using classical data representation, Bag of words (BoW), and using conceptual data representation based on semantic resources such as Arabic WordNet and Wikipedia.

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 398
4823 Automatic Classification of Lung Diseases from CT Images

Authors: Abobaker Mohammed Qasem Farhan, Shangming Yang, Mohammed Al-Nehari

Abstract:

Pneumonia is a kind of lung disease that creates congestion in the chest. Such pneumonic conditions lead to loss of life of the severity of high congestion. Pneumonic lung disease is caused by viral pneumonia, bacterial pneumonia, or Covidi-19 induced pneumonia. The early prediction and classification of such lung diseases help to reduce the mortality rate. We propose the automatic Computer-Aided Diagnosis (CAD) system in this paper using the deep learning approach. The proposed CAD system takes input from raw computerized tomography (CT) scans of the patient's chest and automatically predicts disease classification. We designed the Hybrid Deep Learning Algorithm (HDLA) to improve accuracy and reduce processing requirements. The raw CT scans have pre-processed first to enhance their quality for further analysis. We then applied a hybrid model that consists of automatic feature extraction and classification. We propose the robust 2D Convolutional Neural Network (CNN) model to extract the automatic features from the pre-processed CT image. This CNN model assures feature learning with extremely effective 1D feature extraction for each input CT image. The outcome of the 2D CNN model is then normalized using the Min-Max technique. The second step of the proposed hybrid model is related to training and classification using different classifiers. The simulation outcomes using the publically available dataset prove the robustness and efficiency of the proposed model compared to state-of-art algorithms.

Keywords: CT scan, Covid-19, deep learning, image processing, lung disease classification

Procedia PDF Downloads 107
4822 Image Classification with Localization Using Convolutional Neural Networks

Authors: Bhuyain Mobarok Hossain

Abstract:

Image classification and localization research is currently an important strategy in the field of computer vision. The evolution and advancement of deep learning and convolutional neural networks (CNN) have greatly improved the capabilities of object detection and image-based classification. Target detection is important to research in the field of computer vision, especially in video surveillance systems. To solve this problem, we will be applying a convolutional neural network of multiple scales at multiple locations in the image in one sliding window. Most translation networks move away from the bounding box around the area of interest. In contrast to this architecture, we consider the problem to be a classification problem where each pixel of the image is a separate section. Image classification is the method of predicting an individual category or specifying by a shoal of data points. Image classification is a part of the classification problem, including any labels throughout the image. The image can be classified as a day or night shot. Or, likewise, images of cars and motorbikes will be automatically placed in their collection. The deep learning of image classification generally includes convolutional layers; the invention of it is referred to as a convolutional neural network (CNN).

Keywords: image classification, object detection, localization, particle filter

Procedia PDF Downloads 266
4821 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.

Keywords: consensus, curse of correlation, imbalance classification, rank-based chain-mode ensemble

Procedia PDF Downloads 107
4820 Mapping of Arenga Pinnata Tree Using Remote Sensing

Authors: Zulkiflee Abd Latif, Sitinor Atikah Nordin, Alawi Sulaiman

Abstract:

Different tree species possess different and various benefits. Arenga Pinnata tree species own several potential uses that is valuable for the economy and the country. Mapping vegetation using remote sensing technique involves various process, techniques and consideration. Using satellite imagery, this method enables the access of inaccessible area and with the availability of near infra-red band; it is useful in vegetation analysis, especially in identifying tree species. Pixel-based and object-based classification technique is used as a method in this study. Pixel-based classification technique used in this study divided into unsupervised and supervised classification. Object based classification technique becomes more popular another alternative method in classification process. Using spectral, texture, color and other information, to classify the target make object-based classification is a promising technique for classification. Classification of Arenga Pinnata trees is overlaid with elevation, slope and aspect, soil and river data and several other data to give information regarding the tree character and living environment. This paper will present the utilization of remote sensing technique in order to map Arenga Pinnata tree species

Keywords: Arenga Pinnata, pixel-based classification, object-based classification, remote sensing

Procedia PDF Downloads 340
4819 Metamorphic Computer Virus Classification Using Hidden Markov Model

Authors: Babak Bashari Rad

Abstract:

A metamorphic computer virus uses different code transformation techniques to mutate its body in duplicated instances. Characteristics and function of new instances are mostly similar to their parents, but they cannot be easily detected by the majority of antivirus in market, as they depend on string signature-based detection techniques. The purpose of this research is to propose a Hidden Markov Model for classification of metamorphic viruses in executable files. In the proposed solution, portable executable files are inspected to extract the instructions opcodes needed for the examination of code. A Hidden Markov Model trained on portable executable files is employed to classify the metamorphic viruses of the same family. The proposed model is able to generate and recognize common statistical features of mutated code. The model has been evaluated by examining the model on a test data set. The performance of the model has been practically tested and evaluated based on False Positive Rate, Detection Rate and Overall Accuracy. The result showed an acceptable performance with high average of 99.7% Detection Rate.

Keywords: malware classification, computer virus classification, metamorphic virus, metamorphic malware, Hidden Markov Model

Procedia PDF Downloads 284
4818 MSIpred: A Python 2 Package for the Classification of Tumor Microsatellite Instability from Tumor Mutation Annotation Data Using a Support Vector Machine

Authors: Chen Wang, Chun Liang

Abstract:

Microsatellite instability (MSI) is characterized by high degree of polymorphism in microsatellite (MS) length due to a deficiency in mismatch repair (MMR) system. MSI is associated with several tumor types and its status can be considered as an important indicator for tumor prognostic. Conventional clinical diagnosis of MSI examines PCR products of a panel of MS markers using electrophoresis (MSI-PCR) which is laborious, time consuming, and less reliable. MSIpred, a python 2 package for automatic classification of MSI was released by this study. It computes important somatic mutation features from files in mutation annotation format (MAF) generated from paired tumor-normal exome sequencing data, subsequently using these to predict tumor MSI status with a support vector machine (SVM) classifier trained by MAF files of 1074 tumors belonging to four types. Evaluation of MSIpred on an independent 358-tumor test set achieved overall accuracy of over 98% and area under receiver operating characteristic (ROC) curve of 0.967. These results indicated that MSIpred is a robust pan-cancer MSI classification tool and can serve as a complementary diagnostic to MSI-PCR in MSI diagnosis.

Keywords: microsatellite instability, pan-cancer classification, somatic mutation, support vector machine

Procedia PDF Downloads 141
4817 Vehicle Type Classification with Geometric and Appearance Attributes

Authors: Ghada S. Moussa

Abstract:

With the increase in population along with economic prosperity, an enormous increase in the number and types of vehicles on the roads occurred. This fact brings a growing need for efficiently yet effectively classifying vehicles into their corresponding categories, which play a crucial role in many areas of infrastructure planning and traffic management. This paper presents two vehicle-type classification approaches; 1) geometric-based and 2) appearance-based. The two classification approaches are used for two tasks: multi-class and intra-class vehicle classifications. For the evaluation purpose of the proposed classification approaches’ performance and the identification of the most effective yet efficient one, 10-fold cross-validation technique is used with a large dataset. The proposed approaches are distinguishable from previous research on vehicle classification in which: i) they consider both geometric and appearance attributes of vehicles, and ii) they perform remarkably well in both multi-class and intra-class vehicle classification. Experimental results exhibit promising potentials implementations of the proposed vehicle classification approaches into real-world applications.

Keywords: appearance attributes, geometric attributes, support vector machine, vehicle classification

Procedia PDF Downloads 312
4816 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.

Keywords: classification algorithms, data mining, knowledge discovery, tourism

Procedia PDF Downloads 266