Search results for: classification methods
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 16894

Search results for: classification methods

16894 Arabic Text Representation and Classification Methods: Current State of the Art

Authors: Rami Ayadi, Mohsen Maraoui, Mounir Zrigui

Abstract:

In this paper, we have presented a brief current state of the art for Arabic text representation and classification methods. We decomposed Arabic Task Classification into four categories. First we describe some algorithms applied to classification on Arabic text. Secondly, we cite all major works when comparing classification algorithms applied on Arabic text, after this, we mention some authors who proposing new classification methods and finally we investigate the impact of preprocessing on Arabic TC.

Keywords: text classification, Arabic, impact of preprocessing, classification algorithms

Procedia PDF Downloads 469
16893 Evaluating Classification with Efficacy Metrics

Authors: Guofan Shao, Lina Tang, Hao Zhang

Abstract:

The values of image classification accuracy are affected by class size distributions and classification schemes, making it difficult to compare the performance of classification algorithms across different remote sensing data sources and classification systems. Based on the term efficacy from medicine and pharmacology, we have developed the metrics of image classification efficacy at the map and class levels. The novelty of this approach is that a baseline classification is involved in computing image classification efficacies so that the effects of class statistics are reduced. Furthermore, the image classification efficacies are interpretable and comparable, and thus, strengthen the assessment of image data classification methods. We use real-world and hypothetical examples to explain the use of image classification efficacies. The metrics of image classification efficacy meet the critical need to rectify the strategy for the assessment of image classification performance as image classification methods are becoming more diversified.

Keywords: accuracy assessment, efficacy, image classification, machine learning, uncertainty

Procedia PDF Downloads 211
16892 A New Approach for Improving Accuracy of Multi Label Stream Data

Authors: Kunal Shah, Swati Patel

Abstract:

Many real world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict class of unseen instance as accurate as possible. Multi label classification is a variant of single label classification where set of labels associated with single instance. Multi label classification is used by modern applications, such as text classification, functional genomics, image classification, music categorization etc. This paper introduces the task of multi-label classification, methods for multi-label classification and evolution measure for multi-label classification. Also, comparative analysis of multi label classification methods on the basis of theoretical study, and then on the basis of simulation was done on various data sets.

Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer

Procedia PDF Downloads 584
16891 Urban Land Cover from GF-2 Satellite Images Using Object Based and Neural Network Classifications

Authors: Lamyaa Gamal El-Deen Taha, Ashraf Sharawi

Abstract:

China launched satellite GF-2 in 2014. This study deals with comparing nearest neighbor object-based classification and neural network classification methods for classification of the fused GF-2 image. Firstly, rectification of GF-2 image was performed. Secondly, a comparison between nearest neighbor object-based classification and neural network classification for classification of fused GF-2 was performed. Thirdly, the overall accuracy of classification and kappa index were calculated. Results indicate that nearest neighbor object-based classification is better than neural network classification for urban mapping.

Keywords: GF-2 images, feature extraction-rectification, nearest neighbour object based classification, segmentation algorithms, neural network classification, multilayer perceptron

Procedia PDF Downloads 389
16890 Multilabel Classification with Neural Network Ensemble Method

Authors: Sezin Ekşioğlu

Abstract:

Multilabel classification has a huge importance for several applications, it is also a challenging research topic. It is a kind of supervised learning that contains binary targets. The distance between multilabel and binary classification is having more than one class in multilabel classification problems. Features can belong to one class or many classes. There exists a wide range of applications for multi label prediction such as image labeling, text categorization, gene functionality. Even though features are classified in many classes, they may not always be properly classified. There are many ensemble methods for the classification. However, most of the researchers have been concerned about better multilabel methods. Especially little ones focus on both efficiency of classifiers and pairwise relationships at the same time in order to implement better multilabel classification. In this paper, we worked on modified ensemble methods by getting benefit from k-Nearest Neighbors and neural network structure to address issues within a beneficial way and to get better impacts from the multilabel classification. Publicly available datasets (yeast, emotion, scene and birds) are performed to demonstrate the developed algorithm efficiency and the technique is measured by accuracy, F1 score and hamming loss metrics. Our algorithm boosts benchmarks for each datasets with different metrics.

Keywords: multilabel, classification, neural network, KNN

Procedia PDF Downloads 155
16889 Comparative Analysis of Classification Methods in Determining Non-Active Student Characteristics in Indonesia Open University

Authors: Dewi Juliah Ratnaningsih, Imas Sukaesih Sitanggang

Abstract:

Classification is one of data mining techniques that aims to discover a model from training data that distinguishes records into the appropriate category or class. Data mining classification methods can be applied in education, for example, to determine the classification of non-active students in Indonesia Open University. This paper presents a comparison of three methods of classification: Naïve Bayes, Bagging, and C.45. The criteria used to evaluate the performance of three methods of classification are stratified cross-validation, confusion matrix, the value of the area under the ROC Curve (AUC), Recall, Precision, and F-measure. The data used for this paper are from the non-active Indonesia Open University students in registration period of 2004.1 to 2012.2. Target analysis requires that non-active students were divided into 3 groups: C1, C2, and C3. Data analyzed are as many as 4173 students. Results of the study show: (1) Bagging method gave a high degree of classification accuracy than Naïve Bayes and C.45, (2) the Bagging classification accuracy rate is 82.99 %, while the Naïve Bayes and C.45 are 80.04 % and 82.74 % respectively, (3) the result of Bagging classification tree method has a large number of nodes, so it is quite difficult in decision making, (4) classification of non-active Indonesia Open University student characteristics uses algorithms C.45, (5) based on the algorithm C.45, there are 5 interesting rules which can describe the characteristics of non-active Indonesia Open University students.

Keywords: comparative analysis, data mining, clasiffication, Bagging, Naïve Bayes, C.45, non-active students, Indonesia Open University

Procedia PDF Downloads 316
16888 A Hybrid Fuzzy Clustering Approach for Fertile and Unfertile Analysis

Authors: Shima Soltanzadeh, Mohammad Hosain Fazel Zarandi, Mojtaba Barzegar Astanjin

Abstract:

Diagnosis of male infertility by the laboratory tests is expensive and, sometimes it is intolerable for patients. Filling out the questionnaire and then using classification method can be the first step in decision-making process, so only in the cases with a high probability of infertility we can use the laboratory tests. In this paper, we evaluated the performance of four classification methods including naive Bayesian, neural network, logistic regression and fuzzy c-means clustering as a classification, in the diagnosis of male infertility due to environmental factors. Since the data are unbalanced, the ROC curves are most suitable method for the comparison. In this paper, we also have selected the more important features using a filtering method and examined the impact of this feature reduction on the performance of each methods; generally, most of the methods had better performance after applying the filter. We have showed that using fuzzy c-means clustering as a classification has a good performance according to the ROC curves and its performance is comparable to other classification methods like logistic regression.

Keywords: classification, fuzzy c-means, logistic regression, Naive Bayesian, neural network, ROC curve

Procedia PDF Downloads 337
16887 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy

Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie

Abstract:

In recent years, there has been an explosion in the rate of using technology that help discovering the diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. It has great potential to provide accurate medical diagnosis, to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied on such micro-array datasets to devise methods that can predict the occurrence of Leukemia disease. In this study, we compared the classification accuracy and response time among eleven decision tree methods and six rule classifier methods using five performance criteria. The experiment results show that the performance of Random Tree is producing better result. Also it takes lowest time to build model in tree classifier. The classification rules algorithms such as nearest- neighbor-like algorithm (NNge) is the best algorithm due to the high accuracy and it takes lowest time to build model in classification.

Keywords: data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data

Procedia PDF Downloads 321
16886 Comparison of Different Methods to Produce Fuzzy Tolerance Relations for Rainfall Data Classification in the Region of Central Greece

Authors: N. Samarinas, C. Evangelides, C. Vrekos

Abstract:

The aim of this paper is the comparison of three different methods, in order to produce fuzzy tolerance relations for rainfall data classification. More specifically, the three methods are correlation coefficient, cosine amplitude and max-min method. The data were obtained from seven rainfall stations in the region of central Greece and refers to 20-year time series of monthly rainfall height average. Three methods were used to express these data as a fuzzy relation. This specific fuzzy tolerance relation is reformed into an equivalence relation with max-min composition for all three methods. From the equivalence relation, the rainfall stations were categorized and classified according to the degree of confidence. The classification shows the similarities among the rainfall stations. Stations with high similarity can be utilized in water resource management scenarios interchangeably or to augment data from one to another. Due to the complexity of calculations, it is important to find out which of the methods is computationally simpler and needs fewer compositions in order to give reliable results.

Keywords: classification, fuzzy logic, tolerance relations, rainfall data

Procedia PDF Downloads 314
16885 Drone Classification Using Classification Methods Using Conventional Model With Embedded Audio-Visual Features

Authors: Hrishi Rakshit, Pooneh Bagheri Zadeh

Abstract:

This paper investigates the performance of drone classification methods using conventional DCNN with different hyperparameters, when additional drone audio data is embedded in the dataset for training and further classification. In this paper, first a custom dataset is created using different images of drones from University of South California (USC) datasets and Leeds Beckett university datasets with embedded drone audio signal. The three well-known DCNN architectures namely, Resnet50, Darknet53 and Shufflenet are employed over the created dataset tuning their hyperparameters such as, learning rates, maximum epochs, Mini Batch size with different optimizers. Precision-Recall curves and F1 Scores-Threshold curves are used to evaluate the performance of the named classification algorithms. Experimental results show that Resnet50 has the highest efficiency compared to other DCNN methods.

Keywords: drone classifications, deep convolutional neural network, hyperparameters, drone audio signal

Procedia PDF Downloads 104
16884 Analyzing Tools and Techniques for Classification In Educational Data Mining: A Survey

Authors: D. I. George Amalarethinam, A. Emima

Abstract:

Educational Data Mining (EDM) is one of the newest topics to emerge in recent years, and it is concerned with developing methods for analyzing various types of data gathered from the educational circle. EDM methods and techniques with machine learning algorithms are used to extract meaningful and usable information from huge databases. For scientists and researchers, realistic applications of Machine Learning in the EDM sectors offer new frontiers and present new problems. One of the most important research areas in EDM is predicting student success. The prediction algorithms and techniques must be developed to forecast students' performance, which aids the tutor, institution to boost the level of student’s performance. This paper examines various classification techniques in prediction methods and data mining tools used in EDM.

Keywords: classification technique, data mining, EDM methods, prediction methods

Procedia PDF Downloads 117
16883 Comparative Analysis of Spectral Estimation Methods for Brain-Computer Interfaces

Authors: Rafik Djemili, Hocine Bourouba, M. C. Amara Korba

Abstract:

In this paper, we present a method in order to classify EEG signals for Brain-Computer Interfaces (BCI). EEG signals are first processed by means of spectral estimation methods to derive reliable features before classification step. Spectral estimation methods used are standard periodogram and the periodogram calculated by the Welch method; both methods are compared with Logarithm of Band Power (logBP) features. In the method proposed, we apply Linear Discriminant Analysis (LDA) followed by Support Vector Machine (SVM). Classification accuracy reached could be as high as 85%, which proves the effectiveness of classification of EEG signals based BCI using spectral methods.

Keywords: brain-computer interface, motor imagery, electroencephalogram, linear discriminant analysis, support vector machine

Procedia PDF Downloads 499
16882 New Approach to Construct Phylogenetic Tree

Authors: Ouafae Baida, Najma Hamzaoui, Maha Akbib, Abdelfettah Sedqui, Abdelouahid Lyhyaoui

Abstract:

Numerous scientific works present various methods to analyze the data for several domains, specially the comparison of classifications. In our recent work, we presented a new approach to help the user choose the best classification method from the results obtained by every method, by basing itself on the distances between the trees of classification. The result of our approach was in the form of a dendrogram contains methods as a succession of connections. This approach is much needed in phylogeny analysis. This discipline is intended to analyze the sequences of biological macro molecules for information on the evolutionary history of living beings, including their relationship. The product of phylogeny analysis is a phylogenetic tree. In this paper, we recommend the use of a new method of construction the phylogenetic tree based on comparison of different classifications obtained by different molecular genes.

Keywords: hierarchical classification, classification methods, structure of tree, genes, phylogenetic analysis

Procedia PDF Downloads 511
16881 Hyperspectral Image Classification Using Tree Search Algorithm

Authors: Shreya Pare, Parvin Akhter

Abstract:

Remotely sensing image classification becomes a very challenging task owing to the high dimensionality of hyperspectral images. The pixel-wise classification methods fail to take the spatial structure information of an image. Therefore, to improve the performance of classification, spatial information can be integrated into the classification process. In this paper, the multilevel thresholding algorithm based on a modified fuzzy entropy function is used to perform the segmentation of hyperspectral images. The fuzzy parameters of the MFE function have been optimized by using a new meta-heuristic algorithm based on the Tree-Search algorithm. The segmented image is classified by a large distribution machine (LDM) classifier. Experimental results are shown on a hyperspectral image dataset. The experimental outputs indicate that the proposed technique (MFE-TSA-LDM) achieves much higher classification accuracy for hyperspectral images when compared to state-of-art classification techniques. The proposed algorithm provides accurate segmentation and classification maps, thus becoming more suitable for image classification with large spatial structures.

Keywords: classification, hyperspectral images, large distribution margin, modified fuzzy entropy function, multilevel thresholding, tree search algorithm, hyperspectral image classification using tree search algorithm

Procedia PDF Downloads 177
16880 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail

Abstract:

An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires the use of automatic text classification. Text classification can be defined as assigning or structuring documents into a defined set of classes known in advance. Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information written in the Arabic language on the web. This paper presents a review on the published researches of Arabic Text Classification using classical data representation, Bag of words (BoW), and using conceptual data representation based on semantic resources such as Arabic WordNet and Wikipedia.

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 427
16879 A Review of Effective Gene Selection Methods for Cancer Classification Using Microarray Gene Expression Profile

Authors: Hala Alshamlan, Ghada Badr, Yousef Alohali

Abstract:

Cancer is one of the dreadful diseases, which causes considerable death rate in humans. DNA microarray-based gene expression profiling has been emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. In recent years, a DNA microarray technique has gained more attraction in both scientific and in industrial fields. It is important to determine the informative genes that cause cancer to improve early cancer diagnosis and to give effective chemotherapy treatment. In order to gain deep insight into the cancer classification problem, it is necessary to take a closer look at the proposed gene selection methods. We believe that they should be an integral preprocessing step for cancer classification. Furthermore, finding an accurate gene selection method is a very significant issue in a cancer classification area because it reduces the dimensionality of microarray dataset and selects informative genes. In this paper, we classify and review the state-of-art gene selection methods. We proceed by evaluating the performance of each gene selection approach based on their classification accuracy and number of informative genes. In our evaluation, we will use four benchmark microarray datasets for the cancer diagnosis (leukemia, colon, lung, and prostate). In addition, we compare the performance of gene selection method to investigate the effective gene selection method that has the ability to identify a small set of marker genes, and ensure high cancer classification accuracy. To the best of our knowledge, this is the first attempt to compare gene selection approaches for cancer classification using microarray gene expression profile.

Keywords: gene selection, feature selection, cancer classification, microarray, gene expression profile

Procedia PDF Downloads 455
16878 Sentiment Classification Using Enhanced Contextual Valence Shifters

Authors: Vo Ngoc Phu, Phan Thi Tuoi

Abstract:

We have explored different methods of improving the accuracy of sentiment classification. The sentiment orientation of a document can be positive (+), negative (-), or neutral (0). We combine five dictionaries from [2, 3, 4, 5, 6] into the new one with 21137 entries. The new dictionary has many verbs, adverbs, phrases and idioms, that are not in five ones before. The paper shows that our proposed method based on the combination of Term-Counting method and Enhanced Contextual Valence Shifters method has improved the accuracy of sentiment classification. The combined method has accuracy 68.984% on the testing dataset, and 69.224% on the training dataset. All of these methods are implemented to classify the reviews based on our new dictionary and the Internet Movie data set.

Keywords: sentiment classification, sentiment orientation, valence shifters, contextual, valence shifters, term counting

Procedia PDF Downloads 505
16877 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to rapid growth of text documents in digital form, automated text classification has become an important research in the last two decades. The major challenge of text document representations are high dimension, sparsity, volume and semantics. Since the terms are only features that can be found in documents, selection of good terms (features) plays an very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf ), Mutual Information (MI), Information Gain (IG), CHISquare (x2), Term Frequency-Relevance Frequency (tfrf ), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS) to classify text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, 4 University dataset and Reuters-21578.

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 460
16876 A Deep Learning Approach to Subsection Identification in Electronic Health Records

Authors: Nitin Shravan, Sudarsun Santhiappan, B. Sivaselvan

Abstract:

Subsection identification, in the context of Electronic Health Records (EHRs), is identifying the important sections for down-stream tasks like auto-coding. In this work, we classify the text present in EHRs according to their information, using machine learning and deep learning techniques. We initially describe briefly about the problem and formulate it as a text classification problem. Then, we discuss upon the methods from the literature. We try two approaches - traditional feature extraction based machine learning methods and deep learning methods. Through experiments on a private dataset, we establish that the deep learning methods perform better than the feature extraction based Machine Learning Models.

Keywords: deep learning, machine learning, semantic clinical classification, subsection identification, text classification

Procedia PDF Downloads 218
16875 Sensitive Analysis of the ZF Model for ABC Multi Criteria Inventory Classification

Authors: Makram Ben Jeddou

Abstract:

The ABC classification is widely used by managers for inventory control. The classical ABC classification is based on the Pareto principle and according to the criterion of the annual use value only. Single criterion classification is often insufficient for a closely inventory control. Multi-criteria inventory classification models have been proposed by researchers in order to take into account other important criteria. From these models, we will consider the ZF model in order to make a sensitive analysis on the composite score calculated for each item. In fact, this score based on a normalized average between a good and a bad optimized index can affect the ABC items classification. We will then focus on the weights assigned to each index and propose a classification compromise.

Keywords: ABC classification, multi criteria inventory classification models, ZF-model

Procedia PDF Downloads 508
16874 Performance Evaluation of Contemporary Classifiers for Automatic Detection of Epileptic EEG

Authors: K. E. Ch. Vidyasagar, M. Moghavvemi, T. S. S. T. Prabhat

Abstract:

Epilepsy is a global problem, and with seizures eluding even the smartest of diagnoses a requirement for automatic detection of the same using electroencephalogram (EEG) would have a huge impact in diagnosis of the disorder. Among a multitude of methods for automatic epilepsy detection, one should find the best method out, based on accuracy, for classification. This paper reasons out, and rationalizes, the best methods for classification. Accuracy is based on the classifier, and thus this paper discusses classifiers like quadratic discriminant analysis (QDA), classification and regression tree (CART), support vector machine (SVM), naive Bayes classifier (NBC), linear discriminant analysis (LDA), K-nearest neighbor (KNN) and artificial neural networks (ANN). Results show that ANN is the most accurate of all the above stated classifiers with 97.7% accuracy, 97.25% specificity and 98.28% sensitivity in its merit. This is followed closely by SVM with 1% variation in result. These results would certainly help researchers choose the best classifier for detection of epilepsy.

Keywords: classification, seizure, KNN, SVM, LDA, ANN, epilepsy

Procedia PDF Downloads 521
16873 Classification of Construction Projects

Authors: M. Safa, A. Sabet, S. MacGillivray, M. Davidson, K. Kaczmarczyk, C. T. Haas, G. E. Gibson, D. Rayside

Abstract:

To address construction project requirements and specifications, scholars and practitioners need to establish a taxonomy according to a scheme that best fits their need. While existing characterization methods are continuously being improved, new ones are devised to cover project properties which have not been previously addressed. One such method, the Project Definition Rating Index (PDRI), has received limited consideration strictly as a classification scheme. Developed by the Construction Industry Institute (CII) in 1996, the PDRI has been refined over the last two decades as a method for evaluating a project's scope definition completeness during front-end planning (FEP). The main contribution of this study is a review of practical project classification methods, and a discussion of how PDRI can be used to classify projects based on their readiness in the FEP phase. The proposed model has been applied to 59 construction projects in Ontario, and the results are discussed.

Keywords: project classification, project definition rating index (PDRI), risk, project goals alignment

Procedia PDF Downloads 679
16872 Effect of Signal Acquisition Procedure on Imagined Speech Classification Accuracy

Authors: M.R Asghari Bejestani, Gh. R. Mohammad Khani, V.R. Nafisi

Abstract:

Imagined speech recognition is one of the most interesting approaches to BCI development and a lot of works have been done in this area. Many different experiments have been designed and hundreds of combinations of feature extraction methods and classifiers have been examined. Reported classification accuracies range from the chance level to more than 90%. Based on non-stationary nature of brain signals, we have introduced 3 classification modes according to time difference in inter and intra-class samples. The modes can explain the diversity of reported results and predict the range of expected classification accuracies from the brain signal accusation procedure. In this paper, a few samples are illustrated by inspecting results of some previous works.

Keywords: brain computer interface, silent talk, imagined speech, classification, signal processing

Procedia PDF Downloads 153
16871 A Lightweight Pretrained Encrypted Traffic Classification Method with Squeeze-and-Excitation Block and Sharpness-Aware Optimization

Authors: Zhiyan Meng, Dan Liu, Jintao Meng

Abstract:

Dependable encrypted traffic classification is crucial for improving cybersecurity and handling the growing amount of data. Large language models have shown that learning from large datasets can be effective, making pre-trained methods for encrypted traffic classification popular. However, attention-based pre-trained methods face two main issues: their large neural parameters are not suitable for low-computation environments like mobile devices and real-time applications, and they often overfit by getting stuck in local minima. To address these issues, we developed a lightweight transformer model, which reduces the computational parameters through lightweight vocabulary construction and Squeeze-and-Excitation Block. We use sharpness-aware optimization to avoid local minima during pre-training and capture temporal features with relative positional embeddings. Our approach keeps the model's classification accuracy high for downstream tasks. We conducted experiments on four datasets -USTC-TFC2016, VPN 2016, Tor 2016, and CICIOT 2022. Even with fewer than 18 million parameters, our method achieves classification results similar to methods with ten times as many parameters.

Keywords: sharpness-aware optimization, encrypted traffic classification, squeeze-and-excitation block, pretrained model

Procedia PDF Downloads 31
16870 Classification of Attacks Over Cloud Environment

Authors: Karim Abouelmehdi, Loubna Dali, Elmoutaoukkil Abdelmajid, Hoda Elsayed, Eladnani Fatiha, Benihssane Abderahim

Abstract:

The security of cloud services is the concern of cloud service providers. In this paper, we will mention different classifications of cloud attacks referred by specialized organizations. Each agency has its classification of well-defined properties. The purpose is to present a high-level classification of current research in cloud computing security. This classification is organized around attack strategies and corresponding defenses.

Keywords: cloud computing, classification, risk, security

Procedia PDF Downloads 548
16869 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: classification, data mining, spam filtering, naive bayes, decision tree

Procedia PDF Downloads 412
16868 Taxonomic Classification for Living Organisms Using Convolutional Neural Networks

Authors: Saed Khawaldeh, Mohamed Elsharnouby, Alaa Eddin Alchalabi, Usama Pervaiz, Tajwar Aleef, Vu Hoang Minh

Abstract:

Taxonomic classification has a wide-range of applications such as finding out more about the evolutionary history of organisms that can be done by making a comparison between species living now and species that lived in the past. This comparison can be made using different kinds of extracted species’ data which include DNA sequences. Compared to the estimated number of the organisms that nature harbours, humanity does not have a thorough comprehension of which specific species they all belong to, in spite of the significant development of science and scientific knowledge over many years. One of the methods that can be applied to extract information out of the study of organisms in this regard is to use the DNA sequence of a living organism as a marker, thus making it available to classify it into a taxonomy. The classification of living organisms can be done in many machine learning techniques including Neural Networks (NNs). In this study, DNA sequences classification is performed using Convolutional Neural Networks (CNNs) which is a special type of NNs.

Keywords: deep networks, convolutional neural networks, taxonomic classification, DNA sequences classification

Procedia PDF Downloads 442
16867 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify the diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed them into the classifier algorithms. The dataset used in this study has been taken from University of California, Irvine (UCI) machine learning repository. Feature weighting methods including the fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compered with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k- nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on combination of subtractive clustering based feature weighting and decision tree classifier has been obtained the classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in the medical data set classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 326
16866 U-Net Based Multi-Output Network for Lung Disease Segmentation and Classification Using Chest X-Ray Dataset

Authors: Jaiden X. Schraut

Abstract:

Medical Imaging Segmentation of Chest X-rays is used for the purpose of identification and differentiation of lung cancer, pneumonia, COVID-19, and similar respiratory diseases. Widespread application of computer-supported perception methods into the diagnostic pipeline has been demonstrated to increase prognostic accuracy and aid doctors in efficiently treating patients. Modern models attempt the task of segmentation and classification separately and improve diagnostic efficiency; however, to further enhance this process, this paper proposes a multi-output network that follows a U-Net architecture for image segmentation output and features an additional CNN module for auxiliary classification output. The proposed model achieves a final Jaccard Index of .9634 for image segmentation and a final accuracy of .9600 for classification on the COVID-19 radiography database.

Keywords: chest X-ray, deep learning, image segmentation, image classification

Procedia PDF Downloads 144
16865 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Data mining is one of the main phases in the Knowledge Discovery Database (KDD) which is responsible of finding hidden and useful knowledge from databases. There are many different tasks for data mining including regression, pattern recognition, clustering, classification, and association rule. In recent years a promising data mining approach called associative classification (AC) has been proposed, AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference of the different procedures are used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.

Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction

Procedia PDF Downloads 537