Search results for: classification problem
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8839

Search results for: classification problem

8779 Using Machine-Learning Methods for Allergen Amino Acid Sequence's Permutations

Authors: Kuei-Ling Sun, Emily Chia-Yu Su

Abstract:

Allergy is a hypersensitive overreaction of the immune system to environmental stimuli, and a major health problem. These overreactions include rashes, sneezing, fever, food allergies, anaphylaxis, asthmatic, shock, or other abnormal conditions. Allergies can be caused by food, insect stings, pollen, animal wool, and other allergens. Their development of allergies is due to both genetic and environmental factors. Allergies involve immunoglobulin E antibodies, a part of the body’s immune system. Immunoglobulin E antibodies will bind to an allergen and then transfer to a receptor on mast cells or basophils triggering the release of inflammatory chemicals such as histamine. Based on the increasingly serious problem of environmental change, changes in lifestyle, air pollution problem, and other factors, in this study, we both collect allergens and non-allergens from several databases and use several machine learning methods for classification, including logistic regression (LR), stepwise regression, decision tree (DT) and neural networks (NN) to do the model comparison and determine the permutations of allergen amino acid’s sequence.

Keywords: allergy, classification, decision tree, logistic regression, machine learning

Procedia PDF Downloads 277
8778 Performance Analysis of Artificial Neural Network Based Land Cover Classification

Authors: Najam Aziz, Nasru Minallah, Ahmad Junaid, Kashaf Gul

Abstract:

Landcover classification using automated classification techniques, while employing remotely sensed multi-spectral imagery, is one of the promising areas of research. Different land conditions at different time are captured through satellite and monitored by applying different classification algorithms in specific environment. In this paper, a SPOT-5 image provided by SUPARCO has been studied and classified in Environment for Visual Interpretation (ENVI), a tool widely used in remote sensing. Then, Artificial Neural Network (ANN) classification technique is used to detect the land cover changes in Abbottabad district. Obtained results are compared with a pixel based Distance classifier. The results show that ANN gives the better overall accuracy of 99.20% and Kappa coefficient value of 0.98 over the Mahalanobis Distance Classifier.

Keywords: landcover classification, artificial neural network, remote sensing, SPOT 5

Procedia PDF Downloads 504
8777 Scene Classification Using Hierarchy Neural Network, Directed Acyclic Graph Structure, and Label Relations

Authors: Po-Jen Chen, Jian-Jiun Ding, Hung-Wei Hsu, Chien-Yao Wang, Jia-Ching Wang

Abstract:

A more accurate scene classification algorithm using label relations and the hierarchy neural network was developed in this work. In many classification algorithms, it is assumed that the labels are mutually exclusive. This assumption is true in some specific problems, however, for scene classification, the assumption is not reasonable. Because there are a variety of objects with a photo image, it is more practical to assign multiple labels for an image. In this paper, two label relations, which are exclusive relation and hierarchical relation, were adopted in the classification process to achieve more accurate multiple label classification results. Moreover, the hierarchy neural network (hierarchy NN) is applied to classify the image and the directed acyclic graph structure is used for predicting a more reasonable result which obey exclusive and hierarchical relations. Simulations show that, with these techniques, a much more accurate scene classification result can be achieved.

Keywords: convolutional neural network, label relation, hierarchy neural network, scene classification

Procedia PDF Downloads 423
8776 Pruning Algorithm for the Minimum Rule Reduct Generation

Authors: Sahin Emrah Amrahov, Fatih Aybar, Serhat Dogan

Abstract:

In this paper we consider the rule reduct generation problem. Rule Reduct Generation (RG) and Modified Rule Generation (MRG) algorithms, that are used to solve this problem, are well-known. Alternative to these algorithms, we develop Pruning Rule Generation (PRG) algorithm. We compare the PRG algorithm with RG and MRG.

Keywords: rough sets, decision rules, rule induction, classification

Procedia PDF Downloads 499
8775 Effective Parameter Selection for Audio-Based Music Mood Classification for Christian Kokborok Song: A Regression-Based Approach

Authors: Sanchali Das, Swapan Debbarma

Abstract:

Music mood classification is developing in both the areas of music information retrieval (MIR) and natural language processing (NLP). Some of the Indian languages like Hindi English etc. have considerable exposure in MIR. But research in mood classification in regional language is very less. In this paper, powerful audio based feature for Kokborok Christian song is identified and mood classification task has been performed. Kokborok is an Indo-Burman language especially spoken in the northeastern part of India and also some other countries like Bangladesh, Myanmar etc. For performing audio-based classification task, useful audio features are taken out by jMIR software. There are some standard audio parameters are there for the audio-based task but as known to all that every language has its unique characteristics. So here, the most significant features which are the best fit for the database of Kokborok song is analysed. The regression-based model is used to find out the independent parameters that act as a predictor and predicts the dependencies of parameters and shows how it will impact on overall classification result. For classification WEKA 3.5 is used, and selected parameters create a classification model. And another model is developed by using all the standard audio features that are used by most of the researcher. In this experiment, the essential parameters that are responsible for effective audio based mood classification and parameters that do not significantly change for each of the Christian Kokborok songs are analysed, and a comparison is also shown between the two above model.

Keywords: Christian Kokborok song, mood classification, music information retrieval, regression

Procedia PDF Downloads 191
8774 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: classification, data mining, spam filtering, naive bayes, decision tree

Procedia PDF Downloads 384
8773 Enhanced Image Representation for Deep Belief Network Classification of Hyperspectral Images

Authors: Khitem Amiri, Mohamed Farah

Abstract:

Image classification is a challenging task and is gaining lots of interest since it helps us to understand the content of images. Recently Deep Learning (DL) based methods gave very interesting results on several benchmarks. For Hyperspectral images (HSI), the application of DL techniques is still challenging due to the scarcity of labeled data and to the curse of dimensionality. Among other approaches, Deep Belief Network (DBN) based approaches gave a fair classification accuracy. In this paper, we address the problem of the curse of dimensionality by reducing the number of bands and replacing the HSI channels by the channels representing radiometric indices. Therefore, instead of using all the HSI bands, we compute the radiometric indices such as NDVI (Normalized Difference Vegetation Index), NDWI (Normalized Difference Water Index), etc, and we use the combination of these indices as input for the Deep Belief Network (DBN) based classification model. Thus, we keep almost all the pertinent spectral information while reducing considerably the size of the image. In order to test our image representation, we applied our method on several HSI datasets including the Indian pines dataset, Jasper Ridge data and it gave comparable results to the state of the art methods while reducing considerably the time of training and testing.

Keywords: hyperspectral images, deep belief network, radiometric indices, image classification

Procedia PDF Downloads 244
8772 An Investigation into Fraud Detection in Financial Reporting Using Sugeno Fuzzy Classification

Authors: Mohammad Sarchami, Mohsen Zeinalkhani

Abstract:

Always, financial reporting system faces some problems to win public ear. The increase in the number of fraud and representation, often combined with the bankruptcy of large companies, has raised concerns about the quality of financial statements. So, investors, legislators, managers, and auditors have focused on significant fraud detection or prevention in financial statements. This article aims to investigate the Sugeno fuzzy classification to consider fraud detection in financial reporting of accepted firms by Tehran stock exchange. The hypothesis is: Sugeno fuzzy classification may detect fraud in financial reporting by financial ratio. Hypothesis was tested using Matlab software. Accuracy average was 81/80 in Sugeno fuzzy classification; so the hypothesis was confirmed.

Keywords: fraud, financial reporting, Sugeno fuzzy classification, firm

Procedia PDF Downloads 220
8771 Heart Failure Identification and Progression by Classifying Cardiac Patients

Authors: Muhammad Saqlain, Nazar Abbas Saqib, Muazzam A. Khan

Abstract:

Heart Failure (HF) has become the major health problem in our society. The prevalence of HF has increased as the patient’s ages and it is the major cause of the high mortality rate in adults. A successful identification and progression of HF can be helpful to reduce the individual and social burden from this syndrome. In this study, we use a real data set of cardiac patients to propose a classification model for the identification and progression of HF. The data set has divided into three age groups, namely young, adult, and old and then each age group have further classified into four classes according to patient’s current physical condition. Contemporary Data Mining classification algorithms have been applied to each individual class of every age group to identify the HF. Decision Tree (DT) gives the highest accuracy of 90% and outperform all other algorithms. Our model accurately diagnoses different stages of HF for each age group and it can be very useful for the early prediction of HF.

Keywords: decision tree, heart failure, data mining, classification model

Procedia PDF Downloads 380
8770 Effect of Personality Traits on Classification of Political Orientation

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

Today as in the other domains, there are an enormous number of political transcripts available in the Web which is waiting to be mined and used for various purposes such as statistics and recommendations. Therefore, automatically determining the political orientation on these transcripts becomes crucial. The methodologies used by machine learning algorithms to do the automatic classification are based on different features such as Linguistic. Considering the ideology differences between Liberals and Conservatives, in this paper, the effect of Personality Traits on political orientation classification is studied. This is done by considering the correlation between LIWC features and the BIG Five Personality Traits. Several experiments are conducted on Convote U.S. Congressional-Speech dataset with seven benchmark classification algorithms. The different methodologies are applied on selecting different feature sets that constituted by 8 to 64 varying number of features. While Neuroticism is obtained to be the most differentiating personality trait on classification of political polarity, when its top 10 representative features are combined with several classification algorithms, it outperformed the results presented in previous research.

Keywords: politics, personality traits, LIWC, machine learning

Procedia PDF Downloads 465
8769 The Menu Planning Problem: A Systematic Literature Review

Authors: Dorra Kallel, Ines Kanoun, Diala Dhouib

Abstract:

This paper elaborates a Systematic Literature Review SLR) to select the most outstanding studies that address the Menu Planning Problem (MPP) and to classify them according to the to the three following criteria: the used methods, types of patients and the required constraints. At first, a set of 4165 studies was selected. After applying the SLR’s guidelines, this collection was filtered to 13 studies using specific inclusion and exclusion criteria as well as an accurate analysis of each study. Second, the selected papers were invested to answer the proposed research questions. Finally, data synthesis and new perspectives for future works are incorporated in the closing section.

Keywords: Menu Planning Problem (MPP), Systematic Literature Review (SLR), classification, exact and approaches methods

Procedia PDF Downloads 239
8768 Obstacle Classification Method Based on 2D LIDAR Database

Authors: Moohyun Lee, Soojung Hur, Yongwan Park

Abstract:

In this paper is proposed a method uses only LIDAR system to classification an obstacle and determine its type by establishing database for classifying obstacles based on LIDAR. The existing LIDAR system, in determining the recognition of obstruction in an autonomous vehicle, has an advantage in terms of accuracy and shorter recognition time. However, it was difficult to determine the type of obstacle and therefore accurate path planning based on the type of obstacle was not possible. In order to overcome this problem, a method of classifying obstacle type based on existing LIDAR and using the width of obstacle materials was proposed. However, width measurement was not sufficient to improve accuracy. In this research, the width data was used to do the first classification; database for LIDAR intensity data by four major obstacle materials on the road were created; comparison is made to the LIDAR intensity data of actual obstacle materials; and determine the obstacle type by finding the one with highest similarity values. An experiment using an actual autonomous vehicle under real environment shows that data declined in quality in comparison to 3D LIDAR and it was possible to classify obstacle materials using 2D LIDAR.

Keywords: obstacle, classification, database, LIDAR, segmentation, intensity

Procedia PDF Downloads 311
8767 Efficient Fuzzy Classified Cryptographic Model for Intelligent Encryption Technique towards E-Banking XML Transactions

Authors: Maher Aburrous, Adel Khelifi, Manar Abu Talib

Abstract:

Transactions performed by financial institutions on daily basis require XML encryption on large scale. Encrypting large volume of message fully will result both performance and resource issues. In this paper a novel approach is presented for securing financial XML transactions using classification data mining (DM) algorithms. Our strategy defines the complete process of classifying XML transactions by using set of classification algorithms, classified XML documents processed at later stage using element-wise encryption. Classification algorithms were used to identify the XML transaction rules and factors in order to classify the message content fetching important elements within. We have implemented four classification algorithms to fetch the importance level value within each XML document. Classified content is processed using element-wise encryption for selected parts with "High", "Medium" or “Low” importance level values. Element-wise encryption is performed using AES symmetric encryption algorithm and proposed modified algorithm for AES to overcome the problem of computational overhead, in which substitute byte, shift row will remain as in the original AES while mix column operation is replaced by 128 permutation operation followed by add round key operation. An implementation has been conducted using data set fetched from e-banking service to present system functionality and efficiency. Results from our implementation showed a clear improvement in processing time encrypting XML documents.

Keywords: XML transaction, encryption, Advanced Encryption Standard (AES), XML classification, e-banking security, fuzzy classification, cryptography, intelligent encryption

Procedia PDF Downloads 378
8766 The International Classification of Functioning, Disability and Health (ICF) as a Problem-Solving Tool in Disability Rehabilitation and Education Alliance in Metabolic Disorders (DREAM) at Sultan Bin Abdul Aziz Humanitarian City:A Prototype for Reh

Authors: Hamzeh Awad

Abstract:

Disability is considered to be a worldwide complex phenomenon which rising at a phenomenal rate and caused by many different factors. Chronic diseases such as cardiovascular disease and diabetes can lead to mobility disability in particular and disability in general. The ICF is an integrative bio-psycho-social model of functioning and disability and considered by the World Health Organization (WHO) to be a reference for disability classification using its categories and core set to classify disorder’s functional limitations. Specialist programs at Sultan Bin Abdul Aziz Humanitarian City (SBAHC) are providing both inpatient and outpatient services have started to implement the ICF and use it as a problem solving tool in Rehab. Diabetes is leading contributing factor for disability and considered epidemic in several Gulf countries including the Kingdom of Saudi Arabia (KSA), where its prevalence continues to increase dramatically. Metabolic disorders, mainly diabetes are not well covered in Rehab field. The purpose of this study is present to research and clinical rehabilitation field of DREAM and ICF as a framework in clinical and research setting in Rehab service. Also, shed the light on using the ICF as problem solving tool at SBAHC. There are synergies between disability causes and wider public health priorities in relation to both chronic disease and disability prevention. Therefore, there is a need for strong advocacy and understanding of the role of ICF as a reference in Rehab settings in Middle East if we wish to seize the opportunity to reverse current trends of acquired disability in the region.

Keywords: international classification of functioning, disability and health (ICF), prototype, rehabilitation and diabetes

Procedia PDF Downloads 321
8765 Effect of Signal Acquisition Procedure on Imagined Speech Classification Accuracy

Authors: M.R Asghari Bejestani, Gh. R. Mohammad Khani, V.R. Nafisi

Abstract:

Imagined speech recognition is one of the most interesting approaches to BCI development and a lot of works have been done in this area. Many different experiments have been designed and hundreds of combinations of feature extraction methods and classifiers have been examined. Reported classification accuracies range from the chance level to more than 90%. Based on non-stationary nature of brain signals, we have introduced 3 classification modes according to time difference in inter and intra-class samples. The modes can explain the diversity of reported results and predict the range of expected classification accuracies from the brain signal accusation procedure. In this paper, a few samples are illustrated by inspecting results of some previous works.

Keywords: brain computer interface, silent talk, imagined speech, classification, signal processing

Procedia PDF Downloads 121
8764 Towards a Balancing Medical Database by Using the Least Mean Square Algorithm

Authors: Kamel Belammi, Houria Fatrim

Abstract:

imbalanced data set, a problem often found in real world application, can cause seriously negative effect on classification performance of machine learning algorithms. There have been many attempts at dealing with classification of imbalanced data sets. In medical diagnosis classification, we often face the imbalanced number of data samples between the classes in which there are not enough samples in rare classes. In this paper, we proposed a learning method based on a cost sensitive extension of Least Mean Square (LMS) algorithm that penalizes errors of different samples with different weight and some rules of thumb to determine those weights. After the balancing phase, we applythe different classifiers (support vector machine (SVM), k- nearest neighbor (KNN) and multilayer neuronal networks (MNN)) for balanced data set. We have also compared the obtained results before and after balancing method.

Keywords: multilayer neural networks, k- nearest neighbor, support vector machine, imbalanced medical data, least mean square algorithm, diabetes

Procedia PDF Downloads 497
8763 Evaluation of Vehicle Classification Categories: Florida Case Study

Authors: Ren Moses, Jaqueline Masaki

Abstract:

This paper addresses the need for accurate and updated vehicle classification system through a thorough evaluation of vehicle class categories to identify errors arising from the existing system and proposing modifications. The data collected from two permanent traffic monitoring sites in Florida were used to evaluate the performance of the existing vehicle classification table. The vehicle data were collected and classified by the automatic vehicle classifier (AVC), and a video camera was used to obtain ground truth data. The Federal Highway Administration (FHWA) vehicle classification definitions were used to define vehicle classes from the video and compare them to the data generated by AVC in order to identify the sources of misclassification. Six types of errors were identified. Modifications were made in the classification table to improve the classification accuracy. The results of this study include the development of updated vehicle classification table with a reduction in total error by 5.1%, a step by step procedure to use for evaluation of vehicle classification studies and recommendations to improve FHWA 13-category rule set. The recommendations for the FHWA 13-category rule set indicate the need for the vehicle classification definitions in this scheme to be updated to reflect the distribution of current traffic. The presented results will be of interest to States’ transportation departments and consultants, researchers, engineers, designers, and planners who require accurate vehicle classification information for planning, designing and maintenance of transportation infrastructures.

Keywords: vehicle classification, traffic monitoring, pavement design, highway traffic

Procedia PDF Downloads 160
8762 Comparative Analysis of Classification Methods in Determining Non-Active Student Characteristics in Indonesia Open University

Authors: Dewi Juliah Ratnaningsih, Imas Sukaesih Sitanggang

Abstract:

Classification is one of data mining techniques that aims to discover a model from training data that distinguishes records into the appropriate category or class. Data mining classification methods can be applied in education, for example, to determine the classification of non-active students in Indonesia Open University. This paper presents a comparison of three methods of classification: Naïve Bayes, Bagging, and C.45. The criteria used to evaluate the performance of three methods of classification are stratified cross-validation, confusion matrix, the value of the area under the ROC Curve (AUC), Recall, Precision, and F-measure. The data used for this paper are from the non-active Indonesia Open University students in registration period of 2004.1 to 2012.2. Target analysis requires that non-active students were divided into 3 groups: C1, C2, and C3. Data analyzed are as many as 4173 students. Results of the study show: (1) Bagging method gave a high degree of classification accuracy than Naïve Bayes and C.45, (2) the Bagging classification accuracy rate is 82.99 %, while the Naïve Bayes and C.45 are 80.04 % and 82.74 % respectively, (3) the result of Bagging classification tree method has a large number of nodes, so it is quite difficult in decision making, (4) classification of non-active Indonesia Open University student characteristics uses algorithms C.45, (5) based on the algorithm C.45, there are 5 interesting rules which can describe the characteristics of non-active Indonesia Open University students.

Keywords: comparative analysis, data mining, clasiffication, Bagging, Naïve Bayes, C.45, non-active students, Indonesia Open University

Procedia PDF Downloads 289
8761 Ensemble-Based SVM Classification Approach for miRNA Prediction

Authors: Sondos M. Hammad, Sherin M. ElGokhy, Mahmoud M. Fahmy, Elsayed A. Sallam

Abstract:

In this paper, an ensemble-based Support Vector Machine (SVM) classification approach is proposed. It is used for miRNA prediction. Three problems, commonly associated with previous approaches, are alleviated. These problems arise due to impose assumptions on the secondary structural of premiRNA, imbalance between the numbers of the laboratory checked miRNAs and the pseudo-hairpins, and finally using a training data set that does not consider all the varieties of samples in different species. We aggregate the predicted outputs of three well-known SVM classifiers; namely, Triplet-SVM, Virgo and Mirident, weighted by their variant features without any structural assumptions. An additional SVM layer is used in aggregating the final output. The proposed approach is trained and then tested with balanced data sets. The results of the proposed approach outperform the three base classifiers. Improved values for the metrics of 88.88% f-score, 92.73% accuracy, 90.64% precision, 96.64% specificity, 87.2% sensitivity, and the area under the ROC curve is 0.91 are achieved.

Keywords: MiRNAs, SVM classification, ensemble algorithm, assumption problem, imbalance data

Procedia PDF Downloads 308
8760 Comparative Analysis of Feature Extraction and Classification Techniques

Authors: R. L. Ujjwal, Abhishek Jain

Abstract:

In the field of computer vision, most facial variations such as identity, expression, emotions and gender have been extensively studied. Automatic age estimation has been rarely explored. With age progression of a human, the features of the face changes. This paper is providing a new comparable study of different type of algorithm to feature extraction [Hybrid features using HAAR cascade & HOG features] & classification [KNN & SVM] training dataset. By using these algorithms we are trying to find out one of the best classification algorithms. Same thing we have done on the feature selection part, we extract the feature by using HAAR cascade and HOG. This work will be done in context of age group classification model.

Keywords: computer vision, age group, face detection

Procedia PDF Downloads 331
8759 Selection of Appropriate Classification Technique for Lithological Mapping of Gali Jagir Area, Pakistan

Authors: Khunsa Fatima, Umar K. Khattak, Allah Bakhsh Kausar

Abstract:

Satellite images interpretation and analysis assist geologists by providing valuable information about geology and minerals of an area to be surveyed. A test site in Fatejang of district Attock has been studied using Landsat ETM+ and ASTER satellite images for lithological mapping. Five different supervised image classification techniques namely maximum likelihood, parallelepiped, minimum distance to mean, mahalanobis distance and spectral angle mapper have been performed on both satellite data images to find out the suitable classification technique for lithological mapping in the study area. Results of these five image classification techniques were compared with the geological map produced by Geological Survey of Pakistan. The result of maximum likelihood classification technique applied on ASTER satellite image has the highest correlation of 0.66 with the geological map. Field observations and XRD spectra of field samples also verified the results. A lithological map was then prepared based on the maximum likelihood classification of ASTER satellite image.

Keywords: ASTER, Landsat-ETM+, satellite, image classification

Procedia PDF Downloads 363
8758 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling

Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed

Abstract:

Pose estimation is an important task in computer vision. Though the majority of the existing solutions provide good accuracy results, they are often overly complex and computationally expensive. In this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. Firstly, a face image is converted into one-dimensional time series using Hilbert space filling curve, then the approach converts these time series data to a symbolic representation. Furthermore, a distance matrix is calculated between symbolic series of an input learning dataset of images, to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated with three public datasets. Experimental results have shown that our approach is able to achieve a correct classification rate exceeding 97% with K-NN algorithm.

Keywords: machine learning, pattern recognition, facial pose classification, time series

Procedia PDF Downloads 322
8757 Taxonomic Classification for Living Organisms Using Convolutional Neural Networks

Authors: Saed Khawaldeh, Mohamed Elsharnouby, Alaa Eddin Alchalabi, Usama Pervaiz, Tajwar Aleef, Vu Hoang Minh

Abstract:

Taxonomic classification has a wide-range of applications such as finding out more about the evolutionary history of organisms that can be done by making a comparison between species living now and species that lived in the past. This comparison can be made using different kinds of extracted species’ data which include DNA sequences. Compared to the estimated number of the organisms that nature harbours, humanity does not have a thorough comprehension of which specific species they all belong to, in spite of the significant development of science and scientific knowledge over many years. One of the methods that can be applied to extract information out of the study of organisms in this regard is to use the DNA sequence of a living organism as a marker, thus making it available to classify it into a taxonomy. The classification of living organisms can be done in many machine learning techniques including Neural Networks (NNs). In this study, DNA sequences classification is performed using Convolutional Neural Networks (CNNs) which is a special type of NNs.

Keywords: deep networks, convolutional neural networks, taxonomic classification, DNA sequences classification

Procedia PDF Downloads 404
8756 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification

Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike

Abstract:

Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased towards imbalanced datasets because of the skewness of the distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and prevents the reduction of majority observations in imbalance datasets classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets- cancer dataset, german credit dataset, and banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results compared to that of the ID3 decision tree and decision tree induced with minority entropy for all three datasets.

Keywords: data mining, decision tree, classification, imbalance dataset

Procedia PDF Downloads 94
8755 Data Quality Enhancement with String Length Distribution

Authors: Qi Xiu, Hiromu Hota, Yohsuke Ishii, Takuya Oda

Abstract:

Recently, collectable manufacturing data are rapidly increasing. On the other hand, mega recall is getting serious as a social problem. Under such circumstances, there are increasing needs for preventing mega recalls by defect analysis such as root cause analysis and abnormal detection utilizing manufacturing data. However, the time to classify strings in manufacturing data by traditional method is too long to meet requirement of quick defect analysis. Therefore, we present String Length Distribution Classification method (SLDC) to correctly classify strings in a short time. This method learns character features, especially string length distribution from Product ID, Machine ID in BOM and asset list. By applying the proposal to strings in actual manufacturing data, we verified that the classification time of strings can be reduced by 80%. As a result, it can be estimated that the requirement of quick defect analysis can be fulfilled.

Keywords: string classification, data quality, feature selection, probability distribution, string length

Procedia PDF Downloads 286
8754 Tomato-Weed Classification by RetinaNet One-Step Neural Network

Authors: Dionisio Andujar, Juan lópez-Correa, Hugo Moreno, Angela Ri

Abstract:

The increased number of weeds in tomato crops highly lower yields. Weed identification with the aim of machine learning is important to carry out site-specific control. The last advances in computer vision are a powerful tool to face the problem. The analysis of RGB (Red, Green, Blue) images through Artificial Neural Networks had been rapidly developed in the past few years, providing new methods for weed classification. The development of the algorithms for crop and weed species classification looks for a real-time classification system using Object Detection algorithms based on Convolutional Neural Networks. The site study was located in commercial corn fields. The classification system has been tested. The procedure can detect and classify weed seedlings in tomato fields. The input to the Neural Network was a set of 10,000 RGB images with a natural infestation of Cyperus rotundus l., Echinochloa crus galli L., Setaria italica L., Portulaca oeracea L., and Solanum nigrum L. The validation process was done with a random selection of RGB images containing the aforementioned species. The mean average precision (mAP) was established as the metric for object detection. The results showed agreements higher than 95 %. The system will provide the input for an online spraying system. Thus, this work plays an important role in Site Specific Weed Management by reducing herbicide use in a single step.

Keywords: deep learning, object detection, cnn, tomato, weeds

Procedia PDF Downloads 77
8753 Reliable Soup: Reliable-Driven Model Weight Fusion on Ultrasound Imaging Classification

Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Yan Tong

Abstract:

It remains challenging to measure reliability from classification results from different machine learning models. This paper proposes a reliable soup optimization algorithm based on the model weight fusion algorithm Model Soup, aiming to improve reliability by using dual-channel reliability as the objective function to fuse a series of weights in the breast ultrasound classification models. Experimental results on breast ultrasound clinical datasets demonstrate that reliable soup significantly enhances the reliability of breast ultrasound image classification tasks. The effectiveness of the proposed approach was verified via multicenter trials. The results from five centers indicate that the reliability optimization algorithm can enhance the reliability of the breast ultrasound image classification model and exhibit low multicenter correlation.

Keywords: breast ultrasound image classification, feature attribution, reliability assessment, reliability optimization

Procedia PDF Downloads 49
8752 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy

Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie

Abstract:

In recent years, there has been an explosion in the rate of using technology that help discovering the diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. It has great potential to provide accurate medical diagnosis, to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied on such micro-array datasets to devise methods that can predict the occurrence of Leukemia disease. In this study, we compared the classification accuracy and response time among eleven decision tree methods and six rule classifier methods using five performance criteria. The experiment results show that the performance of Random Tree is producing better result. Also it takes lowest time to build model in tree classifier. The classification rules algorithms such as nearest- neighbor-like algorithm (NNge) is the best algorithm due to the high accuracy and it takes lowest time to build model in classification.

Keywords: data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data

Procedia PDF Downloads 292
8751 Multi-Criteria Inventory Classification Process Based on Logical Analysis of Data

Authors: Diana López-Soto, Soumaya Yacout, Francisco Ángel-Bello

Abstract:

Although inventories are considered as stocks of money sitting on shelve, they are needed in order to secure a constant and continuous production. Therefore, companies need to have control over the amount of inventory in order to find the balance between excessive and shortage of inventory. The classification of items according to certain criteria such as the price, the usage rate and the lead time before arrival allows any company to concentrate its investment in inventory according to certain ranking or priority of items. This makes the decision making process for inventory management easier and more justifiable. The purpose of this paper is to present a new approach for the classification of new items based on the already existing criteria. This approach is called the Logical Analysis of Data (LAD). It is used in this paper to assist the process of ABC items classification based on multiple criteria. LAD is a data mining technique based on Boolean theory that is used for pattern recognition. This technique has been tested in medicine, industry, credit risk analysis, and engineering with remarkable results. An application on ABC inventory classification is presented for the first time, and the results are compared with those obtained when using the well-known AHP technique and the ANN technique. The results show that LAD presented very good classification accuracy.

Keywords: ABC multi-criteria inventory classification, inventory management, multi-class LAD model, multi-criteria classification

Procedia PDF Downloads 845
8750 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing do-main presents interesting challenges as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable as it can be used to mark progress while training, and it can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The dataset available was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and it was temporally heterogeneous. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross validation strategies. The investigated solutions to the classification problem included light weight machine classifiers KNN and SVM as well as the deep learning with CNN. The best performing model had an 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence

Procedia PDF Downloads 117