Search results for: one-class classification method.
8306 Time Series Regression with Meta-Clusters
Authors: Monika Chuchro
Abstract:
This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain subgroups of time series data with normal distribution from the inflow into wastewater treatment plant data, composed of several groups differing by mean value. Two simple algorithms, K-mean and EM, were chosen as a clustering method. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was performed for each subgroups. The final model was a sum of the subgroups models. The quality of the obtained model was compared with the regression model made using the same explanatory variables, but with no clustering of data. Results were compared using determination coefficient (R2), measure of prediction accuracy- mean absolute percentage error (MAPE) and comparison on a linear chart. Preliminary results allow us to foresee the potential of the presented technique.
Keywords: Clustering, Data analysis, Data mining, Predictive models.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19518305 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique
Authors: C. Manjula, Lilly Florence
Abstract:
Software technology is developing rapidly which leads to the growth of various industries. Now-a-days, software-based applications have been adopted widely for business purposes. For any software industry, development of reliable software is becoming a challenging task because a faulty software module may be harmful for the growth of industry and business. Hence there is a need to develop techniques which can be used for early prediction of software defects. Due to complexities in manual prediction, automated software defect prediction techniques have been introduced. These techniques are based on the pattern learning from the previous software versions and finding the defects in the current version. These techniques have attracted researchers due to their significant impact on industrial growth by identifying the bugs in software. Based on this, several researches have been carried out but achieving desirable defect prediction performance is still a challenging task. To address this issue, here we present a machine learning based hybrid technique for software defect prediction. First of all, Genetic Algorithm (GA) is presented where an improved fitness function is used for better optimization of features in data sets. Later, these features are processed through Decision Tree (DT) classification model. Finally, an experimental study is presented where results from the proposed GA-DT based hybrid approach is compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.
Keywords: Decision tree, genetic algorithm, machine learning, software defect prediction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14658304 Non-negative Principal Component Analysis for Face Recognition
Abstract:
Principle component analysis is often combined with the state-of-art classification algorithms to recognize human faces. However, principle component analysis can only capture these features contributing to the global characteristics of data because it is a global feature selection algorithm. It misses those features contributing to the local characteristics of data because each principal component only contains some levels of global characteristics of data. In this study, we present a novel face recognition approach using non-negative principal component analysis which is added with the constraint of non-negative to improve data locality and contribute to elucidating latent data structures. Experiments are performed on the Cambridge ORL face database. We demonstrate the strong performances of the algorithm in recognizing human faces in comparison with PCA and NREMF approaches.Keywords: classification, face recognition, non-negativeprinciple component analysis (NPCA)
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16958303 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection
Authors: Yaojun Wang, Yaoqing Wang
Abstract:
Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection system. A profitable stock-selection system should consider the stock’s investment value and the market timing. In this paper, we present a hybrid system including both engage for stock selection. This system uses a case-based reasoning (CBR) model to execute the stock classification, uses a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques regarding to the classification accuracy, the average return and the Sharpe ratio.Keywords: Case-based reasoning, decision tree, stock selection, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17058302 Identity Verification Using k-NN Classifiers and Autistic Genetic Data
Authors: Fuad M. Alkoot
Abstract:
DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
Keywords: Biometrics, identity verification, genetic data, k-nearest neighbor.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11208301 Roof Material Detection Based on Object-Based Approach Using WorldView-2 Satellite Imagery
Authors: Ebrahim Taherzadeh, Helmi Z. M. Shafri, Kaveh Shahi
Abstract:
One of the most important tasks in urban remote sensing is the detection of impervious surfaces (IS), such as roofs and roads. However, detection of IS in heterogeneous areas still remains one of the most challenging tasks. In this study, detection of concrete roof using an object-based approach was proposed. A new rule-based classification was developed to detect concrete roof tile. This proposed rule-based classification was applied to WorldView-2 image and results showed that the proposed rule has good potential to predict concrete roof material from WorldView-2 images, with 85% accuracy.
Keywords: Urban remote sensing, impervious surface, Object- Based, Roof Material, Concrete tile, WorldView-2.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 37938300 Video Classification by Partitioned Frequency Spectra of Repeating Movements
Authors: Kahraman Ayyildiz, Stefan Conrad
Abstract:
In this paper we present a system for classifying videos by frequency spectra. Many videos contain activities with repeating movements. Sports videos, home improvement videos, or videos showing mechanical motion are some example areas. Motion of these areas usually repeats with a certain main frequency and several side frequencies. Transforming repeating motion to its frequency domain via FFT reveals these frequencies. Average amplitudes of frequency intervals can be seen as features of cyclic motion. Hence determining these features can help to classify videos with repeating movements. In this paper we explain how to compute frequency spectra for video clips and how to use them for classifying. Our approach utilizes series of image moments as a function. This function again is transformed into its frequency domain.Keywords: action recognition, frequency feature, motion recognition, repeating movement, video classification
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18848299 Glass Bottle Inspector Based on Machine Vision
Authors: Huanjun Liu, Yaonan Wang, Feng Duan
Abstract:
This text studies glass bottle intelligent inspector based machine vision instead of manual inspection. The system structure is illustrated in detail in this paper. The text presents the method based on watershed transform methods to segment the possible defective regions and extract features of bottle wall by rules. Then wavelet transform are used to exact features of bottle finish from images. After extracting features, the fuzzy support vector machine ensemble is putted forward as classifier. For ensuring that the fuzzy support vector machines have good classification ability, the GA based ensemble method is used to combining the several fuzzy support vector machines. The experiments demonstrate that using this inspector to inspect glass bottles, the accuracy rate may reach above 97.5%.Keywords: Intelligent Inspection, Support Vector Machines, Ensemble Methods, watershed transform, Wavelet Transform
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 38958298 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique
Authors: Ghada A. Alfattni
Abstract:
Analysing unbalanced datasets is one of the challenges that practitioners in machine learning field face. However, many researches have been carried out to determine the effectiveness of the use of the synthetic minority over-sampling technique (SMOTE) to address this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets, then the same datasets were oversampled by using SMOTE and applied again to the three models to compare the differences in the performances. Results of experiments show that the highest number of nearest neighbours gives lower values of error rates.Keywords: Imbalanced datasets, SMOTE, machine learning, logistic regression, support vector machine, nearest neighbour.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13148297 Estimation Model of Dry Docking Duration Using Data Mining
Authors: Isti Surjandari, Riara Novita
Abstract:
Maintenance is one of the most important activities in the shipyard industry. However, sometimes it is not supported by adequate services from the shipyard, where inaccuracy in estimating the duration of the ship maintenance is still common. This makes estimation of ship maintenance duration is crucial. This study uses Data Mining approach, i.e., CART (Classification and Regression Tree) to estimate the duration of ship maintenance that is limited to dock works or which is known as dry docking. By using the volume of dock works as an input to estimate the maintenance duration, 4 classes of dry docking duration were obtained with different linear model and job criteria for each class. These linear models can then be used to estimate the duration of dry docking based on job criteria.
Keywords: Classification and regression tree (CART), data mining, dry docking, maintenance duration.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24338296 A Neural-Network-Based Fault Diagnosis Approach for Analog Circuits by Using Wavelet Transformation and Fractal Dimension as a Preprocessor
Abstract:
This paper presents a new method of analog fault diagnosis based on back-propagation neural networks (BPNNs) using wavelet decomposition and fractal dimension as preprocessors. The proposed method has the capability to detect and identify faulty components in an analog electronic circuit with tolerance by analyzing its impulse response. Using wavelet decomposition to preprocess the impulse response drastically de-noises the inputs to the neural network. The second preprocessing by fractal dimension can extract unique features, which are the fed to a neural network as inputs for further classification. A comparison of our work with [1] and [6], which also employs back-propagation (BP) neural networks, reveals that our system requires a much smaller network and performs significantly better in fault diagnosis of analog circuits due to our proposed preprocessing techniques.
Keywords: Analog circuits, fault diagnosis, tolerance, wavelettransform, fractal dimension, box dimension.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22008295 Clustering Multivariate Empiric Characteristic Functions for Multi-Class SVM Classification
Authors: María-Dolores Cubiles-de-la-Vega, Rafael Pino-Mejías, Esther-Lydia Silva-Ramírez
Abstract:
A dissimilarity measure between the empiric characteristic functions of the subsamples associated to the different classes in a multivariate data set is proposed. This measure can be efficiently computed, and it depends on all the cases of each class. It may be used to find groups of similar classes, which could be joined for further analysis, or it could be employed to perform an agglomerative hierarchical cluster analysis of the set of classes. The final tree can serve to build a family of binary classification models, offering an alternative approach to the multi-class SVM problem. We have tested this dendrogram based SVM approach with the oneagainst- one SVM approach over four publicly available data sets, three of them being microarray data. Both performances have been found equivalent, but the first solution requires a smaller number of binary SVM models.Keywords: Cluster Analysis, Empiric Characteristic Function, Multi-class SVM, R.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18778294 Classification of Business Models of Italian Bancassurance by Balance Sheet Indicators
Authors: Andrea Bellucci, Martina Tofi
Abstract:
The aim of paper is to analyze business models of bancassurance in Italy for life business. The life insurance business is very developed in the Italian market and banks branches have 80% of the market share. Given its maturity, the life insurance market needs to consolidate its organizational form to allow for the development of non-life business, which nowadays collects few premiums but represents a great opportunity to enlarge the market share of bancassurance using its strength in the distribution channel while the market share of independent agents is decreasing. Starting with the main business model of bancassurance for life business, this paper will analyze the performances of life companies in the Italian market by balance sheet indicators and by main discriminant variables of business models. The study will observe trends from 2013 to 2015 for the Italian market by exploiting a database managed by Associazione Nazionale delle Imprese di Assicurazione (ANIA). The applied approach is based on a bottom-up analysis starting with variables and indicators to define business models’ classification. The statistical classification algorithm proposed by Ward is employed to design business models’ profiles. Results from the analysis will be a representation of the main business models built by their profile related to indicators. In that way, an unsupervised analysis is developed that has the limit of its judgmental dimension based on research opinion, but it is possible to obtain a design of effective business models.
Keywords: Balance sheet indicators, Bancassurance, business models, ward algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12618293 Classification Based on Deep Neural Cellular Automata Model
Authors: Yasser F. Hassan
Abstract:
Deep learning structure is a branch of machine learning science and greet achievement in research and applications. Cellular neural networks are regarded as array of nonlinear analog processors called cells connected in a way allowing parallel computations. The paper discusses how to use deep learning structure for representing neural cellular automata model. The proposed learning technique in cellular automata model will be examined from structure of deep learning. A deep automata neural cellular system modifies each neuron based on the behavior of the individual and its decision as a result of multi-level deep structure learning. The paper will present the architecture of the model and the results of simulation of approach are given. Results from the implementation enrich deep neural cellular automata system and shed a light on concept formulation of the model and the learning in it.Keywords: Cellular automata, neural cellular automata, deep learning, classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8668292 Rank-Based Chain-Mode Ensemble for Binary Classification
Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu
Abstract:
In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.
Keywords: Consensus, curse of correlation, imbalanced classification, rank-based chain-mode ensemble.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7348291 An Optimal Feature Subset Selection for Leaf Analysis
Authors: N. Valliammal, S.N. Geethalakshmi
Abstract:
This paper describes an optimal approach for feature subset selection to classify the leaves based on Genetic Algorithm (GA) and Kernel Based Principle Component Analysis (KPCA). Due to high complexity in the selection of the optimal features, the classification has become a critical task to analyse the leaf image data. Initially the shape, texture and colour features are extracted from the leaf images. These extracted features are optimized through the separate functioning of GA and KPCA. This approach performs an intersection operation over the subsets obtained from the optimization process. Finally, the most common matching subset is forwarded to train the Support Vector Machine (SVM). Our experimental results successfully prove that the application of GA and KPCA for feature subset selection using SVM as a classifier is computationally effective and improves the accuracy of the classifier.Keywords: Optimization, Feature extraction, Feature subset, Classification, GA, KPCA, SVM and Computation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22418290 Determination of Water Pollution and Water Quality with Decision Trees
Authors: Çiğdem Bakır, Mecit Yüzkat
Abstract:
With the increasing emphasis on water quality worldwide, the search for and expanding the market for new and intelligent monitoring systems has increased. The current method is the laboratory process, where samples are taken from bodies of water, and tests are carried out in laboratories. This method is time-consuming, a waste of manpower and uneconomical. To solve this problem, we used machine learning methods to detect water pollution in our study. We created decision trees with the Orange3 software used in the study and tried to determine all the factors that cause water pollution. An automatic prediction model based on water quality was developed by taking many model inputs such as water temperature, pH, transparency, conductivity, dissolved oxygen, and ammonia nitrogen with machine learning methods. The proposed approach consists of three stages: Preprocessing of the data used, feature detection and classification. We tried to determine the success of our study with different accuracy metrics and the results were presented comparatively. In addition, we achieved approximately 98% success with the decision tree.
Keywords: Decision tree, water quality, water pollution, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2608289 Unified Method to Block Pornographic Images in Websites
Authors: Sakthi Priya Balaji R., Vijayendar G.
Abstract:
This paper proposes a technique to block adult images displayed in websites. The filter is designed so as to perform even in exceptional cases such as, where face detection is not possible or improper face visibility. This is achieved by using an alternative phase to extract the MFC (Most Frequent Color) from the Human Body regions estimated using a biometric of anthropometric distances between fixed rigidly connected body locations. The logical results generated can be protected from overriding by a firewall or intrusion, by encrypting the result in a SSH data packet.
Keywords: Face detection, characteristics extraction andclassification, Component based shape analysis and classification, open source SSH V2 protocol
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13968288 Error-Robust Nature of Genome Profiling Applied for Clustering of Species Demonstrated by Computer Simulation
Authors: Shamim Ahmed Koichi Nishigaki
Abstract:
Genome profiling (GP), a genotype based technology, which exploits random PCR and temperature gradient gel electrophoresis, has been successful in identification/classification of organisms. In this technology, spiddos (Species identification dots) and PaSS (Pattern similarity score) were employed for measuring the closeness (or distance) between genomes. Based on the closeness (PaSS), we can buildup phylogenetic trees of the organisms. We noticed that the topology of the tree is rather robust against the experimental fluctuation conveyed by spiddos. This fact was confirmed quantitatively in this study by computer-simulation, providing the limit of the reliability of this highly powerful methodology. As a result, we could demonstrate the effectiveness of the GP approach for identification/classification of organisms.
Keywords: Fluctuation, Genome profiling (GP), Pattern similarity score (PaSS), Robustness, Spiddos-shift.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15398287 Emotion Recognition Using Neural Network: A Comparative Study
Authors: Nermine Ahmed Hendy, Hania Farag
Abstract:
Emotion recognition is an important research field that finds lots of applications nowadays. This work emphasizes on recognizing different emotions from speech signal. The extracted features are related to statistics of pitch, formants, and energy contours, as well as spectral, perceptual and temporal features, jitter, and shimmer. The Artificial Neural Networks (ANN) was chosen as the classifier. Working on finding a robust and fast ANN classifier suitable for different real life application is our concern. Several experiments were carried out on different ANN to investigate the different factors that impact the classification success rate. Using a database containing 7 different emotions, it will be shown that with a proper and careful adjustment of features format, training data sorting, number of features selected and even the ANN type and architecture used, a success rate of 85% or even more can be achieved without increasing the system complicity and the computation time
Keywords: Classification, emotion recognition, features extraction, feature selection, neural network
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 46988286 A Study on the Application of Machine Learning and Deep Learning Techniques for Skin Cancer Detection
Authors: Hritwik Ghosh, Irfan Sadiq Rahat, Sachi Nandan Mohanty, J. V. R. Ravindra, Abdus Sobur
Abstract:
In the rapidly evolving landscape of medical diagnostics, the early detection and accurate classification of skin cancer remain paramount for effective treatment outcomes. This research delves into the transformative potential of artificial intelligence (AI), specifically deep learning (DL), as a tool for discerning and categorizing various skin conditions. Utilizing a diverse dataset of 3,000 images, representing nine distinct skin conditions, we confront the inherent challenge of class imbalance. This imbalance, where conditions like melanomas are over-represented, is addressed by incorporating class weights during the model training phase, ensuring an equitable representation of all conditions in the learning process. Our approach presents a hybrid model, amalgamating the strengths of two renowned convolutional neural networks (CNNs), VGG16 and ResNet50. These networks, pre-trained on the ImageNet dataset, are adept at extracting intricate features from images. By synergizing these models, our research aims to capture a holistic set of features, thereby bolstering classification performance. Preliminary findings underscore the hybrid model's superiority over individual models, showcasing its prowess in feature extraction and classification. Moreover, the research emphasizes the significance of rigorous data pre-processing, including image resizing, color normalization, and segmentation, in ensuring data quality and model reliability. In essence, this study illuminates the promising role of AI and DL in revolutionizing skin cancer diagnostics, offering insights into its potential applications in broader medical domains.
Keywords: Artificial intelligence, machine learning, deep learning, skin cancer, dermatology, convolutional neural networks, image classification, computer vision, healthcare technology, cancer detection, medical imaging.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14448285 Attention Multiple Instance Learning for Cancer Tissue Classification in Digital Histopathology Images
Authors: Afaf Alharbi, Qianni Zhang
Abstract:
The identification of malignant tissue in histopathological slides holds significant importance in both clinical settings and pathology research. This paper presents a methodology aimed at automatically categorizing cancerous tissue through the utilization of a multiple instance learning framework. This framework is specifically developed to acquire knowledge of the Bernoulli distribution of the bag label probability by employing neural networks. Furthermore, we put forward a neural network-based permutation-invariant aggregation operator, equivalent to attention mechanisms, which is applied to the multi-instance learning network. Through empirical evaluation on an openly available colon cancer histopathology dataset, we provide evidence that our approach surpasses various conventional deep learning methods.
Keywords: Attention Multiple Instance Learning, Multiple Instance Learning, transfer learning, histopathological slides, cancer tissue classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2218284 On The Analysis of a Compound Neural Network for Detecting Atrio Ventricular Heart Block (AVB) in an ECG Signal
Authors: Salama Meghriche, Amer Draa, Mohammed Boulemden
Abstract:
Heart failure is the most common reason of death nowadays, but if the medical help is given directly, the patient-s life may be saved in many cases. Numerous heart diseases can be detected by means of analyzing electrocardiograms (ECG). Artificial Neural Networks (ANN) are computer-based expert systems that have proved to be useful in pattern recognition tasks. ANN can be used in different phases of the decision-making process, from classification to diagnostic procedures. This work concentrates on a review followed by a novel method. The purpose of the review is to assess the evidence of healthcare benefits involving the application of artificial neural networks to the clinical functions of diagnosis, prognosis and survival analysis, in ECG signals. The developed method is based on a compound neural network (CNN), to classify ECGs as normal or carrying an AtrioVentricular heart Block (AVB). This method uses three different feed forward multilayer neural networks. A single output unit encodes the probability of AVB occurrences. A value between 0 and 0.1 is the desired output for a normal ECG; a value between 0.1 and 1 would infer an occurrence of an AVB. The results show that this compound network has a good performance in detecting AVBs, with a sensitivity of 90.7% and a specificity of 86.05%. The accuracy value is 87.9%.Keywords: Artificial neural networks, Electrocardiogram(ECG), Feed forward multilayer neural network, Medical diagnosis, Pattern recognitionm, Signal processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24728283 MIM: A Species Independent Approach for Classifying Coding and Non-Coding DNA Sequences in Bacterial and Archaeal Genomes
Authors: Achraf El Allali, John R. Rose
Abstract:
A number of competing methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. An information theory measure based on mutual information has shown good accuracy in classifying DNA sequences into coding and noncoding. In this paper we describe a species independent iterative approach that distinguishes coding from non-coding sequences using the mutual information measure (MIM). A set of sixty prokaryotes is used to extract universal training data. To facilitate comparisons with the published results of other researchers, a test set of 51 bacterial and archaeal genomes was used to evaluate MIM. These results demonstrate that MIM produces superior results while remaining species independent.Keywords: Coding Non-coding Classification, Entropy, GeneRecognition, Mutual Information.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17278282 Fuzzy Rules Generation and Extraction from Support Vector Machine Based on Kernel Function Firing Signals
Authors: Prasan Pitiranggon, Nunthika Benjathepanun, Somsri Banditvilai, Veera Boonjing
Abstract:
Our study proposes an alternative method in building Fuzzy Rule-Based System (FRB) from Support Vector Machine (SVM). The first set of fuzzy IF-THEN rules is obtained through an equivalence of the SVM decision network and the zero-ordered Sugeno FRB type of the Adaptive Network Fuzzy Inference System (ANFIS). The second set of rules is generated by combining the first set based on strength of firing signals of support vectors using Gaussian kernel. The final set of rules is then obtained from the second set through input scatter partitioning. A distinctive advantage of our method is the guarantee that the number of final fuzzy IFTHEN rules is not more than the number of support vectors in the trained SVM. The final FRB system obtained is capable of performing classification with results comparable to its SVM counterpart, but it has an advantage over the black-boxed SVM in that it may reveal human comprehensible patterns.Keywords: Fuzzy Rule Base, Rule Extraction, Rule Generation, Support Vector Machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19028281 Probabilistic Crash Prediction and Prevention of Vehicle Crash
Authors: Lavanya Annadi, Fahimeh Jafari
Abstract:
Transportation brings immense benefits to society, but it also has its costs. Costs include the cost of infrastructure, personnel, and equipment, but also the loss of life and property in traffic accidents on the road, delays in travel due to traffic congestion, and various indirect costs in terms of air transport. This research aims to predict the probabilistic crash prediction of vehicles using Machine Learning due to natural and structural reasons by excluding spontaneous reasons, like overspeeding, etc., in the United States. These factors range from meteorological elements such as weather conditions, precipitation, visibility, wind speed, wind direction, temperature, pressure, and humidity, to human-made structures, like road structure components such as Bumps, Roundabouts, No Exit, Turning Loops, Give Away, etc. The probabilities are categorized into ten distinct classes. All the predictions are based on multiclass classification techniques, which are supervised learning. This study considers all crashes in all states collected by the US government. The probability of the crash was determined by employing Multinomial Expected Value, and a classification label was assigned accordingly. We applied three classification models, including multiclass Logistic Regression, Random Forest and XGBoost. The numerical results show that XGBoost achieved a 75.2% accuracy rate which indicates the part that is being played by natural and structural reasons for the crash. The paper has provided in-depth insights through exploratory data analysis.
Keywords: Road safety, crash prediction, exploratory analysis, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 838280 Investigating Activity Recognition Using 9-Axis Sensors and Filters in Wearable Devices
Authors: Jun Gil Ahn, Jong Kang Park, Jong Tae Kim
Abstract:
In this paper, we analyze major components of activity recognition (AR) in wearable device with 9-axis sensors and sensor fusion filters. 9-axis sensors commonly include 3-axis accelerometer, 3-axis gyroscope and 3-axis magnetometer. We chose sensor fusion filters as Kalman filter and Direction Cosine Matrix (DCM) filter. We also construct sensor fusion data from each activity sensor data and perform classification by accuracy of AR using Naïve Bayes and SVM. According to the classification results, we observed that the DCM filter and the specific combination of the sensing axes are more effective for AR in wearable devices while classifying walking, running, ascending and descending.Keywords: Accelerometer, activity recognition, directional cosine matrix filter, gyroscope, Kalman filter, magnetometer.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16748279 A Taxonomy of Routing Protocols in Wireless Sensor Networks
Authors: A. Kardi, R. Zagrouba, M. Alqahtani
Abstract:
The Internet of Everything (IoE) presents today a very attractive and motivating field of research. It is basically based on Wireless Sensor Networks (WSNs) in which the routing task is the major analysis topic. In fact, it directly affects the effectiveness and the lifetime of the network. This paper, developed from recent works and based on extensive researches, proposes a taxonomy of routing protocols in WSNs. Our main contribution is that we propose a classification model based on nine classes namely application type, delivery mode, initiator of communication, network architecture, path establishment (route discovery), network topology (structure), protocol operation, next hop selection and latency-awareness and energy-efficient routing protocols. In order to provide a total classification pattern to serve as reference for network designers, each class is subdivided into possible subclasses, presented, and discussed using different parameters such as purposes and characteristics.
Keywords: WSNs, sensor, routing protocols, survey.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10408278 Evaluation of Ensemble Classifiers for Intrusion Detection
Authors: M. Govindarajan
Abstract:
One of the major developments in machine learning in the past decade is the ensemble method, which finds highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed with homogeneous ensemble classifier using bagging and heterogeneous ensemble classifier using arcing and their performances are analyzed in terms of accuracy. A Classifier ensemble is designed using Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and the benefits of the proposed approaches are demonstrated by the means of standard datasets of intrusion detection. The main originality of the proposed approach is based on three main parts: preprocessing phase, classification phase, and combining phase. A wide range of comparative experiments is conducted for standard datasets of intrusion detection. The performance of the proposed homogeneous and heterogeneous ensemble classifiers are compared to the performance of other standard homogeneous and heterogeneous ensemble methods. The standard homogeneous ensemble methods include Error correcting output codes, Dagging and heterogeneous ensemble methods include majority voting, stacking. The proposed ensemble methods provide significant improvement of accuracy compared to individual classifiers and the proposed bagged RBF and SVM performs significantly better than ECOC and Dagging and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Also heterogeneous models exhibit better results than homogeneous models for standard datasets of intrusion detection.Keywords: Data mining, ensemble, radial basis function, support vector machine, accuracy.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17008277 Day/Night Detector for Vehicle Tracking in Traffic Monitoring Systems
Authors: M. Taha, Hala H. Zayed, T. Nazmy, M. Khalifa
Abstract:
Recently, traffic monitoring has attracted the attention of computer vision researchers. Many algorithms have been developed to detect and track moving vehicles. In fact, vehicle tracking in daytime and in nighttime cannot be approached with the same techniques, due to the extreme different illumination conditions. Consequently, traffic-monitoring systems are in need of having a component to differentiate between daytime and nighttime scenes. In this paper, a HSV-based day/night detector is proposed for traffic monitoring scenes. The detector employs the hue-histogram and the value-histogram on the top half of the image frame. Experimental results show that the extraction of the brightness features along with the color features within the top region of the image is effective for classifying traffic scenes. In addition, the detector achieves high precision and recall rates along with it is feasible for real time applications.Keywords: Day/night detector, daytime/nighttime classification, image classification, vehicle tracking, traffic monitoring.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4508