Search results for: DNA sequences classification
2472 A Nonlinear Feature Selection Method for Hyperspectral Image Classification
Authors: Pei-Jyun Hsieh, Cheng-Hsuan Li, Bor-Chen Kuo
Abstract:
For hyperspectral image classification, feature reduction is an important pre-processing step for avoiding the Hughes phenomenon, which arises from the difficulty of collecting training samples. Hence, many studies have developed feature selection methods, such as F-score and HSIC (Hilbert-Schmidt Independence Criterion), to improve hyperspectral image classification. However, most of them only consider the class separability in the original space, i.e., a linear class separability. In this study, we proposed a nonlinear class separability measure based on the kernel trick for selecting an appropriate feature subset. The proposed nonlinear class separability was formed by a generalized RBF kernel with a different bandwidth for each feature. Moreover, it considered both the within-class separability and the between-class separability. A genetic algorithm was applied to tune these bandwidths so that the within-class separability is smallest and the between-class separability is largest simultaneously. This indicates that the corresponding feature space is more suitable for classification. In addition, the corresponding nonlinear classification boundary can separate classes very well. These optimal bandwidths also show the importance of bands for hyperspectral image classification. The reciprocals of these bandwidths can be viewed as weights of bands: the smaller the bandwidth, the larger the weight of the band, and the more important the band is for classification. Hence, the descending order of the reciprocals of the bandwidths gives an order for selecting the appropriate feature subsets. In the experiments, three hyperspectral image data sets, the Indian Pine Site data set, the PAVIA data set, and the Salinas A data set, were used to demonstrate that the feature subsets selected by the proposed nonlinear feature selection method are more appropriate for hyperspectral image classification. Only ten percent of samples were randomly selected to form the training dataset.
All non-background samples were used to form the testing dataset. The support vector machine was applied to classify these testing samples based on selected feature subsets. According to the experiments on the Indian Pine Site data set with 220 bands, the highest accuracies by applying the proposed method, F-score, and HSIC are 0.8795, 0.8795, and 0.87404, respectively. However, the proposed method selects 158 features. F-score and HSIC select 168 features and 217 features, respectively. Moreover, the classification accuracies increase dramatically only using first few features. The classification accuracies with respect to feature subsets of 10 features, 20 features, 50 features, and 110 features are 0.69587, 0.7348, 0.79217, and 0.84164, respectively. Furthermore, only using half selected features (110 features) of the proposed method, the corresponding classification accuracy (0.84168) is approximate to the highest classification accuracy, 0.8795. For other two hyperspectral image data sets, the PAVIA data set and Salinas A data set, we can obtain the similar results. These results illustrate our proposed method can efficiently find feature subsets to improve hyperspectral image classification. One can apply the proposed method to determine the suitable feature subset first according to specific purposes. Then researchers can only use the corresponding sensors to obtain the hyperspectral image and classify the samples. This can not only improve the classification performance but also reduce the cost for obtaining hyperspectral images.Keywords: hyperspectral image classification, nonlinear feature selection, kernel trick, support vector machine
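The band-ranking idea in this abstract can be illustrated with a minimal sketch. The exact kernel form is not given in the abstract, so the per-feature Gaussian form below is an assumption, as are the function names and the example bandwidth values (in the paper the bandwidths come from a genetic algorithm, which is not reproduced here).

```python
import math

def generalized_rbf(x, y, bandwidths):
    """RBF kernel with one bandwidth per feature (assumed Gaussian form)."""
    s = sum(((a - b) / h) ** 2 for a, b, h in zip(x, y, bandwidths))
    return math.exp(-0.5 * s)

def rank_bands(bandwidths):
    """Return band indices ordered by descending reciprocal bandwidth (weight)."""
    weights = [1.0 / h for h in bandwidths]
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)

# Hypothetical tuned bandwidths for three bands.
bandwidths = [2.0, 0.5, 1.0]
order = rank_bands(bandwidths)  # band 1 has the smallest bandwidth, so it ranks first
same_point = generalized_rbf([0.2, 0.4, 0.1], [0.2, 0.4, 0.1], bandwidths)
```

Taking the first k indices of `order` would give a feature subset of size k under this reading of the abstract.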
Procedia PDF Downloads 263
2471 Personal Information Classification Based on Deep Learning in Automatic Form Filling System
Authors: Shunzuo Wu, Xudong Luo, Yuanxiu Liao
Abstract:
Recently, the rapid development of deep learning has allowed artificial intelligence (AI) to penetrate many fields, replacing manual work there. In particular, AI systems have become a research focus in the field of office automation. To meet real needs in office automation, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various relevant information elicited from the Internet. When training the neural network models, we use less noisy and balanced data. We conduct a series of experiments to test our system, and the results show that it can achieve better classification results.
Keywords: artificial intelligence and office, NLP, deep learning, text classification
Procedia PDF Downloads 200
2470 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine
Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li
Abstract:
Machine learning and data mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries, especially China, air quality classification has become one of the most important topics in air quality research and modelling. This study aims to introduce a hybrid classification model based on information theory and Support Vector Machine (SVM), using the air quality data of four cities in China, namely Beijing, Guangzhou, Shanghai and Tianjin, from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels, namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent, based on their respective Air Quality Index (AQI) values. Using information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then the SVM algorithm is applied to the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.
Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation
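As an illustration of the information gain step, here is a minimal sketch for a categorical feature. The feature values and labels are made up for the example, and continuous features are assumed to have been discretized beforehand. How the study handles continuous features is not stated in the abstract.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: one informative and one uninformative feature.
levels = ["Good", "Good", "Light Pollution", "Light Pollution"]
informative = ["low", "low", "high", "high"]
uninformative = ["a", "b", "a", "b"]
gain_informative = information_gain(informative, levels)
gain_uninformative = information_gain(uninformative, levels)
```

Features would then be ranked by gain and the top-ranked ones passed to the classifier.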
Procedia PDF Downloads 235
2469 Auto Classification of Multiple ECG Arrhythmic Detection via Machine Learning Techniques: A Review
Authors: Ng Liang Shen, Hau Yuan Wen
Abstract:
Arrhythmia analysis of ECG signals plays a major role in diagnosing most cardiac diseases. Arrhythmia detection on an electrocardiographic (ECG) record can therefore apply various algorithms to identify multiple patterns and match each ECG beat accordingly using supervised machine learning. Researchers have used different features and classification methods to classify different arrhythmia types. A major problem in these studies is the fact that the symptoms of the disease do not show all the time in the ECG record. Hence, a successful diagnosis might require the manual investigation of several hours of ECG records. This paper reviews investigations of cardiovascular disease in ECG signals for cardiac arrhythmia, which examine irregular ECG waveforms beat by beat and match them to the corresponding arrhythmia using machine learning pattern recognition.
Keywords: electrocardiogram, ECG, classification, machine learning, pattern recognition, detection, QRS
Procedia PDF Downloads 376
2468 Land Use/Land Cover Mapping Using Landsat 8 and Sentinel-2 in a Mediterranean Landscape
Authors: Moschos Vogiatzis, K. Perakis
Abstract:
Spatial-explicit and up-to-date land use/land cover information is fundamental for spatial planning, land management, sustainable development, and sound decision-making. In the last decade, many satellite-derived land cover products at different spatial, spectral, and temporal resolutions have been developed, such as the European Copernicus Land Cover product. However, more efficient and detailed information for land use/land cover is required at the regional or local scale. A typical Mediterranean basin with a complex landscape comprised of various forest types, crops, artificial surfaces, and wetlands was selected to test and develop our approach. In this study, we investigate the improvement of Copernicus Land Cover product (CLC2018) using Landsat 8 and Sentinel-2 pixel-based classification based on all available existing geospatial data (Forest Maps, LPIS, Natura2000 habitats, cadastral parcels, etc.). We examined and compared the performance of the Random Forest classifier for land use/land cover mapping. In total, 10 land use/land cover categories were recognized in Landsat 8 and 11 in Sentinel-2A. A comparison of the overall classification accuracies for 2018 shows that Landsat 8 classification accuracy was slightly higher than Sentinel-2A (82,99% vs. 80,30%). We concluded that the main land use/land cover types of CLC2018, even within a heterogeneous area, can be successfully mapped and updated according to CLC nomenclature. Future research should be oriented toward integrating spatiotemporal information from seasonal bands and spectral indexes in the classification process.Keywords: classification, land use/land cover, mapping, random forest
Procedia PDF Downloads 125
2467 Terrain Classification for Ground Robots Based on Acoustic Features
Authors: Bernd Kiefer, Abraham Gebru Tesfay, Dietrich Klakow
Abstract:
The motivation of our work is to detect different terrain types traversed by a robot based on acoustic data from the robot-terrain interaction. Different acoustic features and classifiers were investigated, such as Mel-frequency cepstral coefficient and Gamma-tone frequency cepstral coefficient for the feature extraction, and Gaussian mixture model and Feed forward neural network for the classification. We analyze the system’s performance by comparing our proposed techniques with some other features surveyed from distinct related works. We achieve precision and recall values between 87% and 100% per class, and an average accuracy at 95.2%. We also study the effect of varying audio chunk size in the application phase of the models and find only a mild impact on performance.Keywords: acoustic features, autonomous robots, feature extraction, terrain classification
Procedia PDF Downloads 368
2466 The Implementation of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications
Authors: Mohamed R. Mhereeg
Abstract:
The paper discusses the implementation of the Multi-Agent Classification System (MACS) and its use to provide an automated and accurate classification of end users developing applications in the spreadsheet domain. Different technologies have been brought together to build MACS. The strength of the system is the integration of agent technology and the FIPA specifications with other technologies, namely .NET Windows service based agents, the Windows Communication Foundation (WCF) services, the Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). Microsoft's .NET Windows service based agents were utilized to develop the monitoring agents of MACS, while the .NET WCF services together with the SOA approach allowed the distribution and communication between agents over the WWW. The Monitoring Agents (MAs) were configured to execute automatically to monitor Excel spreadsheet development activities by content. Data gathered by the Monitoring Agents from various resources over a period of time was collected and filtered by a Database Updater Agent (DUA) residing in the .NET client application of the system. This agent then transfers and stores the data in an Oracle server database via Oracle stored procedures for further processing that leads to the classification of the end user developers.
Keywords: MACS, implementation, multi-agent, SOA, autonomous, WCF
Procedia PDF Downloads 273
2465 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques
Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart
Abstract:
Automatic text classification mostly applies natural language processing (NLP) and other AI-guided techniques to classify text automatically in a faster and more accurate manner. This paper discusses the use of predictive maintenance to manage incident tickets within the company. It focuses on proposing a tool that processes and analyses comments and notes written by administrators after resolving an incident ticket. The goal here is to increase the quality of these comments. Additionally, this tool is based on NLP and machine learning techniques to perform textual analytics on the extracted data. This approach was tested using real data taken from the French National Railways (SNCF) company and gave high-quality results.
Keywords: machine learning, text classification, NLP techniques, semantic representation
Procedia PDF Downloads 100
2464 Wolof Voice Response Recognition System: A Deep Learning Model for Wolof Audio Classification
Authors: Krishna Mohan Bathula, Fatou Bintou Loucoubar, FNU Kaleemunnisa, Christelle Scharff, Mark Anthony De Castro
Abstract:
Voice recognition algorithms such as automatic speech recognition and text-to-speech systems with African languages can play an important role in bridging the digital divide of Artificial Intelligence in Africa, contributing to the establishment of a fully inclusive information society. This paper proposes a Deep Learning model that can classify the user responses as inputs for an interactive voice response system. A dataset with Wolof language words ‘yes’ and ‘no’ is collected as audio recordings. A two stage Data Augmentation approach is adopted for enhancing the dataset size required by the deep neural network. Data preprocessing and feature engineering with Mel-Frequency Cepstral Coefficients are implemented. Convolutional Neural Networks (CNNs) have proven to be very powerful in image classification and are promising for audio processing when sounds are transformed into spectra. For performing voice response classification, the recordings are transformed into sound frequency feature spectra and then applied image classification methodology using a deep CNN model. The inference model of this trained and reusable Wolof voice response recognition system can be integrated with many applications associated with both web and mobile platforms.Keywords: automatic speech recognition, interactive voice response, voice response recognition, wolof word classification
Procedia PDF Downloads 116
2463 A Deep Learning Approach to Subsection Identification in Electronic Health Records
Authors: Nitin Shravan, Sudarsun Santhiappan, B. Sivaselvan
Abstract:
Subsection identification, in the context of Electronic Health Records (EHRs), means identifying the sections that are important for downstream tasks like auto-coding. In this work, we classify the text present in EHRs according to its information, using machine learning and deep learning techniques. We first briefly describe the problem and formulate it as a text classification problem. Then, we discuss methods from the literature. We try two approaches: traditional feature extraction based machine learning methods and deep learning methods. Through experiments on a private dataset, we establish that the deep learning methods perform better than the feature extraction based machine learning models.
Keywords: deep learning, machine learning, semantic clinical classification, subsection identification, text classification
Procedia PDF Downloads 217
2462 Genomic and Evolutionary Diversity of Long Terminal Repeat (LTR) Retrotransposons in Date Palm (Phoenix dactylifera)
Authors: Faisal Nouroz, Mukaramin Mukaramin
Abstract:
Of the transposable elements (TEs), the retrotransposons are the most copious elements identified from many sequenced genomes. They have played a major role in genome evolution, rearrangement, and expansions based on their copy and paste mode of proliferation. They are further divided into LTR and Non-LTR retrotransposons. The purpose of the current study was to identify the LTR REs in sequenced Phoenix dactylifera genome and to study their structural diversity. A total of 150 P. dactylifera BAC sequences with > 60kb sizes were randomly retrieved from National Center for Biotechnology Information (NCBI) database and screened for the presence of LTR retrotransposons. Seven bacterial artificial chromosomes (BAC) sequences showed full-length LTR Retrotransposons with 4 Copia and 3 Gypsy families having variable copy numbers in respective families. Reverse transcriptase (RT) domain was found as the most conserved domain among Copia and Gypsy superfamilies and was used to deduce evolutionary analysis. The amino acid residues among various RT sequences showed variability in their percentages indicating post divergence evolution. Amino acid Leucine was found in highest proportions followed by Lysine, while Methionine and Tryptophan were in lowest percentages. The phylogenetic analysis based on RT domains confirmed that although having most conserved RT regions, several evolutionary events occurred causing nucleotide polymorphisms and hence clustering of Gypsy and Copia superfamilies into their respective lineages. The study will be helpful in identification and annotation of these elements in other species and genera and their distribution patterns on chromosomes by fluorescent in situ hybridization techniques.Keywords: transposable elements, Phoenix dactylifera, retrotransposons, phylogenetic analysis
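The amino acid composition comparison mentioned in this abstract can be sketched as a simple residue count. The sequence below is a made-up fragment for illustration only, not data from the study, and the function name is hypothetical.

```python
from collections import Counter

def residue_percentages(protein):
    """Percentage of each amino acid (one-letter code) in a protein sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {residue: 100.0 * count / total for residue, count in counts.items()}

# Hypothetical reverse transcriptase fragment (not from the paper).
fragment = "LLKLMWKLLK"
composition = residue_percentages(fragment)
most_common = max(composition, key=composition.get)
```

Running this over each RT domain sequence would give the per-sequence percentages that such a comparison relies on.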
Procedia PDF Downloads 128
2461 Comparative Analysis of Spectral Estimation Methods for Brain-Computer Interfaces
Authors: Rafik Djemili, Hocine Bourouba, M. C. Amara Korba
Abstract:
In this paper, we present a method in order to classify EEG signals for Brain-Computer Interfaces (BCI). EEG signals are first processed by means of spectral estimation methods to derive reliable features before classification step. Spectral estimation methods used are standard periodogram and the periodogram calculated by the Welch method; both methods are compared with Logarithm of Band Power (logBP) features. In the method proposed, we apply Linear Discriminant Analysis (LDA) followed by Support Vector Machine (SVM). Classification accuracy reached could be as high as 85%, which proves the effectiveness of classification of EEG signals based BCI using spectral methods.Keywords: brain-computer interface, motor imagery, electroencephalogram, linear discriminant analysis, support vector machine
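To make the spectral feature step concrete, the sketch below computes a standard periodogram with a plain DFT and sums it over a frequency band. It is only an illustration under assumptions: the synthetic two-tone signal, the band limits, and the function names are not from the paper, the one-sided scaling factor is omitted, and the Welch averaging and the LDA/SVM stages are not shown.

```python
import cmath
import math

def periodogram(signal, fs):
    """Periodogram |X(f)|^2 / (fs * N) for frequencies 0..fs/2 (one-sided scaling omitted)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power

def band_power(signal, fs, low, high):
    """Sum of periodogram values whose frequency lies in [low, high] Hz."""
    freqs, power = periodogram(signal, fs)
    return sum(p for f, p in zip(freqs, power) if low <= f <= high)

# Synthetic one-second signal: strong 10 Hz tone plus weak 25 Hz tone.
fs = 128
signal = [math.sin(2 * math.pi * 10 * t / fs) + 0.1 * math.sin(2 * math.pi * 25 * t / fs)
          for t in range(fs)]
low_band = band_power(signal, fs, 8, 12)
high_band = band_power(signal, fs, 20, 30)
features = [math.log(low_band), math.log(high_band)]
```

A feature vector like `features`, computed per channel and per trial, is the kind of input a classifier stage would receive.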
Procedia PDF Downloads 499
2460 Optimizing Perennial Plants Image Classification by Fine-Tuning Deep Neural Networks
Authors: Khairani Binti Supyan, Fatimah Khalid, Mas Rina Mustaffa, Azreen Bin Azman, Amirul Azuani Romle
Abstract:
Perennial plant classification plays a significant role in various agricultural and environmental applications, assisting in plant identification, disease detection, and biodiversity monitoring. Nevertheless, attaining high accuracy in perennial plant image classification remains challenging due to the complex variations in plant appearance, the diverse range of environmental conditions under which images are captured, and the inherent variability in image quality stemming from various factors such as lighting conditions, camera settings, and focus. This paper proposes an adaptation approach to optimize perennial plant image classification by fine-tuning the pre-trained DNNs model. This paper explores the efficacy of fine-tuning prevalent architectures, namely VGG16, ResNet50, and InceptionV3, leveraging transfer learning to tailor the models to the specific characteristics of perennial plant datasets. A subset of the MYLPHerbs dataset consisted of 6 perennial plant species of 13481 images under various environmental conditions that were used in the experiments. Different strategies for fine-tuning, including adjusting learning rates, training set sizes, data augmentation, and architectural modifications, were investigated. The experimental outcomes underscore the effectiveness of fine-tuning deep neural networks for perennial plant image classification, with ResNet50 showcasing the highest accuracy of 99.78%. Despite ResNet50's superior performance, both VGG16 and InceptionV3 achieved commendable accuracy of 99.67% and 99.37%, respectively. The overall outcomes reaffirm the robustness of the fine-tuning approach across different deep neural network architectures, offering insights into strategies for optimizing model performance in the domain of perennial plant image classification.Keywords: perennial plants, image classification, deep neural networks, fine-tuning, transfer learning, VGG16, ResNet50, InceptionV3
Procedia PDF Downloads 64
2459 Obstacle Classification Method Based on 2D LIDAR Database
Authors: Moohyun Lee, Soojung Hur, Yongwan Park
Abstract:
This paper proposes a method that uses only a LIDAR system to classify an obstacle and determine its type by establishing a database for classifying obstacles based on LIDAR. The existing LIDAR system, in recognizing obstructions for an autonomous vehicle, has an advantage in terms of accuracy and shorter recognition time. However, it was difficult to determine the type of obstacle, and therefore accurate path planning based on the type of obstacle was not possible. In order to overcome this problem, a method of classifying obstacle type based on existing LIDAR and using the width of obstacle materials was proposed. However, width measurement alone was not sufficient to improve accuracy. In this research, the width data are used for a first classification; a database of LIDAR intensity data for four major obstacle materials on the road is created; the database is compared with the LIDAR intensity data of actual obstacle materials; and the obstacle type is determined by finding the entry with the highest similarity value. An experiment using an actual autonomous vehicle in a real environment shows that, although the data were of lower quality than those from 3D LIDAR, it was possible to classify obstacle materials using 2D LIDAR.
Keywords: obstacle, classification, database, LIDAR, segmentation, intensity
Procedia PDF Downloads 349
2458 Performance Analysis with the Combination of Visualization and Classification Technique for Medical Chatbot
Authors: Shajida M., Sakthiyadharshini N. P., Kamalesh S., Aswitha B.
Abstract:
Natural Language Processing (NLP) continues to play a strategic part in complaint discovery and medicine discovery during the current epidemic. This abstract provides an overview of performance analysis with a combination of visualization and classification techniques of NLP for a medical chatbot. Sentiment analysis is an important aspect of NLP that is used to determine the emotional tone behind a piece of text. This technique has been applied to various domains, including medical chatbots. In this, we have compared the combination of the decision tree with heatmap and Naïve Bayes with Word Cloud. The performance of the chatbot was evaluated using accuracy, and the results indicate that the combination of visualization and classification techniques significantly improves the chatbot's performance.Keywords: sentimental analysis, NLP, medical chatbot, decision tree, heatmap, naïve bayes, word cloud
Procedia PDF Downloads 72
2457 Metamorphic Computer Virus Classification Using Hidden Markov Model
Authors: Babak Bashari Rad
Abstract:
A metamorphic computer virus uses different code transformation techniques to mutate its body in duplicated instances. Characteristics and function of new instances are mostly similar to their parents, but they cannot be easily detected by the majority of antivirus in market, as they depend on string signature-based detection techniques. The purpose of this research is to propose a Hidden Markov Model for classification of metamorphic viruses in executable files. In the proposed solution, portable executable files are inspected to extract the instructions opcodes needed for the examination of code. A Hidden Markov Model trained on portable executable files is employed to classify the metamorphic viruses of the same family. The proposed model is able to generate and recognize common statistical features of mutated code. The model has been evaluated by examining the model on a test data set. The performance of the model has been practically tested and evaluated based on False Positive Rate, Detection Rate and Overall Accuracy. The result showed an acceptable performance with high average of 99.7% Detection Rate.Keywords: malware classification, computer virus classification, metamorphic virus, metamorphic malware, Hidden Markov Model
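One common way to use a Hidden Markov Model for this kind of classification is to score an opcode sequence with the forward algorithm and compare the log-likelihood against a threshold or against other family models. The sketch below shows that scoring step only. The two-state model, its probabilities, and the opcode alphabet are hypothetical, and the abstract does not state that the paper scores sequences exactly this way.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model).

    Assumes every observed symbol has non-zero emission probability in
    at least one reachable state; otherwise math.log raises ValueError.
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)            # probability of obs[t] given the earlier symbols
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Hypothetical two-state model over three opcodes.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [
    {"mov": 0.5, "push": 0.4, "nop": 0.1},
    {"mov": 0.1, "push": 0.2, "nop": 0.7},
]
score = forward_log_likelihood(["mov", "push", "mov"], start, trans, emit)
```

In practice the model parameters would be estimated from opcode sequences of known family members rather than written by hand.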
Procedia PDF Downloads 315
2456 Assessment on Rumen Microbial Diversity of Bali Cattle Using 16S rRNA Sequencing
Authors: Asmuddin Natsir, A. Mujnisa, Syahriani Syahrir, Marhamah Nadir, Nurul Purnomo
Abstract:
Bacteria, protozoa, Archaea, and fungi are the dominant microorganisms found in the rumen ecosystem that has an important role in converting feed ingredients into components that can be digested and utilized by the livestock host. This study was conducted to assess the diversity of rumen bacteria of bali cattle raised under traditional farming condition. Three adult bali cattle were used in this experiment. The rumen fluid samples from the three experimental animals were obtained by the Stomach Tube method before the morning feeding. The results of study indicated that the Illumina sequencing was successful in identifying 301,589 sequences, averaging 100,533 sequences, from three rumen fluid samples of three cattle. Furthermore, based on the SILVA taxonomic database, there were 19 kinds of phyla that had been successfully identified. Of the 19 phyla, there were only two dominant groups across the three samples, namely Bacteroidetes and Firmicutes, with an average percentage of 83.68% and 13.43%, respectively. Other groups such as Synergistetes, Spirochaetae, Planctomycetes can also be identified but in relatively small percentage. At the genus level, there were 157 sequences obtained from all three samples. Of this number, the most dominant group was Prevotella 1 with a percentage of 71.82% followed by 6.94% of Christencenellaceae R-7 group. Other groups such as Prevotellaceae UCG-001, Ruminococcaceae NK4A214 group, Sphaerochaeta, Ruminococcus 2, Rikenellaceae RC9 gut group, Quinella were also identified but with very low percentages. The sequencing results were able to detect the presence of 3.06% and 3.92% respectively for uncultured rumen bacterium and uncultured bacterium. 
In conclusion, the results of this experiment can provide an opportunity for a better understanding of the rumen bacterial diversity of the bali cattle raised under traditional farming condition and insight regarding the uncultured rumen bacterium and uncultured bacterium that need to be further explored.Keywords: 16S rRNA sequencing, bali cattle, rumen microbial diversity, uncultured rumen bacterium
Procedia PDF Downloads 336
2455 Road Vehicle Recognition Using Magnetic Sensing Feature Extraction and Classification
Authors: Xiao Chen, Xiaoying Kong, Min Xu
Abstract:
This paper presents a road vehicle detection approach for the intelligent transportation system. This approach mainly uses a low-cost magnetic sensor and an associated data collection system to collect magnetic signals. This system can measure changes in the magnetic field, and it can also detect and count vehicles. We extend Mel Frequency Cepstral Coefficients to analyze vehicle magnetic signals. Vehicle type features are extracted using a representation of the cepstrum, frame energy, and gap cepstrum of magnetic signals. We design a 2-dimensional map algorithm using Vector Quantization to classify vehicle magnetic features into four typical types of vehicles in Australian suburbs: sedan, van, truck, and bus. Experimental results show that our approach achieves a high level of accuracy for vehicle detection and classification.
Keywords: vehicle classification, signal processing, road traffic model, magnetic sensing
Procedia PDF Downloads 320
2454 Comparative Study of Accuracy of Land Cover/Land Use Mapping Using Medium Resolution Satellite Imagery: A Case Study
Authors: M. C. Paliwal, A. K. Jain, S. K. Katiyar
Abstract:
Classification of satellite imagery is very important for the assessment of its accuracy. In order to determine the accuracy of the classified image, usually the assumed-true data are derived from ground truth data using Global Positioning System. The data collected from satellite imagery and ground truth data is then compared to find out the accuracy of data and error matrices are prepared. Overall and individual accuracies are calculated using different methods. The study illustrates advanced classification and accuracy assessment of land use/land cover mapping using satellite imagery. IRS-1C-LISS IV data were used for classification of satellite imagery. The satellite image was classified using the software in fourteen classes namely water bodies, agricultural fields, forest land, urban settlement, barren land and unclassified area etc. Classification of satellite imagery and calculation of accuracy was done by using ERDAS-Imagine software to find out the best method. This study is based on the data collected for Bhopal city boundaries of Madhya Pradesh State of India.Keywords: resolution, accuracy assessment, land use mapping, satellite imagery, ground truth data, error matrices
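The overall and individual accuracies derived from an error matrix can be sketched as follows. The 2x2 matrix is invented for the example and the row/column convention (rows as classified, columns as ground truth) is an assumption. The study itself used fourteen classes and ERDAS-Imagine.

```python
def accuracy_from_error_matrix(matrix):
    """Overall, user's and producer's accuracy from a square error matrix.

    matrix[i][j] counts samples classified as class i whose ground truth
    class is j (assumed convention).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    overall = sum(diagonal) / total
    users = [diagonal[i] / sum(matrix[i]) for i in range(n)]
    producers = [diagonal[j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return overall, users, producers

# Hypothetical two-class example: water bodies vs. urban settlement.
error_matrix = [
    [45, 5],
    [10, 40],
]
overall, users, producers = accuracy_from_error_matrix(error_matrix)
```

User's accuracy reflects errors of commission for each mapped class, while producer's accuracy reflects errors of omission for each ground truth class.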
Procedia PDF Downloads 507
2453 MSIpred: A Python 2 Package for the Classification of Tumor Microsatellite Instability from Tumor Mutation Annotation Data Using a Support Vector Machine
Authors: Chen Wang, Chun Liang
Abstract:
Microsatellite instability (MSI) is characterized by a high degree of polymorphism in microsatellite (MS) length due to a deficiency in the mismatch repair (MMR) system. MSI is associated with several tumor types, and its status can be considered an important indicator for tumor prognosis. Conventional clinical diagnosis of MSI examines PCR products of a panel of MS markers using electrophoresis (MSI-PCR), which is laborious, time consuming, and less reliable. This study releases MSIpred, a Python 2 package for automatic classification of MSI. It computes important somatic mutation features from files in mutation annotation format (MAF) generated from paired tumor-normal exome sequencing data, and then uses these features to predict tumor MSI status with a support vector machine (SVM) classifier trained on MAF files of 1074 tumors belonging to four types. Evaluation of MSIpred on an independent 358-tumor test set achieved an overall accuracy of over 98% and an area under the receiver operating characteristic (ROC) curve of 0.967. These results indicate that MSIpred is a robust pan-cancer MSI classification tool and can serve as a complementary diagnostic to MSI-PCR in MSI diagnosis.
Keywords: microsatellite instability, pan-cancer classification, somatic mutation, support vector machine
Procedia PDF Downloads 173
2452 The Effect of Feature Selection on Pattern Classification
Authors: Chih-Fong Tsai, Ya-Han Hu
Abstract:
The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.Keywords: data mining, feature selection, pattern classification, dimensionality reduction
Procedia PDF Downloads 669
2451 Application of Data Mining Techniques for Tourism Knowledge Discovery
Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee
Abstract:
Five implementations of three data mining classification techniques were applied to extract important insights from tourism data. The aim was to find the best-performing algorithm among those compared for tourism knowledge discovery. The knowledge discovery from data process was used as the process model, and 10-fold cross-validation was used for testing. Various data preprocessing activities were performed to obtain the final dataset for model building. Classification models of the selected algorithms were built under different scenarios on the preprocessed dataset. The best-performing algorithm on the tourism dataset was Random Forest (76%) before applying information-gain-based attribute selection, and J48 (C4.5) (75%) after selecting the top attributes relevant to the class (target) attribute. In terms of model-building time, attribute selection improved the efficiency of all algorithms. The Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented; they reveal intricate, non-trivial knowledge that simple statistical analysis would not uncover, even though the accuracy of the classification algorithms was mediocre.
Keywords: classification algorithms, data mining, knowledge discovery, tourism
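The 10-fold cross-validation mentioned above can be sketched as follows: the samples are shuffled and split into ten folds, each fold serves once as the test set, and the per-fold accuracies are averaged. Everything here is an illustrative assumption, including the function names and the majority-class stand-in classifier; the study's own tooling is not described in the abstract.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(samples, labels, fit, predict, k=10):
    """Mean test accuracy over k folds for a user-supplied fit/predict pair."""
    accuracies = []
    for train, test in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Hypothetical stand-in classifier: always predict the majority training class.
def fit_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


toy_samples = list(range(50))
toy_labels = ["a"] * 30 + ["b"] * 20
print(cross_validate(toy_samples, toy_labels, fit_majority, predict_majority))
```

Any real classifier can be substituted by passing its own `fit` and `predict` callables; the sketch assumes there are at least `k` samples so that no fold is empty.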
Procedia PDF Downloads 295
2450 Accuracy Improvement of Traffic Participant Classification Using Millimeter-Wave Radar by Leveraging Simulator Based on Domain Adaptation
Authors: Tokihiko Akita, Seiichi Mita
Abstract:
Millimeter-wave radar is the sensor most robust against adverse environments, making it an essential environment recognition sensor for automated driving. However, the reflection signal is sparse and unstable, so it is difficult to obtain high recognition accuracy. Deep learning can recognize even such signals with high accuracy, but it requires large-scale datasets with ground truth. In particular, annotating millimeter-wave radar data is costly. An effective solution is a simulator that can generate a huge annotated dataset. However, radar simulation is harder to match to real-world data than camera-image simulation, and deep learning on higher-order features from the simulator widens the deviation further. We sought to improve the accuracy of traffic participant classification by fusing simulator and real-world data with a domain adaptation technique. Experimental results with the domain adaptation network we developed show that classification accuracy can be improved even with only a small amount of real-world data.
Keywords: millimeter-wave radar, object classification, deep learning, simulation, domain adaptation
Procedia PDF Downloads 93
2449 Attribute Index and Classification Method of Earthquake Damage Photographs of Engineering Structure
Authors: Ming Lu, Xiaojun Li, Bodi Lu, Juehui Xing
Abstract:
The damage caused by each large earthquake is a comprehensive, real-world test of the dynamic performance and failure mechanisms of different engineering structures. Understanding the characteristics of engineering structures through seismic damage is often far superior to expensive shaking table experiments. After an earthquake, people record many different types of engineering damage photos. However, a large number of earthquake damage photographs lack sufficient information, which reduces their value. To improve the research value and usability of engineering seismic damage photographs, this paper aims to identify and present seismic damage background information, which includes the earthquake magnitude, the earthquake intensity, and the characteristics of the damaged structure. Based on the research requirements of the earthquake engineering field, the authors use photographs of the 2008 Wenchuan M8.0 earthquake in China and provide four kinds of attribute indexes and classifications: seismic information, structure types, earthquake damage parts, and disaster causation factors. The final objective is to set up an engineering structural seismic damage database based on these four attribute indicators and classifications, and eventually to build a website providing seismic damage photographs.
Keywords: attribute index, classification method, earthquake damage picture, engineering structure
Procedia PDF Downloads 765
2448 Classification of Cosmological Wormhole Solutions in the Framework of General Relativity
Authors: Usamah Al-Ali
Abstract:
We explore the effect of expanding space on the exoticity of the matter supporting a traversable Lorentzian wormhole of zero radial tide, whose line element is given by ds^2 = dt^2 − a^2(t)[dr^2/(1 − kr^2 − b(r)/r) + r^2 dΩ^2], in the context of General Relativity. This task is achieved by deriving the Einstein field equations for an anisotropic matter field corresponding to the considered cosmological wormhole metric and classifying their solutions on the basis of a variable equation of state (EoS) of the form p = ω(r)ρ. Explicit forms of the shape function b(r) and the scale factor a(t) arising in the classification are used to construct the corresponding energy-momentum tensor, and the energy conditions are investigated for each case. While the violation of energy conditions is inevitable in the case of static wormholes, the classification leads to interesting solutions in which this violation is either reduced or eliminated.
Keywords: general relativity, Einstein field equations, energy conditions, cosmological wormhole
Procedia PDF Downloads 63
2447 Diversity and Phylogenetic Placement of Seven Inocybe (Inocybaceae, Fungi) from Benin
Authors: Hyppolite Aignon, Souleymane Yorou, Martin Ryberg, Anneli Svanholm
Abstract:
Climate change and human actions cause the extinction of wild mushrooms. In Benin, the diversity of fungi is large and may still contain species new to science, but the inventory effort remains low and focuses particularly on edible species (Russula, Lactarius, Lactifluus, and also Amanita). In addition, inventories started only recently and some groups of fungi are not sufficiently sampled, while the degradation of fungal habitat continues to increase and some species are already disappearing (Yorou and De Kesel, 2011); others may disappear without being known. The overlooked genus Inocybe has a worldwide distribution and includes more than 700 species, many of them undiscovered or poorly known, particularly in tropical Africa. It is therefore important to orient the inventory toward other important genera or families such as Inocybe (Fungi, Agaricales), in order to highlight their diversity and to determine their phylogenetic positions with a combined approach of gene regions. This study aims to evaluate the species richness and phylogenetic position of Inocybe species and affiliated taxa in West Africa. In North Benin, we visited the Forest Reserve of Ouémé Supérieur, the Okpara forest and the Alibori Supérieur Forest Reserve. In the center, we targeted the Forest Reserve of Toui-Kilibo. The surveys were carried out during the rainy season in the study area, from June to October. A total of 24 taxa were collected, photographed and described. The DNA was extracted, and the Polymerase Chain Reaction was carried out using the primers ITS1-F and ITS4-B for the internal transcribed spacer (ITS); LROR, LWRB, LR7 and LR5 for the nuclear ribosomal large subunit (LSU); and RPB2-f5F, RPB2-b6F, RPB2-b6R2 and RPB2-b7R for the RNA polymerase II gene (RPB2). The products were then sequenced.
The ITS sequences of the 24 collections of Inocybaceae were edited in Staden, and all the sequences were aligned and edited with AliView v1.17. The sequences were examined by eye for sufficient similarity to be considered the same species. Thirteen different species were present in the collections. In addition, sequences similar to the ITS sequences of the thirteen final species were searched for using BLAST. The nLSU and RPB2 markers for these species were inserted in a complete alignment in which species from all major Inocybaceae clades and from all continents except Antarctica are present. Our new sequences for nLSU and RPB2 were manually aligned in this dataset. Phylogenetic analysis was performed using the RAxML v7.2.6 maximum likelihood software. Bootstrap replications were set to 100, and no partitioning of the dataset was performed. The resulting tree was viewed and edited with FigTree v1.4.3. The preliminary maximum likelihood tree shows that these species from Benin are highly diverse and are distributed in four different clades (Inosperma, Inocybe, Mallocybe and Pseudosperma) out of the seven clades of Inocybaceae, although the phylogenetic position is currently known for 7 of them. This study highlights the diversity of Inocybe in Benin; investigations will continue, and a protection plan will be developed in the coming years.
Keywords: Benin, diversity, Inocybe, phylogeny placement
Procedia PDF Downloads 149
2446 Application of Argumentation for Improving the Classification Accuracy in Inductive Concept Formation
Authors: Vadim Vagin, Marina Fomina, Oleg Morosin
Abstract:
This paper describes an argumentation approach to the problem of inductive concept formation. It proposes using argumentation, based on defeasible reasoning with justification degrees, to improve the quality of classification models obtained by generalization algorithms. Experimental results on both clean and noisy data are also presented.
Keywords: argumentation, justification degrees, inductive concept formation, noise, generalization
Procedia PDF Downloads 442
2445 Comparison of Various Classification Techniques Using WEKA for Colon Cancer Detection
Authors: Beema Akbar, Varun P. Gopi, V. Suresh Babu
Abstract:
Colon cancer causes the deaths of about half a million people every year. The common method of detection is histopathological tissue analysis, which is tiring and adds to the pathologist's workload. A novel method is proposed that combines structural and statistical pattern recognition for the detection of colon cancer. This paper compares different classifiers, namely Multilayer Perceptron (MLP), Sequential Minimal Optimization (SMO), Bayesian Logistic Regression (BLR) and K-star, using classification accuracy and error rate based on the percentage split method. The results show that the best algorithm in WEKA is the MLP classifier, with an accuracy of 83.333% and a kappa statistic of 0.625. Because it has a lower error rate, the MLP classifier is preferred for its stronger classification capability.
Keywords: colon cancer, histopathological image, structural and statistical pattern recognition, multilayer perceptron
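For readers unfamiliar with the kappa statistic, the sketch below shows how accuracy and Cohen's kappa are both derived from a confusion matrix. The matrix is invented for illustration and chosen only because it happens to reproduce the two reported figures; it is not the authors' data, and the abstract does not give the test-set size or class balance.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the actual and predicted class frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 2-class confusion matrix (12 samples), NOT taken from the paper.
hypothetical_matrix = [[7, 1],
                       [1, 3]]
acc, kappa = accuracy_and_kappa(hypothetical_matrix)
print(f"accuracy = {acc:.3%}, kappa = {kappa:.3f}")  # accuracy = 83.333%, kappa = 0.625
```

Kappa is lower than accuracy here because part of the agreement (5/9 in this toy matrix) would be expected by chance given the class frequencies.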
Procedia PDF Downloads 574
2444 Tomato-Weed Classification by RetinaNet One-Step Neural Network
Authors: Dionisio Andujar, Juan López-Correa, Hugo Moreno, Angela Ri
Abstract:
An increased number of weeds in tomato crops greatly lowers yields. Weed identification with the aid of machine learning is important for carrying out site-specific control. The latest advances in computer vision are a powerful tool for tackling the problem. The analysis of RGB (Red, Green, Blue) images through Artificial Neural Networks has developed rapidly in the past few years, providing new methods for weed classification. The development of algorithms for crop and weed species classification aims at a real-time classification system using Object Detection algorithms based on Convolutional Neural Networks. The study site was located in commercial corn fields. The classification system has been tested. The procedure can detect and classify weed seedlings in tomato fields. The input to the Neural Network was a set of 10,000 RGB images with a natural infestation of Cyperus rotundus L., Echinochloa crus-galli L., Setaria italica L., Portulaca oleracea L., and Solanum nigrum L. The validation process was done with a random selection of RGB images containing the aforementioned species. The mean average precision (mAP) was established as the metric for object detection. The results showed agreements higher than 95%. The system will provide the input for an online spraying system. Thus, this work plays an important role in Site Specific Weed Management by reducing herbicide use in a single step.
Keywords: deep learning, object detection, CNN, tomato, weeds
Procedia PDF Downloads 103
2443 Solanum tuberosum Ammonium Transporter Gene: Some Bioinformatics Insights
Authors: A. T. Adetunji, F. B. Lewu, R. Mundembe
Abstract:
Plants require nitrogen (N) to support desired production levels. Nitrogen is available to plants in the form of nitrate or ammonium, which are transported into the cell with the aid of various transport proteins. Ammonium transporters (AMTs) play a role in the uptake of ammonium, the form in which nitrogen is preferentially absorbed by plants. Solanum tuberosum AMT1 (StAMT1) was characterized using molecular biology and bioinformatics methods. Nucleotide database sequences were used to design AMT1-specific primers, which were used to amplify the AMT1 internal regions. Nucleotide sequencing, alignment and phylogenetic analysis assigned StAMT1 to the AMT1 family. The deduced amino acid sequences showed that StAMT1 is 92%, 83% and 76% similar to Solanum lycopersicum LeAMT1.1, Lotus japonicus LjAMT1.1 and Solanum lycopersicum LeAMT1.2, respectively. StAMT1 fragments were shown to correspond to the 5th to 10th transmembrane domains. Residue StAMT1 D15 is predicted to be essential for ammonium transport, while mutations of StAMT1 S76A may further enhance ammonium transport.
Keywords: ammonium transporter, bioinformatics, nitrogen, primers, Solanum tuberosum
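Pairwise figures such as those above are typically computed from an alignment of the two sequences. The sketch below computes percent identity over the ungapped columns of two pre-aligned sequences. It is an assumption-laden illustration: the abstract does not say which similarity measure or tool the authors used, and the sequences here are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments (not real AMT1 sequences).
print(round(percent_identity("MKT-AYLG", "MKTSAYIG"), 1))
```

Similarity scores that also credit conservative substitutions (for example via a substitution matrix) would give higher values than plain identity, so the two should not be compared directly.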
Procedia PDF Downloads 248