Search results for: DNA sequences classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2652

Search results for: DNA sequences classification

2292 Molecular Characterization of Grain Storage Proteins in Some Hordeum Species

Authors: Manar Makhoul, Buthainah Alsalamah, Salam Lawand, Hassan Azzam

Abstract:

The major storage proteins in endosperm of 33 cultivated and wild barley genotypes (H.vulgare, H. spontaneum, H. bulbosum, H. murinum, H. marinum) were analyzed to demonstrate the variation in the hordein polypeptides encoded by multigene families in grains. The SDS-PAGE revealed 13 and 17 alleles at the Hor1 and the Hor2 loci respectively, with frequencies from 0.83 to 14 and 0.56 to 13.41% respectively, while seven alleles at the Hor3 locus with frequencies from 3.63 to 30.91% were recognized. The phylogenetic analysis indicated to relevance of the polymorphism in hordein patterns as successful tool in identifying the individual genotypes and discriminating the species according to genome type. We also reported in this research complete nucleotide sequence B-hordein genes of seven wild and cultivated barley genotypes. A 152bp upstream sequence of B-hordein promoter contained a TATA box, CATC box, AAAG motif, N-motif and E-motif. In silico analysis of B-Hordein sequences demonstrated that the coding regions were not interrupted by any intron, and included the complete ORF which varied between 882 and 906 bp, and encoded mature proteins with 293-301 residues characterized by high contents of glutamine (29%), and proline (18%). Comparison of the predicted polypeptide sequences with the published ones suggested that all S-rich prolamins genes are descended from common ancestor. The sequence started at N-terminal with a signal peptide, and then followed directly by two domains; a repetitive one based on the repetition of the repeat unit PQQPFPQQ and C-terminal domain. Also, it was found that positions of the eight cysteine residues were highly conserved in all the B-hordein sequences, but Hordeum bulbosum had additional unpaired one. The phylogenetic tree of B-hordein polypeptide separated the genotypes in distinct seven subgroups. In general, the high homology between B-hordeins and LMW glutenin subunits suggests similar bread-making influences for these B-hordeins.

Keywords: hordeum, phylogenetic tree, sequencing, storage protein

Procedia PDF Downloads 243
2291 Random Forest Classification for Population Segmentation

Authors: Regina Chua

Abstract:

To reduce the costs of re-fielding a large survey, a Random Forest classifier was applied to measure the accuracy of classifying individuals into their assigned segments with the fewest possible questions. Given a long survey, one needed to determine the most predictive ten or fewer questions that would accurately assign new individuals to custom segments. Furthermore, the solution needed to be quick in its classification and usable in non-Python environments. In this paper, a supervised Random Forest classifier was modeled on a dataset with 7,000 individuals, 60 questions, and 254 features. The Random Forest consisted of an iterative collection of individual decision trees that result in a predicted segment with robust precision and recall scores compared to a single tree. A random 70-30 stratified sampling for training the algorithm was used, and accuracy trade-offs at different depths for each segment were identified. Ultimately, the Random Forest classifier performed at 87% accuracy at a depth of 10 with 20 instead of 254 features and 10 instead of 60 questions. With an acceptable accuracy in prioritizing feature selection, new tools were developed for non-Python environments: a worksheet with a formulaic version of the algorithm and an embedded function to predict the segment of an individual in real-time. Random Forest was determined to be an optimal classification model by its feature selection, performance, processing speed, and flexible application in other environments.

Keywords: machine learning, supervised learning, data science, random forest, classification, prediction, predictive modeling

Procedia PDF Downloads 79
2290 Genetic Algorithms for Feature Generation in the Context of Audio Classification

Authors: José A. Menezes, Giordano Cabral, Bruno T. Gomes

Abstract:

Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning intends to learn the transformation of raw data into a useful representation to machine learning tasks. In automatic audio classification tasks, this is interesting since the audio, usually complex information, needs to be transformed into a computationally convenient input to process. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have being used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, like PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.

Keywords: feature generation, feature learning, genetic algorithm, music information retrieval

Procedia PDF Downloads 415
2289 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing do-main presents interesting challenges as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable as it can be used to mark progress while training, and it can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The dataset available was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and it was temporally heterogeneous. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross validation strategies. The investigated solutions to the classification problem included light weight machine classifiers KNN and SVM as well as the deep learning with CNN. The best performing model had an 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence

Procedia PDF Downloads 128
2288 An Approach for Vocal Register Recognition Based on Spectral Analysis of Singing

Authors: Aleksandra Zysk, Pawel Badura

Abstract:

Recognizing and controlling vocal registers during singing is a difficult task for beginner vocalist. It requires among others identifying which part of natural resonators is being used when a sound propagates through the body. Thus, an application has been designed allowing for sound recording, automatic vocal register recognition (VRR), and a graphical user interface providing real-time visualization of the signal and recognition results. Six spectral features are determined for each time frame and passed to the support vector machine classifier yielding a binary decision on the head or chest register assignment of the segment. The classification training and testing data have been recorded by ten professional female singers (soprano, aged 19-29) performing sounds for both chest and head register. The classification accuracy exceeded 93% in each of various validation schemes. Apart from a hard two-class clustering, the support vector classifier returns also information on the distance between particular feature vector and the discrimination hyperplane in a feature space. Such an information reflects the level of certainty of the vocal register classification in a fuzzy way. Thus, the designed recognition and training application is able to assess and visualize the continuous trend in singing in a user-friendly graphical mode providing an easy way to control the vocal emission.

Keywords: classification, singing, spectral analysis, vocal emission, vocal register

Procedia PDF Downloads 289
2287 Performance Comparison of Deep Convolutional Neural Networks for Binary Classification of Fine-Grained Leaf Images

Authors: Kamal KC, Zhendong Yin, Dasen Li, Zhilu Wu

Abstract:

Intra-plant disease classification based on leaf images is a challenging computer vision task due to similarities in texture, color, and shape of leaves with a slight variation of leaf spot; and external environmental changes such as lighting and background noises. Deep convolutional neural network (DCNN) has proven to be an effective tool for binary classification. In this paper, two methods for binary classification of diseased plant leaves using DCNN are presented; model created from scratch and transfer learning. Our main contribution is a thorough evaluation of 4 networks created from scratch and transfer learning of 5 pre-trained models. Training and testing of these models were performed on a plant leaf images dataset belonging to 16 distinct classes, containing a total of 22,265 images from 8 different plants, consisting of a pair of healthy and diseased leaves. We introduce a deep CNN model, Optimized MobileNet. This model with depthwise separable CNN as a building block attained an average test accuracy of 99.77%. We also present a fine-tuning method by introducing the concept of a convolutional block, which is a collection of different deep neural layers. Fine-tuned models proved to be efficient in terms of accuracy and computational cost. Fine-tuned MobileNet achieved an average test accuracy of 99.89% on 8 pairs of [healthy, diseased] leaf ImageSet.

Keywords: deep convolution neural network, depthwise separable convolution, fine-grained classification, MobileNet, plant disease, transfer learning

Procedia PDF Downloads 168
2286 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 230
2285 Unravelling the Knot: Towards a Definition of ‘Digital Labor’

Authors: Marta D'Onofrio

Abstract:

The debate on the digitalization of the economy has raised questions about how both labor and the regulation of work processes are changing due to the introduction of digital technologies in the productive system. Within the literature, the term ‘digital labor’ is commonly used to identify the impact of digitalization on labor. Despite the wide use of this term, it is still not available an unambiguous definition of it, and this could create confusion in the use of terminology and in the attempts of classification. As a consequence, the purpose of this paper is to provide for a definition and to propose a classification of ‘digital labor’, resorting to the theoretical approach of organizational studies.

Keywords: digital labor, digitalization, data-driven algorithms, big data, organizational studies

Procedia PDF Downloads 135
2284 Classification of Tropical Semi-Modules

Authors: Wagneur Edouard

Abstract:

Tropical algebra is the algebra constructed over an idempotent semifield S. We show here that every m-dimensional tropical module M over S with strongly independent basis can be embedded into Sm, and provide an algebraic invariant -the Γ-matrix of M- which characterises the isomorphy class of M. The strong independence condition also yields a significant improvement to the Whitney embedding for tropical torsion modules published earlier We also show that the strong independence of the basis of M is equivalent to the unique representation of elements of M. Numerous examples illustrate our results.

Keywords: classification, idempotent semi-modules, strong independence, tropical algebra

Procedia PDF Downloads 358
2283 Classification of Potential Biomarkers in Breast Cancer Using Artificial Intelligence Algorithms and Anthropometric Datasets

Authors: Aref Aasi, Sahar Ebrahimi Bajgani, Erfan Aasi

Abstract:

Breast cancer (BC) continues to be the most frequent cancer in females and causes the highest number of cancer-related deaths in women worldwide. Inspired by recent advances in studying the relationship between different patient attributes and features and the disease, in this paper, we have tried to investigate the different classification methods for better diagnosis of BC in the early stages. In this regard, datasets from the University Hospital Centre of Coimbra were chosen, and different machine learning (ML)-based and neural network (NN) classifiers have been studied. For this purpose, we have selected favorable features among the nine provided attributes from the clinical dataset by using a random forest algorithm. This dataset consists of both healthy controls and BC patients, and it was noted that glucose, BMI, resistin, and age have the most importance, respectively. Moreover, we have analyzed these features with various ML-based classifier methods, including Decision Tree (DT), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) along with NN-based Multi-Layer Perceptron (MLP) classifier. The results revealed that among different techniques, the SVM and MLP classifiers have the most accuracy, with amounts of 96% and 92%, respectively. These results divulged that the adopted procedure could be used effectively for the classification of cancer cells, and also it encourages further experimental investigations with more collected data for other types of cancers.

Keywords: breast cancer, diagnosis, machine learning, biomarker classification, neural network

Procedia PDF Downloads 115
2282 Phylogenetic Analysis Based On the Internal Transcribed Spacer-2 (ITS2) Sequences of Diadegma semiclausum (Hymenoptera: Ichneumonidae) Populations Reveals Significant Adaptive Evolution

Authors: Ebraheem Al-Jouri, Youssef Abu-Ahmad, Ramasamy Srinivasan

Abstract:

The parasitoid, Diadegma semiclausum (Hymenoptera: Ichneumonidae) is one of the most effective exotic parasitoids of diamondback moth (DBM), Plutella xylostella in the lowland areas of Homs, Syria. Molecular evolution studies are useful tools to shed light on the molecular bases of insect geographical spread and adaptation to new hosts and environment and for designing better control strategies. In this study, molecular evolution analysis was performed based on the 42 nuclear internal transcribed spacer-2 (ITS2) sequences representing the D. semiclausum and eight other Diadegma spp. from Syria and worldwide. Possible recombination events were identified by RDP4 program. Four potential recombinants of the American D. insulare and D. fenestrale (Jeju) were detected. After detecting and removing recombinant sequences, the ratio of non-synonymous (dN) to synonymous (dS) substitutions per site (dN/dS=ɷ) has been used to identify codon positions involved in adaptive processes. Bayesian techniques were applied to detect selective pressures at a codon level by using five different approaches including: fixed effects likelihood (FEL), internal fixed effects likelihood (IFEL), random effects method (REL), mixed effects model of evolution (MEME) and Program analysis of maximum liklehood (PAML). Among the 40 positively selected amino acids (aa) that differed significantly between clades of Diadegma species, three aa under positive selection were only identified in D. semiclausum. Additionally, all D. semiclausum branches tree were highly found under episodic diversifying selection (EDS) at p≤0.05. Our study provide evidence that both recombination and positive selection have contributed to the molecular diversity of Diadegma spp. and highlights the significant contribution of D. semiclausum in adaptive evolution and influence the fitness in the DBM parasitoid.

Keywords: diadegma sp, DBM, ITS2, phylogeny, recombination, dN/dS, evolution, positive selection

Procedia PDF Downloads 399
2281 Effect of the Birth Order and Arrival of Younger Siblings on the Development of a Child: Evidence from India

Authors: Swati Srivastava, Ashish Kumar Upadhyay

Abstract:

Using longitudinal data from three waves of Young Lives Study and Ordinary Least Square methods, study has investigated the effect of birth order and arrival of younger siblings on child development in India. Study used child’s height for age z-score, weight for age z-score, BMI for age z-score, Peabody Picture Vocabulary Test (PPVT-Score)c, maths score, Early Grade Reading Assessment Test (ERGA) score, and memory score to measure the physical and cognitive development of child during wave-3. Findings suggest that having a high birth order is detrimental for child development and the gap between adjacent siblings is larger for children late in the birth sequences than early in the birth sequences. Study also reported that not only older siblings but arrival of younger siblings before assessment of test also reduces the development of a child. The effects become stronger in case of female children than male children.

Keywords: height for age z-score, weight for age z-score, BMI for z-score, PPVT score, math score, EGRA score, memory score, birth order, siblings, Young Lives Study, India

Procedia PDF Downloads 316
2280 Engineering Parameters and Classification of Marly Soils of Tabriz

Authors: Amirali Mahouti, Hooshang Katebi

Abstract:

Enlargement of Tabriz metropolis to the east and north-east caused urban construction to be built on Marl layers and because of increase in excavations depth, further information of this layer is inescapable. Looking at geotechnical investigation shows there is not enough information about Tabriz Marl and this soil has been classified only by color. Tabriz Marl is lacustrine carbonate sediment outcrops, surrounds eastern, northern and southern region of city in the East Azerbaijan Province of Iran and is known as bed rock of city under alluvium sediments. This investigation aims to characterize geotechnical parameters of this soil to identify and set it in classification system of carbonated soils. For this purpose, specimens obtained from 80 locations over the city and subjected to physical and mechanical tests, such as Atterberg limits, density, moisture content, unconfined compression, direct shear and consolidation. CaCO3 content, organic content, PH, XRD, XRF, TGA and geophysical downhole tests also have been done on some of them.

Keywords: carbonated soils, classification of soils, mineralogy, physical and mechanical tests for Marls, Tabriz Marl

Procedia PDF Downloads 302
2279 Computational Model for Predicting Effective siRNA Sequences Using Whole Stacking Energy (ΔG) for Gene Silencing

Authors: Reena Murali, David Peter S.

Abstract:

The small interfering RNA (siRNA) alters the regulatory role of mRNA during gene expression by translational inhibition. Recent studies shows that up regulation of mRNA cause serious diseases like Cancer. So designing effective siRNA with good knockdown effects play an important role in gene silencing. Various siRNA design tools had been developed earlier. In this work, we are trying to analyze the existing good scoring second generation siRNA predicting tools and to optimize the efficiency of siRNA prediction by designing a computational model using Artificial Neural Network and whole stacking energy (ΔG), which may help in gene silencing and drug design in cancer therapy. Our model is trained and tested against a large data set of siRNA sequences. Validation of our results is done by finding correlation coefficient of experimental versus observed inhibition efficacy of siRNA. We achieved a correlation coefficient of 0.727 in our previous computational model and we could improve the correlation coefficient up to 0.753 when the threshold of whole tacking energy is greater than or equal to -32.5 kcal/mol.

Keywords: artificial neural network, double stranded RNA, RNA interference, short interfering RNA

Procedia PDF Downloads 512
2278 Occurrence of Porcine circovirus Type 2 in Pigs of Eastern Cape Province South Africa

Authors: Kayode O. Afolabi, Benson C. Iweriebor, Anthony I. Okoh, Larry C. Obi

Abstract:

Porcine circovirus type 2 (PCV2) is the major etiological viral agent of porcine multisystemic wasting syndrome (PWMS) and other porcine circovirus-associated diseases (PCVAD) of great economic importance in pig industry globally. In an effort to determine the status of swine herds in the Province as regarding the ‘small but powerful’ viral pathogen; a total of 375 blood, faecal and nasal swab samples were obtained from seven pig farms (commercial and communal) in Amathole, O.R. Tambo and Chris-Hani District Municipalities of Eastern Cape Province between the year 2015 and 2016. Three hundred and thirty nine (339) samples out of the total sample were subjected to molecular screening using PCV2 specific primers by conventional polymerase chain reaction (PCR). Selected sequences were further analyzed and confirmed through genome sequencing and phylogenetic analyses. The data obtained revealed that 15.93% of the screened samples (54/339) from the swine herds of the studied areas were positive for PCV2; while the severity of occurrence of the viral pathogen as observed at farm level ranges from approximately 5.6% to 60% in the studied farms. The Majority, precisely 15 out of 17 (88%) analyzed sequences were found clustering with other PCV2b reference strains in the phylogenetic analysis. More interestingly, two other sequences obtained were also found clustering within PCV2d genogroup, which is presently another fast-spreading genotype with observable higher virulence in global swine herds. This finding confirmed the presence of this all-important viral pathogen in pigs of the region; which could result in a serious outbreak of PCVAD and huge economic loss at the instances of triggering factors if no appropriate measures are taken to curb its spread effectively.

Keywords: pigs, polymerase chain reaction, porcine circovirus type 2, South Africa

Procedia PDF Downloads 193
2277 Using New Machine Algorithms to Classify Iranian Musical Instruments According to Temporal, Spectral and Coefficient Features

Authors: Ronak Khosravi, Mahmood Abbasi Layegh, Siamak Haghipour, Avin Esmaili

Abstract:

In this paper, a study on classification of musical woodwind instruments using a small set of features selected from a broad range of extracted ones by the sequential forward selection method was carried out. Firstly, we extract 42 features for each record in the music database of 402 sound files belonging to five different groups of Flutes (end blown and internal duct), Single –reed, Double –reed (exposed and capped), Triple reed and Quadruple reed. Then, the sequential forward selection method is adopted to choose the best feature set in order to achieve very high classification accuracy. Two different classification techniques of support vector machines and relevance vector machines have been tested out and an accuracy of up to 96% can be achieved by using 21 time, frequency and coefficient features and relevance vector machine with the Gaussian kernel function.

Keywords: coefficient features, relevance vector machines, spectral features, support vector machines, temporal features

Procedia PDF Downloads 299
2276 Stabilization of Clay Soil Using A-3 Soil

Authors: Mohammed Mustapha Alhaji, Sadiku Salawu

Abstract:

A clay soil which classified under A-7-6 soil according to AASHTO soil classification system and CH according to the unified soil classification system was stabilized using A-3 soil (AASHTO soil classification system). The clay soil was replaced with 0%, 10%, 20% to 100% A-3 soil, compacted at both the BSL and BSH compaction energy level and using unconfined compressive strength as evaluation criteria. The MDD of the compactions at both the BSL and BSH compaction energy levels showed increase in MDD from 0% A-3 soil replacement to 40% A-3 soil replacement after which the values reduced to 100% A-3 soil replacement. The trend of the OMC with varied A-3 soil replacement is similar to that of MDD but in a reversed order. The OMC reduced from 0% A-3 soil replacement to 40% A-3 soil replacement after which the values increased to 100% A-3 soil replacement. This trend was attributed to the observed reduction in the void ratio from 0% A-3 soil replacement to 40% A-3 soil replacement after which the void ratio increased to 100% A-3 soil replacement. The maximum UCS for clay at varied A-3 soil replacement increased from 272 and 770kN/m2 for BSL and BSH compaction energy level at 0% A-3 soil replacement to 295 and 795kN/m2 for BSL and BSH compaction energy level respectively at 10% A-3 soil replacement after which the values reduced to 22 and 60kN/m2 for BSL and BSH compaction energy level respectively at 70% A-3 soil replacement. Beyond 70% A-3 soil replacement, the mixture cannot be moulded for UCS test.

Keywords: A-3 soil, clay minerals, pozzolanic action, stabilization

Procedia PDF Downloads 412
2275 Using India’s Traditional Knowledge Digital Library on Traditional Tibetan Medicine

Authors: Chimey Lhamo, Ngawang Tsering

Abstract:

Traditional Tibetan medicine, known as Sowa Rigpa (Science of healing), originated more than 2500 years ago with an insightful background, and it has been growing significant attention in many Asian countries like China, India, Bhutan, and Nepal. Particularly, the Indian government has targeted Traditional Tibetan medicine as its major Indian medical system, including Ayurveda. Although Traditional Tibetan medicine has been growing interest and has a long history, it is not easily recognized worldwide because it exists only in the Tibetan language and it is neither accessible nor understood by patent examiners at the international patent office, data about Traditional Tibetan medicine is not yet broadly exist in the Internet. There has also been the exploitation of traditional Tibetan medicine increasing. The Traditional Knowledge Digital Library is a database aiming to prevent the patenting and misappropriation of India’s traditional medicine knowledge by using India’s Traditional knowledge Digital Library on Sowa Rigpa in order to prevent its exploitation at international patent with the help of information technology tools and an innovative classification systems-traditional knowledge resource classification (TKRC). As of date, more than 3000 Sowa Rigpa formulations have been transcribed into a Traditional Knowledge Digital Library database. In this paper, we are presenting India's Traditional Knowledge Digital Library for Traditional Tibetan medicine, and this database system helps to preserve and prevent the exploitation of Sowa Rigpa. Gradually it will be approved and accepted globally.

Keywords: traditional Tibetan medicine, India's traditional knowledge digital library, traditional knowledge resources classification, international patent classification

Procedia PDF Downloads 111
2274 A Review: Detection and Classification Defects on Banana and Apples by Computer Vision

Authors: Zahow Muoftah

Abstract:

Traditional manual visual grading of fruits has been one of the agricultural industry’s major challenges due to its laborious nature as well as inconsistency in the inspection and classification process. The main requirements for computer vision and visual processing are some effective techniques for identifying defects and estimating defect areas. Automated defect detection using computer vision and machine learning has emerged as a promising area of research with a high and direct impact on the visual inspection domain. Grading, sorting, and disease detection are important factors in determining the quality of fruits after harvest. Many studies have used computer vision to evaluate the quality level of fruits during post-harvest. Many studies have used computer vision to evaluate the quality level of fruits during post-harvest. Many studies have been conducted to identify diseases and pests that affect the fruits of agricultural crops. However, most previous studies concentrated solely on the diagnosis of a lesion or disease. This study focused on a comprehensive study to identify pests and diseases of apple and banana fruits using detection and classification defects on Banana and Apples by Computer Vision. As a result, the current article includes research from these domains as well. Finally, various pattern recognition techniques for detecting apple and banana defects are discussed.

Keywords: computer vision, banana, apple, detection, classification

Procedia PDF Downloads 84
2273 Predicting Potential Protein Therapeutic Candidates from the Gut Microbiome

Authors: Prasanna Ramachandran, Kareem Graham, Helena Kiefel, Sunit Jain, Todd DeSantis

Abstract:

Microbes that reside inside the mammalian GI tract, commonly referred to as the gut microbiome, have been shown to have therapeutic effects in animal models of disease. We hypothesize that specific proteins produced by these microbes are responsible for this activity and may be used directly as therapeutics. To speed up the discovery of these key proteins from the big-data metagenomics, we have applied machine learning techniques. Using amino acid sequences of known epitopes and their corresponding binding partners, protein interaction descriptors (PID) were calculated, making a positive interaction set. A negative interaction dataset was calculated using sequences of proteins known not to interact with these same binding partners. Using Random Forest and positive and negative PID, a machine learning model was trained and used to predict interacting versus non-interacting proteins. Furthermore, the continuous variable, cosine similarity in the interaction descriptors was used to rank bacterial therapeutic candidates. Laboratory binding assays were conducted to test the candidates for their potential as therapeutics. Results from binding assays reveal the accuracy of the machine learning prediction and are subsequently used to further improve the model.

Keywords: protein-interactions, machine-learning, metagenomics, microbiome

Procedia PDF Downloads 347
2272 Maturity Classification of Oil Palm Fresh Fruit Bunches Using Thermal Imaging Technique

Authors: Shahrzad Zolfagharnassab, Abdul Rashid Mohamed Shariff, Reza Ehsani, Hawa Ze Jaffar, Ishak Aris

Abstract:

Ripeness estimation of oil palm fresh fruit is important processes that affect the profitableness and salability of oil palm fruits. The adulthood or ripeness of the oil palm fruits influences the quality of oil palm. Conventional procedure includes physical grading of Fresh Fruit Bunches (FFB) maturity by calculating the number of loose fruits per bunch. This physical classification of oil palm FFB is costly, time consuming and the results may have human error. Hence, many researchers try to develop the methods for ascertaining the maturity of oil palm fruits and thereby, deviously the oil content of distinct palm fruits without the need for exhausting oil extraction and analysis. This research investigates the potential of infrared images (Thermal Images) as a predictor to classify the oil palm FFB ripeness. A total of 270 oil palm fresh fruit bunches from most common cultivar of oil palm bunches Nigresens according to three maturity categories: under ripe, ripe and over ripe were collected. Each sample was scanned by the thermal imaging cameras FLIR E60 and FLIR T440. The average temperature of each bunches were calculated by using image processing in FLIR Tools and FLIR ThermaCAM researcher pro 2.10 environment software. The results show that temperature content decreased from immature to over mature oil palm FFBs. An overall analysis-of-variance (ANOVA) test was proved that this predictor gave significant difference between underripe, ripe and overripe maturity categories. This shows that the temperature as predictors can be good indicators to classify oil palm FFB. Classification analysis was performed by using the temperature of the FFB as predictors through Linear Discriminant Analysis (LDA), Mahalanobis Discriminant Analysis (MDA), Artificial Neural Network (ANN) and K- Nearest Neighbor (KNN) methods. The highest overall classification accuracy was 88.2% by using Artificial Neural Network. This research proves that thermal imaging and neural network method can be used as predictors of oil palm maturity classification.

Keywords: artificial neural network, maturity classification, oil palm FFB, thermal imaging

Procedia PDF Downloads 338
2271 Amharic Text News Classification Using Supervised Learning

Authors: Misrak Assefa

Abstract:

The Amharic language is the second most widely spoken Semitic language in the world. There are several new overloaded on the web. Searching some useful documents from the web on a specific topic, which is written in the Amharic language, is a challenging task. Hence, document categorization is required for managing and filtering important information. In the classification of Amharic text news, there is still a gap in the domain of information that needs to be launch. This study attempts to design an automatic Amharic news classification using a supervised learning mechanism on four un-touch classes. To achieve this research, 4,182 news articles were used. Naive Bayes (NB) and Decision tree (j48) algorithms were used to classify the given Amharic dataset. In this paper, k-fold cross-validation is used to estimate the accuracy of the classifier. As a result, it shows those algorithms can be applicable in Amharic news categorization. The best average accuracy result is achieved by j48 decision tree and naïve Bayes is 95.2345 %, and 94.6245 % respectively using three categories. This research indicated that a typical decision tree algorithm is more applicable to Amharic news categorization.

Keywords: text categorization, supervised machine learning, naive Bayes, decision tree

Procedia PDF Downloads 172
2270 Review of Different Machine Learning Algorithms

Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui

Abstract:

Classification is a data mining technique, which is recognizedon Machine Learning (ML) algorithm. It is used to classifythe individual articlein a knownofinformation into a set of predefinemodules or group. Web mining is also a portion of that sympathetic of data mining methods. The main purpose of this paper to analysis and compare the performance of Naïve Bayse Algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN)and Support Vector Machine (SVM). This paper consists of different ML algorithm and their advantages and disadvantages and also define research issues.

Keywords: Data Mining, Web Mining, classification, ML Algorithms

Procedia PDF Downloads 270
2269 Interpretation of the Russia-Ukraine 2022 War via N-Gram Analysis

Authors: Elcin Timur Cakmak, Ayse Oguzlar

Abstract:

This study presents the results of the tweets sent by Twitter users on social media about the Russia-Ukraine war by bigram and trigram methods. On February 24, 2022, Russian President Vladimir Putin declared a military operation against Ukraine, and all eyes were turned to this war. Many people living in Russia and Ukraine reacted to this war and protested and also expressed their deep concern about this war as they felt the safety of their families and their futures were at stake. Most people, especially those living in Russia and Ukraine, express their views on the war in different ways. The most popular way to do this is through social media. Many people prefer to convey their feelings using Twitter, one of the most frequently used social media tools. Since the beginning of the war, it is seen that there have been thousands of tweets about the war from many countries of the world on Twitter. These tweets accumulated in data sources are extracted using various codes for analysis through Twitter API and analysed by Python programming language. The aim of the study is to find the word sequences in these tweets by the n-gram method, which is known for its widespread use in computational linguistics and natural language processing. The tweet language used in the study is English. The data set consists of the data obtained from Twitter between February 24, 2022, and April 24, 2022. The tweets obtained from Twitter using the #ukraine, #russia, #war, #putin, #zelensky hashtags together were captured as raw data, and the remaining tweets were included in the analysis stage after they were cleaned through the preprocessing stage. In the data analysis part, the sentiments are found to present what people send as a message about the war on Twitter. Regarding this, negative messages make up the majority of all the tweets as a ratio of %63,6. Furthermore, the most frequently used bigram and trigram word groups are found. Regarding the results, the most frequently used word groups are “he, is”, “I, do”, “I, am” for bigrams. Also, the most frequently used word groups are “I, do, not”, “I, am, not”, “I, can, not” for trigrams. In the machine learning phase, the accuracy of classifications is measured by Classification and Regression Trees (CART) and Naïve Bayes (NB) algorithms. The algorithms are used separately for bigrams and trigrams. We gained the highest accuracy and F-measure values by the NB algorithm and the highest precision and recall values by the CART algorithm for bigrams. On the other hand, the highest values for accuracy, precision, and F-measure values are achieved by the CART algorithm, and the highest value for the recall is gained by NB for trigrams.

Keywords: classification algorithms, machine learning, sentiment analysis, Twitter

Procedia PDF Downloads 57
2268 The Optimization of Decision Rules in Multimodal Decision-Level Fusion Scheme

Authors: Andrey V. Timofeev, Dmitry V. Egorov

Abstract:

This paper introduces an original method of parametric optimization of the structure for multimodal decision-level fusion scheme which combines the results of the partial solution of the classification task obtained from assembly of the mono-modal classifiers. As a result, a multimodal fusion classifier which has the minimum value of the total error rate has been obtained.

Keywords: classification accuracy, fusion solution, total error rate, multimodal fusion classifier

Procedia PDF Downloads 446
2267 Stereo Camera Based Speed-Hump Detection Process for Real Time Driving Assistance System in the Daytime

Authors: Hyun-Koo Kim, Yong-Hun Kim, Soo-Young Suk, Ju H. Park, Ho-Youl Jung

Abstract:

This paper presents an effective speed hump detection process at the day-time. we focus only on round types of speed humps in the day-time dynamic road environment. The proposed speed hump detection scheme consists mainly of two process as stereo matching and speed hump detection process. Our proposed process focuses to speed hump detection process. Speed hump detection process consist of noise reduction step, data fusion step, and speed hemp detection step. The proposed system is tested on Intel Core CPU with 2.80 GHz and 4 GB RAM tested in the urban road environments. The frame rate of test videos is 30 frames per second and the size of each frame of grabbed image sequences is 1280 pixels by 670 pixels. Using object-marked sequences acquired with an on-vehicle camera, we recorded speed humps and non-speed humps samples. Result of the tests, our proposed method can be applied in real-time systems by computation time is 13 ms. For instance; our proposed method reaches 96.1 %.

Keywords: data fusion, round types speed hump, speed hump detection, surface filter

Procedia PDF Downloads 495
2266 An Overview of the Porosity Classification in Carbonate Reservoirs and Their Challenges: An Example of Macro-Microporosity Classification from Offshore Miocene Carbonate in Central Luconia, Malaysia

Authors: Hammad T. Janjuhah, Josep Sanjuan, Mohamed K. Salah

Abstract:

Biological and chemical activities in carbonates are responsible for the complexity of the pore system. Primary porosity is generally of natural origin while secondary porosity is subject to chemical reactivity through diagenetic processes. To understand the integrated part of hydrocarbon exploration, it is necessary to understand the carbonate pore system. However, the current porosity classification scheme is limited to adequately predict the petrophysical properties of different reservoirs having various origins and depositional environments. Rock classification provides a descriptive method for explaining the lithofacies but makes no significant contribution to the application of porosity and permeability (poro-perm) correlation. The Central Luconia carbonate system (Malaysia) represents a good example of pore complexity (in terms of nature and origin) mainly related to diagenetic processes which have altered the original reservoir. For quantitative analysis, 32 high-resolution images of each thin section were taken using transmitted light microscopy. The quantification of grains, matrix, cement, and macroporosity (pore types) was achieved using a petrographic analysis of thin sections and FESEM images. The point counting technique was used to estimate the amount of macroporosity from thin section, which was then subtracted from the total porosity to derive the microporosity. The quantitative observation of thin sections revealed that the mouldic porosity (macroporosity) is the dominant porosity type present, whereas the microporosity seems to correspond to a sum of 40 to 50% of the total porosity. It has been proven that these Miocene carbonates contain a significant amount of microporosity, which significantly complicates the estimation and production of hydrocarbons. Neglecting its impact can increase uncertainty about estimating hydrocarbon reserves. Due to the diversity of geological parameters, the application of existing porosity classifications does not allow a better understanding of the poro-perm relationship. However, the classification can be improved by including the pore types and pore structures where they can be divided into macro- and microporosity. Such studies of microporosity identification/classification represent now a major concern in limestone reservoirs around the world.

Keywords: overview of porosity classification, reservoir characterization, microporosity, carbonate reservoir

Procedia PDF Downloads 132
2265 Using Time Series NDVI to Model Land Cover Change: A Case Study in the Berg River Catchment Area, Western Cape, South Africa

Authors: Adesuyi Ayodeji Steve, Zahn Munch

Abstract:

This study investigates the use of MODIS NDVI to identify agricultural land cover change areas on an annual time step (2007 - 2012) and characterize the trend in the study area. An ISODATA classification was performed on the MODIS imagery to select only the agricultural class producing 3 class groups namely: agriculture, agriculture/semi-natural, and semi-natural. NDVI signatures were created for the time series to identify areas dominated by cereals and vineyards with the aid of ancillary, pictometry and field sample data. The NDVI signature curve and training samples aided in creating a decision tree model in WEKA 3.6.9. From the training samples two classification models were built in WEKA using decision tree classifier (J48) algorithm; Model 1 included ISODATA classification and Model 2 without, both having accuracies of 90.7% and 88.3% respectively. The two models were used to classify the whole study area, thus producing two land cover maps with Model 1 and 2 having classification accuracies of 77% and 80% respectively. Model 2 was used to create change detection maps for all the other years. Subtle changes and areas of consistency (unchanged) were observed in the agricultural classes and crop practices over the years as predicted by the land cover classification. 41% of the catchment comprises of cereals with 35% possibly following a crop rotation system. Vineyard largely remained constant over the years, with some conversion to vineyard (1%) from other land cover classes. Some of the changes might be as a result of misclassification and crop rotation system.

Keywords: change detection, land cover, modis, NDVI

Procedia PDF Downloads 381
2264 Ontology-Based Backpropagation Neural Network Classification and Reasoning Strategy for NoSQL and SQL Databases

Authors: Hao-Hsiang Ku, Ching-Ho Chi

Abstract:

Big data applications have become an imperative for many fields. Many researchers have been devoted into increasing correct rates and reducing time complexities. Hence, the study designs and proposes an Ontology-based backpropagation neural network classification and reasoning strategy for NoSQL big data applications, which is called ON4NoSQL. ON4NoSQL is responsible for enhancing the performances of classifications in NoSQL and SQL databases to build up mass behavior models. Mass behavior models are made by MapReduce techniques and Hadoop distributed file system based on Hadoop service platform. The reference engine of ON4NoSQL is the ontology-based backpropagation neural network classification and reasoning strategy. Simulation results indicate that ON4NoSQL can efficiently achieve to construct a high performance environment for data storing, searching, and retrieving.

Keywords: Hadoop, NoSQL, ontology, back propagation neural network, high distributed file system

Procedia PDF Downloads 246
2263 Land Use Change Detection Using Satellite Images for Najran City, Kingdom of Saudi Arabia (KSA)

Authors: Ismail Elkhrachy

Abstract:

Determination of land use changing is an important component of regional planning for applications ranging from urban fringe change detection to monitoring change detection of land use. This data are very useful for natural resources management.On the other hand, the technologies and methods of change detection also have evolved dramatically during past 20 years. So it has been well recognized that the change detection had become the best methods for researching dynamic change of land use by multi-temporal remotely-sensed data. The objective of this paper is to assess, evaluate and monitor land use change surrounding the area of Najran city, Kingdom of Saudi Arabia (KSA) using Landsat images (June 23, 2009) and ETM+ image(June. 21, 2014). The post-classification change detection technique was applied. At last,two-time subset images of Najran city are compared on a pixel-by-pixel basis using the post-classification comparison method and the from-to change matrix is produced, the land use change information obtained.Three classes were obtained, urban, bare land and agricultural land from unsupervised classification method by using Erdas Imagine and ArcGIS software. Accuracy assessment of classification has been performed before calculating change detection for study area. The obtained accuracy is between 61% to 87% percent for all the classes. Change detection analysis shows that rapid growth in urban area has been increased by 73.2%, the agricultural area has been decreased by 10.5 % and barren area reduced by 7% between 2009 and 2014. The quantitative study indicated that the area of urban class has unchanged by 58.2 km〗^2, gained 70.3 〖km〗^2 and lost 16 〖km〗^2. For bare land class 586.4〖km〗^2 has unchanged, 53.2〖km〗^2 has gained and 101.5〖km〗^2 has lost. While agriculture area class, 20.2〖km〗^2 has unchanged, 31.2〖km〗^2 has gained and 37.2〖km〗^2 has lost.

Keywords: land use, remote sensing, change detection, satellite images, image classification

Procedia PDF Downloads 506