Search results for: sequence data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24994

Search results for: sequence data

24904 The Effect of Ingredients Mixing Sequence in Rubber Compounding on the Formation of Bound Rubber and Cross-Link Density of Natural Rubber

Authors: Abu Hasan, Rochmadi, Hary Sulistyo, Suharto Honggokusumo

Abstract:

This research purpose is to study the effect of Ingredients mixing sequence in rubber compounding onto the formation of bound rubber and cross link density of natural rubber and also the relationship of bound rubber and cross link density. Analysis of bound rubber formation of rubber compound and cross link density of rubber vulcanizates were carried out on a natural rubber formula having masticated and mixing, followed by curing. There were four methods of mixing and each mixing process was followed by four mixing sequence methods of carbon black into the rubber. In the first method of mixing sequence, rubber was masticated for 5 min and then rubber chemicals and carbon black N 330 were added simultaneously. In the second one, rubber was masticated for 1 min and followed by addition of rubber chemicals and carbon black N 330 simultaneously using the different method of mixing then the first one. In the third one, carbon black N 660 was used for the same mixing procedure of the second one, and in the last one, rubber was masticated for 3 min, carbon black N 330 and rubber chemicals were added subsequently. The addition of rubber chemicals and carbon black into masticated rubber was distinguished by the sequence and time allocated for each mixing process. Carbon black was added into two stages. In the first stage, 10 phr was added first and the remaining 40 phr was added later along with oil. In the second one to the fourth one, the addition of carbon black in the first and the second stage was added in the phr ratio 20:30, 30:20, and 40:10. The results showed that the ingredients mixing process influenced bound rubber formation and cross link density. In the three methods of mixing, the bound rubber formation was proportional with crosslink density. In contrast in the fourth one, bound rubber formation and cross link density had contradictive relation. Regardless of the mixing method operated, bound rubber had non linear relationship with cross link density. The high cross link density was formed when low bound rubber formation. The cross link density became constant at high bound rubber content.

Keywords: bound-rubber, cross-link density, natural rubber, rubber mixing process

Procedia PDF Downloads 381
24903 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing do-main presents interesting challenges as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable as it can be used to mark progress while training, and it can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The dataset available was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and it was temporally heterogeneous. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross validation strategies. The investigated solutions to the classification problem included light weight machine classifiers KNN and SVM as well as the deep learning with CNN. The best performing model had an 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence

Procedia PDF Downloads 120
24902 Speed Breaker/Pothole Detection Using Hidden Markov Models: A Deep Learning Approach

Authors: Surajit Chakrabarty, Piyush Chauhan, Subhasis Panda, Sujoy Bhattacharya

Abstract:

A large proportion of roads in India are not well maintained as per the laid down public safety guidelines leading to loss of direction control and fatal accidents. We propose a technique to detect speed breakers and potholes using mobile sensor data captured from multiple vehicles and provide a profile of the road. This would, in turn, help in monitoring roads and revolutionize digital maps. Incorporating randomness in the model formulation for detection of speed breakers and potholes is crucial due to substantial heterogeneity observed in data obtained using a mobile application from multiple vehicles driven by different drivers. This is accomplished with Hidden Markov Models, whose hidden state sequence is found for each time step given the observables sequence, and are then fed as input to LSTM network with peephole connections. A precision score of 0.96 and 0.63 is obtained for classifying bumps and potholes, respectively, a significant improvement from the machine learning based models. Further visualization of bumps/potholes is done by converting time series to images using Markov Transition Fields where a significant demarcation among bump/potholes is observed.

Keywords: deep learning, hidden Markov model, pothole, speed breaker

Procedia PDF Downloads 108
24901 An Analysis of Sequential Pattern Mining on Databases Using Approximate Sequential Patterns

Authors: J. Suneetha, Vijayalaxmi

Abstract:

Sequential Pattern Mining involves applying data mining methods to large data repositories to extract usage patterns. Sequential pattern mining methodologies used to analyze the data and identify patterns. The patterns have been used to implement efficient systems can recommend on previously observed patterns, in making predictions, improve usability of systems, detecting events, and in general help in making strategic product decisions. In this paper, identified performance of approximate sequential pattern mining defines as identifying patterns approximately shared with many sequences. Approximate sequential patterns can effectively summarize and represent the databases by identifying the underlying trends in the data. Conducting an extensive and systematic performance over synthetic and real data. The results demonstrate that ApproxMAP effective and scalable in mining large sequences databases with long patterns.

Keywords: multiple data, performance analysis, sequential pattern, sequence database scalability

Procedia PDF Downloads 307
24900 Molecular Cloning and Identification of a Double WAP Domain–Containing Protein 3 Gene from Chinese Mitten Crab Eriocheir sinensis

Authors: Fengmei Li, Li Xu, Guoliang Xia

Abstract:

Whey acidic proteins (WAP) domain-containing proteins in crustacean are involved in innate immune response against microbial invasion. In the present study, a novel double WAP domain (DWD)-containing protein gene 3 was identified from Chinese mitten crab Eriocheir sinensis (designated EsDWD3) by expressed sequence tag (EST) analysis and PCR techniques. The full-length cDNA of EsDWD3 was of 1223 bp, consisting of a 5′-terminal untranslated region (UTR) of 74 bp, a 3′ UTR of 727 bp with a polyadenylation signal sequence AATAAA and a polyA tail, and an open reading frame (ORF) of 423 bp. The ORF encoded a polypeptide of 140 amino acids with a signal peptide of 22 amino acids. The deduced protein sequence EsDWD3 showed 96.4 % amino acid similar to other reported EsDWD1 from E. sinensis, and phylogenetic tree analysis revealed that EsDWD3 had closer relationships with the reported two double WAP domain-containing proteins of E. sinensis species.

Keywords: Chinese mitten crab, Eriocheir sinensis, cloning, double WAP domain-containing protein

Procedia PDF Downloads 331
24899 Polymorphic Positions, Haplotypes, and Mutations Detected In The Mitochonderial DNA Coding Region By Sanger Sequence Technique

Authors: Imad H. Hameed, Mohammad A. Jebor, Ammera J. Omer

Abstract:

The aim of this research is to study the mitochonderial coding region by using the Sanger sequencing technique and establish the degree of variation characteristic of a fragment. FTA® Technology (FTA™ paper DNA extraction) utilized to extract DNA. Portion of coding region encompassing positions 11719 –12384 amplified in accordance with the Anderson reference sequence. PCR products purified by EZ-10 spin column then sequenced and Detected by using the ABI 3730xL DNA Analyzer. Five new polymorphic positions 11741, 11756, 11878, 11887 and 12133 are described may be suitable sources for identification purpose in future. The calculated value D= 0.95 and RMP=0.048 of the genetic diversity should be understood as high in the context of coding function of the analysed DNA fragment. The relatively high gene diversity and a relatively low random match probability were observed in Iraq population. The obtained data can be used to identify the variable nucleotide positions characterized by frequent occurrence which is most promising for various identifications.

Keywords: coding region, Iraq, mitochondrial DNA, polymorphic positions, sanger technique

Procedia PDF Downloads 406
24898 Computationally Efficient Stacking Sequence Blending for Composite Structures with a Large Number of Design Regions Using Cellular Automata

Authors: Ellen Van Den Oord, Julien Marie Jan Ferdinand Van Campen

Abstract:

This article introduces a computationally efficient method for stacking sequence blending of composite structures. The computational efficiency makes the presented method especially interesting for composite structures with a large number of design regions. Optimization of composite structures with an unequal load distribution may lead to locally optimized thicknesses and ply orientations that are incompatible with one another. Blending constraints can be enforced to achieve structural continuity. In literature, many methods can be found to implement structural continuity by means of stacking sequence blending in one way or another. The complexity of the problem makes the blending of a structure with a large number of adjacent design regions, and thus stacking sequences, prohibitive. In this work the local stacking sequence optimization is preconditioned using a method found in the literature that couples the mechanical behavior of the laminate, in the form of lamination parameters, to blending constraints, yielding near-optimal easy-to-blend designs. The preconditioned design is then fed to the scheme using cellular automata that have been developed by the authors. The method is applied to the benchmark 18-panel horseshoe blending problem to demonstrate its performance. The computational efficiency of the proposed method makes it especially suited for composite structures with a large number of design regions.

Keywords: composite, blending, optimization, lamination parameters

Procedia PDF Downloads 194
24897 Cloning and Analysis of Nile Tilapia Toll-like receptors Type-3 mRNA

Authors: Abdelazeem Algammal, Reham Abouelmaatti, Xiaokun Li, Jisheng Ma, Eman Abdelnaby, Wael Elfeil

Abstract:

Toll-like receptors (TLRs) are the best understood of the innate immune receptors that detect infections in vertebrates. However, the fish TLRs also exhibit very distinct features and a large diversity, which is likely derived from their diverse evolutionary history and the distinct environments that they occupy. Little is known about the fish immune system structure. Our work was aimed to identify and clone the Nile tilapiaTLR-3 as a model of freshwater fish species; we cloned the full-length cDNA sequence of Nile tilapia (Oreochromis niloticus) TLR-3 and according to our knowledge, it is the first report illustrating tilapia TLR-3. The complete cDNA sequence of Nile tilapia TLR-3 was 2736 pair base and it encodes a polypeptide of 912 amino acids. Analysis of the deduced amino acid sequence indicated that Nile tilapia TLR-3 has typical structural features and main components of proteins belonging to the TLR family. Our results illustrate a complete and functional Nile tilapia TLR-3 and it is considered an ortholog of the other vertebrate’s receptor.

Keywords: Nile tilapia, TLR-3, cloning, gene expression

Procedia PDF Downloads 113
24896 Exploring MPI-Based Parallel Computing in Analyzing Very Large Sequences

Authors: Bilal Wajid, Erchin Serpedin

Abstract:

The health industry is aiming towards personalized medicine. If the patient’s genome needs to be sequenced it is important that the entire analysis be completed quickly. This paper explores use of parallel computing to analyze very large sequences. Two cases have been considered. In the first case, the sequence is kept constant and the effect of increasing the number of MPI-based processes is evaluated in terms of execution time, speed and efficiency. In the second case the number of MPI-based processes have been kept constant whereas, the length of the sequence was increased.

Keywords: parallel computing, alignment, genome assembly, alignment

Procedia PDF Downloads 241
24895 Identifying Protein-Coding and Non-Coding Regions in Transcriptomes

Authors: Angela U. Makolo

Abstract:

Protein-coding and Non-coding regions determine the biology of a sequenced transcriptome. Research advances have shown that Non-coding regions are important in disease progression and clinical diagnosis. Existing bioinformatics tools have been targeted towards Protein-coding regions alone. Therefore, there are challenges associated with gaining biological insights from transcriptome sequence data. These tools are also limited to computationally intensive sequence alignment, which is inadequate and less accurate to identify both Protein-coding and Non-coding regions. Alignment-free techniques can overcome the limitation of identifying both regions. Therefore, this study was designed to develop an efficient sequence alignment-free model for identifying both Protein-coding and Non-coding regions in sequenced transcriptomes. Feature grouping and randomization procedures were applied to the input transcriptomes (37,503 data points). Successive iterations were carried out to compute the gradient vector that converged the developed Protein-coding and Non-coding Region Identifier (PNRI) model to the approximate coefficient vector. The logistic regression algorithm was used with a sigmoid activation function. A parameter vector was estimated for every sample in 37,503 data points in a bid to reduce the generalization error and cost. Maximum Likelihood Estimation (MLE) was used for parameter estimation by taking the log-likelihood of six features and combining them into a summation function. Dynamic thresholding was used to classify the Protein-coding and Non-coding regions, and the Receiver Operating Characteristic (ROC) curve was determined. The generalization performance of PNRI was determined in terms of F1 score, accuracy, sensitivity, and specificity. The average generalization performance of PNRI was determined using a benchmark of multi-species organisms. The generalization error for identifying Protein-coding and Non-coding regions decreased from 0.514 to 0.508 and to 0.378, respectively, after three iterations. The cost (difference between the predicted and the actual outcome) also decreased from 1.446 to 0.842 and to 0.718, respectively, for the first, second and third iterations. The iterations terminated at the 390th epoch, having an error of 0.036 and a cost of 0.316. The computed elements of the parameter vector that maximized the objective function were 0.043, 0.519, 0.715, 0.878, 1.157, and 2.575. The PNRI gave an ROC of 0.97, indicating an improved predictive ability. The PNRI identified both Protein-coding and Non-coding regions with an F1 score of 0.970, accuracy (0.969), sensitivity (0.966), and specificity of 0.973. Using 13 non-human multi-species model organisms, the average generalization performance of the traditional method was 74.4%, while that of the developed model was 85.2%, thereby making the developed model better in the identification of Protein-coding and Non-coding regions in transcriptomes. The developed Protein-coding and Non-coding region identifier model efficiently identified the Protein-coding and Non-coding transcriptomic regions. It could be used in genome annotation and in the analysis of transcriptomes.

Keywords: sequence alignment-free model, dynamic thresholding classification, input randomization, genome annotation

Procedia PDF Downloads 35
24894 Tectonic Complexity: Out-of-Sequence Thrusting in the Higher Himalaya of Jhakri-Sarahan region, Himachal Pradesh, India

Authors: Rajkumar Ghosh

Abstract:

The study focuses on the tectonics of out-of-sequence thrusting (OOST) in the NW region of the Himalaya, particularly in Himachal Pradesh. The research aims to identify the features and nature of OOST in the field and the associated rock types and lithological boundaries in the field of NW Himalaya, Himachal Pradesh, India. The research employs fieldwork and micro-structure observations, correlations, and analyses to identify and analyze the OOST features and associated rock types. The study reveals the presence of three OOSTs, namely Jhakri Thrust (JT), Sarahan Thrust (ST), and Chaura Thrust (CT), which consist of several branches, some of which are still active. The thrust system exhibits varying internal geometry, including box folds, boudins, scar folds, crenulation cleavages, kink folds, and tension gashes. The CT, which is concealed beneath Jutogh Thrust sheet, represents a steepened downward thrust, while the JT has a western dip and is south-westward verging. The research provides crucial information on the tectonics of OOST in the NW region of the Himalaya, particularly in Himachal Pradesh, which is crucial in understanding the regional geological evolution and associated hazards. The data were collected through fieldwork and micro-structure observations, correlations, and analyses of rock samples. The data were analyzed using tectonic and geochronological techniques to identify the nature and characteristics of OOST. The research addressed the question of identifying Higher Himalayan OOST in the field of NW Himalaya, Himachal Pradesh, India, and the associated rock types and lithological boundaries. The study concludes that there is minimal documentation and a lack of suitable exposure of rocks to generalize the features of OOST in the field in NW Higher Himalaya, Himachal Pradesh. The study recommends more extensive mapping and fieldwork to improve understanding of OOST in the region.

Keywords: out-of-sequence thrust (OOST), main central thrust (MCT), jhakri thrust (JT), sarahan thrust (ST), chaura thrust (CT), higher himalaya (HH)

Procedia PDF Downloads 62
24893 Genomic Sequence Representation Learning: An Analysis of K-Mer Vector Embedding Dimensionality

Authors: James Jr. Mashiyane, Risuna Nkolele, Stephanie J. Müller, Gciniwe S. Dlamini, Rebone L. Meraba, Darlington S. Mapiye

Abstract:

When performing language tasks in natural language processing (NLP), the dimensionality of word embeddings is chosen either ad-hoc or is calculated by optimizing the Pairwise Inner Product (PIP) loss. The PIP loss is a metric that measures the dissimilarity between word embeddings, and it is obtained through matrix perturbation theory by utilizing the unitary invariance of word embeddings. Unlike in natural language, in genomics, especially in genome sequence processing, unlike in natural language processing, there is no notion of a “word,” but rather, there are sequence substrings of length k called k-mers. K-mers sizes matter, and they vary depending on the goal of the task at hand. The dimensionality of word embeddings in NLP has been studied using the matrix perturbation theory and the PIP loss. In this paper, the sufficiency and reliability of applying word-embedding algorithms to various genomic sequence datasets are investigated to understand the relationship between the k-mer size and their embedding dimension. This is completed by studying the scaling capability of three embedding algorithms, namely Latent Semantic analysis (LSA), Word2Vec, and Global Vectors (GloVe), with respect to the k-mer size. Utilising the PIP loss as a metric to train embeddings on different datasets, we also show that Word2Vec outperforms LSA and GloVe in accurate computing embeddings as both the k-mer size and vocabulary increase. Finally, the shortcomings of natural language processing embedding algorithms in performing genomic tasks are discussed.

Keywords: word embeddings, k-mer embedding, dimensionality reduction

Procedia PDF Downloads 107
24892 Lambda-Levelwise Statistical Convergence of a Sequence of Fuzzy Numbers

Authors: F. Berna Benli, Özgür Keskin

Abstract:

Lately, many mathematicians have been studied the statistical convergence of a sequence of fuzzy numbers. We know that Lambda-statistically convergence is a kind of convergence between ordinary convergence and statistical convergence. In this paper, we will introduce the new kind of convergence such as λ-levelwise statistical convergence. Then, we will define the concept of the λ-levelwise statistical cluster and limit points of a sequence of fuzzy numbers. Also, we will discuss the relations between the sets of λ-levelwise statistical cluster points and λ-levelwise statistical limit points of sequences of fuzzy numbers. This work has been extended in this paper, where some relations have been considered such that when lambda-statistical limit inferior and lambda-statistical limit superior for lambda-statistically convergent sequences of fuzzy numbers are equal. Furthermore, lambda-statistical boundedness condition for different sequences of fuzzy numbers has been studied.

Keywords: fuzzy number, λ-levelwise statistical cluster points, λ-levelwise statistical convergence, λ-levelwise statistical limit points, λ-statistical cluster points, λ-statistical convergence, λ-statistical limit points

Procedia PDF Downloads 439
24891 Dwindling the Stability of DNA Sequence by Base Substitution at Intersection of COMT and MIR4761 Gene

Authors: Srishty Gulati, Anju Singh, Shrikant Kukreti

Abstract:

The manifestation of structural polymorphism in DNA depends on the sequence and surrounding environment. Ample of folded DNA structures have been found in the cellular system out of which DNA hairpins are very common, however, are indispensable due to their role in the replication initiation sites, recombination, transcription regulation, and protein recognition. We enumerate this approach in our study, where the two base substitutions and change in temperature embark destabilization of DNA structure and misbalance the equilibrium between two structures of a sequence present at the overlapping region of the human COMT gene and MIR4761 gene. COMT and MIR4761 gene encodes for catechol-O-methyltransferase (COMT) enzyme and microRNAs (miRNAs), respectively. Environmental changes and errors during cell division lead to genetic abnormalities. The COMT gene entailed in dopamine regulation fosters neurological diseases like Parkinson's disease, schizophrenia, velocardiofacial syndrome, etc. A 19-mer deoxyoligonucleotide sequence 5'-AGGACAAGGTGTGCATGCC-3' (COMT19) is located at exon-4 on chromosome 22 and band q11.2 at the intersection of COMT and MIR4761 gene. Bioinformatics studies suggest that this sequence is conserved in humans and few other organisms and is involved in recognition of transcription factors in the vicinity of 3'-end. Non-denaturating gel electrophoresis and CD spectroscopy of COMT sequences indicate the formation of hairpin type DNA structures. Temperature-dependent CD studies revealed an unusual shift in the slipped DNA-Hairpin DNA equilibrium with the change in temperature. Also, UV-thermal melting techniques suggest that the two base substitutions on the complementary strand of COMT19 did not affect the structure but reduces the stability of duplex. This study gives insight about the possibility of existing structurally polymorphic transient states within DNA segments present at the intersection of COMT and MIR4761 gene.

Keywords: base-substitution, catechol-o-methyltransferase (COMT), hairpin-DNA, structural polymorphism

Procedia PDF Downloads 99
24890 In-Vitro Dextran Synthesis and Characterization of an Intracellular Glucosyltransferase from Leuconostoc Mesenteroides AA1

Authors: Afsheen Aman, Shah Ali Ul Qader

Abstract:

Dextransucrase [EC 2.4.1.5] is a glucosyltransferase that catalysis the biosynthesis of a natural biopolymer called dextran. It can catalyze the transfer of D-glucopyranosyl residues from sucrose to the main chain of dextran. This unique biopolymer has multiple applications in several industries and the key utilization of dextran lies on its molecular weight and the type of branching. Extracellular dextransucrase from Leuconostoc mesenteroides is most extensively studied and characterized. Limited data is available regarding cell-bound or intracellular dextransucrase and on the characterization of dextran produced by in-vitro reaction of intracellular dextransucrase. L. mesenteroides AA1 is reported to produce extracellular dextransucrase that catalyzes biosynthesis of a high molecular weight dextran with only α-(1→6) linkage. Current study deals with the characterization of an intracellular dextransucrase and in vitro biosynthesis of low molecular weight dextran from L. mesenteroides AA1. Intracellular dextransucrase was extracted from cytoplasm and purified to homogeneity for characterization. Kinetic constants, molecular weight and N-terminal sequence analysis of intracellular dextransucrase reveal unique variation with previously reported extracellular dextransucrase from the same strain. In vitro synthesized biopolymer was characterized using NMR spectroscopic techniques. Intracellular dextransucrase exhibited Vmax and Km values of 130.8 DSU ml-1 hr-1 and 221.3 mM, respectively. Optimum catalytic activity was detected at 35°C in 0.15 M citrate phosphate buffer (pH-5.5) in 05 minutes. Molecular mass of purified intracellular dextransucrase is approximately 220.0 kDa on SDS-PAGE. N-terminal sequence of the intracellular enzyme is: GLPGYFGVN that showed no homology with previously reported sequence for the extracellular dextransucrase. This intracellular dextransucrase is capable of in vitro synthesis of dextran under specific conditions. This intracellular dextransucrase is capable of in vitro synthesis of dextran under specific conditions and this biopolymer can be hydrolyzed into different molecular weight fractions for various applications.

Keywords: characterization, dextran, dextransucrase, leuconostoc mesenteroides

Procedia PDF Downloads 368
24889 Language Shapes Thought: An Experimental Study on English and Mandarin Native Speakers' Sequencing of Size

Authors: Hsi Wei

Abstract:

Does the language we speak affect the way we think? This question has been discussed for a long time from different aspects. In this article, the issue is examined with an experiment on how speakers of different languages tend to do different sequencing when it comes to the size of general objects. An essential difference between the usage of English and Mandarin is the way we sequence the size of places or objects. In English, when describing the location of something we may say, for example, ‘The pen is inside the trashcan next to the tree at the park.’ In Mandarin, however, we would say, ‘The pen is at the park next to the tree inside the trashcan.’ It’s clear that generally English use the sequence of small to big while Mandarin the opposite. Therefore, the experiment was conducted to test if the difference of the languages affects the speakers’ ability to do the different sequencing. There were two groups of subjects; one consisted of English native speakers, another of Mandarin native speakers. Within the experiment, three nouns were showed as a group to the subjects as their native languages. Before they saw the nouns, they would first get an instruction of ‘big to small’, ‘small to big’, or ‘repeat’. Therefore, the subjects had to sequence the following group of nouns as the instruction they get or simply repeat the nouns. After completing every sequencing and repetition in their minds, they pushed a button as reaction. The repetition design was to gather the mere reading time of the person. As the result of the experiment showed, English native speakers reacted more quickly to the sequencing of ‘small to big’; on the other hand, Mandarin native speakers reacted more quickly to the sequence ‘big to small’. To conclude, this study may be of importance as a support for linguistic relativism that the language we speak do shape the way we think.

Keywords: language, linguistic relativism, size, sequencing

Procedia PDF Downloads 253
24888 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine

Procedia PDF Downloads 99
24887 Data Augmentation for Automatic Graphical User Interface Generation Based on Generative Adversarial Network

Authors: Xulu Yao, Moi Hoon Yap, Yanlong Zhang

Abstract:

As a branch of artificial neural network, deep learning is widely used in the field of image recognition, but the lack of its dataset leads to imperfect model learning. By analysing the data scale requirements of deep learning and aiming at the application in GUI generation, it is found that the collection of GUI dataset is a time-consuming and labor-consuming project, which is difficult to meet the needs of current deep learning network. To solve this problem, this paper proposes a semi-supervised deep learning model that relies on the original small-scale datasets to produce a large number of reliable data sets. By combining the cyclic neural network with the generated countermeasure network, the cyclic neural network can learn the sequence relationship and characteristics of data, make the generated countermeasure network generate reasonable data, and then expand the Rico dataset. Relying on the network structure, the characteristics of collected data can be well analysed, and a large number of reasonable data can be generated according to these characteristics. After data processing, a reliable dataset for model training can be formed, which alleviates the problem of dataset shortage in deep learning.

Keywords: GUI, deep learning, GAN, data augmentation

Procedia PDF Downloads 152
24886 Evaluating Generative Neural Attention Weights-Based Chatbot on Customer Support Twitter Dataset

Authors: Sinarwati Mohamad Suhaili, Naomie Salim, Mohamad Nazim Jambli

Abstract:

Sequence-to-sequence (seq2seq) models augmented with attention mechanisms are playing an increasingly important role in automated customer service. These models, which are able to recognize complex relationships between input and output sequences, are crucial for optimizing chatbot responses. Central to these mechanisms are neural attention weights that determine the focus of the model during sequence generation. Despite their widespread use, there remains a gap in the comparative analysis of different attention weighting functions within seq2seq models, particularly in the domain of chatbots using the Customer Support Twitter (CST) dataset. This study addresses this gap by evaluating four distinct attention-scoring functions—dot, multiplicative/general, additive, and an extended multiplicative function with a tanh activation parameter — in neural generative seq2seq models. Utilizing the CST dataset, these models were trained and evaluated over 10 epochs with the AdamW optimizer. Evaluation criteria included validation loss and BLEU scores implemented under both greedy and beam search strategies with a beam size of k=3. Results indicate that the model with the tanh-augmented multiplicative function significantly outperforms its counterparts, achieving the lowest validation loss (1.136484) and the highest BLEU scores (0.438926 under greedy search, 0.443000 under beam search, k=3). These results emphasize the crucial influence of selecting an appropriate attention-scoring function in improving the performance of seq2seq models for chatbots. Particularly, the model that integrates tanh activation proves to be a promising approach to improve the quality of chatbots in the customer support context.

Keywords: attention weight, chatbot, encoder-decoder, neural generative attention, score function, sequence-to-sequence

Procedia PDF Downloads 53
24885 Chinese Sentence Level Lip Recognition

Authors: Peng Wang, Tigang Jiang

Abstract:

The computer based lip reading method of different languages cannot be universal. At present, for the research of Chinese lip reading, whether the work on data sets or recognition algorithms, is far from mature. In this paper, we study the Chinese lipreading method based on machine learning, and propose a Chinese Sentence-level lip-reading network (CNLipNet) model which consists of spatio-temporal convolutional neural network(CNN), recurrent neural network(RNN) and Connectionist Temporal Classification (CTC) loss function. This model can map variable-length sequence of video frames to Chinese Pinyin sequence and is trained end-to-end. More over, We create CNLRS, a Chinese Lipreading Dataset, which contains 5948 samples and can be shared through github. The evaluation of CNLipNet on this dataset yielded a 41% word correct rate and a 70.6% character correct rate. This evaluation result is far superior to the professional human lip readers, indicating that CNLipNet performs well in lipreading.

Keywords: lipreading, machine learning, spatio-temporal, convolutional neural network, recurrent neural network

Procedia PDF Downloads 101
24884 Phylogenetic Analysis of Klebsiella Species from Clinical Specimens from Nelson Mandela Academic Hospital in Mthatha, South Africa

Authors: Sandeep Vasaikar, Lary Obi

Abstract:

Rapid and discriminative genotyping methods are useful for determining the clonality of the isolates in nosocomial or household outbreaks. Multilocus sequence typing (MLST) is a nucleotide sequence-based approach for characterising bacterial isolates. The genetic diversity and the clinical relevance of the drug-resistant Klebsiella isolates from Mthatha are largely unknown. For this reason, prospective, experimental study of the molecular epidemiology of Klebsiella isolates from patients being treated in Mthatha over a three-year period was analysed. Methodology: PCR amplification and sequencing of the drug-resistance-associated genes, and multilocus sequence typing (MLST) using 7 housekeeping genes mdh, pgi, infB, FusAR, phoE, gapA and rpoB were conducted. A total of 32 isolates were analysed. Results: The percentages of multidrug-resistant (MDR), extensively drug-resistance (XDR) and pandrug-resistant (PDR) isolates were; MDR 65.6 % (21) and XDR and PDR with 0 % each. In this study, K. pneumoniae was 19/32 (59.4 %). MLST results showed 22 sequence types (STs) were identified, which were further separated by Maximum Parsimony into 10 clonal complexes and 12 singletons. The most dominant group was Klebsiella pneumoniae with 23/32 (71.8 %) isolates, Klebsiella oxytoca as a second group with 2/32 (6.25 %) isolates, and a single (3.1 %) K. varricola as a third group while 6 isolates were of unknown sequences. Conclusions/significance: A phylogenetic analysis of the concatenated sequences of the 7 housekeeping genes showed that strains of K. pneumoniae form a distinct lineage within the genus Klebsiella, with K. oxytoca and K. varricola its nearest phylogenetic neighbours. With the analysis of 7 genes were determined 1 K. variicola, which was mistakenly identified as K. pneumoniae by phenotypic methods. Two misidentifications of K. oxytoca were found when phenotypic methods were used. No significant differences were observed between ESBL blaCTX-M, blaTEM and blaSHV groups in the distribution of Sequence types (STs) or Clonal complexes (CCs).

Keywords: phylogenetic analysis, phylogeny, klebsiella phylogenetic, klebsiella

Procedia PDF Downloads 334
24883 Non-Population Search Algorithms for Capacitated Material Requirement Planning in Multi-Stage Assembly Flow Shop with Alternative Machines

Authors: Watcharapan Sukkerd, Teeradej Wuttipornpun

Abstract:

This paper aims to present non-population search algorithms called tabu search (TS), simulated annealing (SA) and variable neighborhood search (VNS) to minimize the total cost of capacitated MRP problem in multi-stage assembly flow shop with two alternative machines. There are three main steps for the algorithm. Firstly, an initial sequence of orders is constructed by a simple due date-based dispatching rule. Secondly, the sequence of orders is repeatedly improved to reduce the total cost by applying TS, SA and VNS separately. Finally, the total cost is further reduced by optimizing the start time of each operation using the linear programming (LP) model. Parameters of the algorithm are tuned by using real data from automotive companies. The result shows that VNS significantly outperforms TS, SA and the existing algorithm.

Keywords: capacitated MRP, tabu search, simulated annealing, variable neighborhood search, linear programming, assembly flow shop, application in industry

Procedia PDF Downloads 214
24882 Hidden Markov Model for the Simulation Study of Neural States and Intentionality

Authors: R. B. Mishra

Abstract:

Hidden Markov Model (HMM) has been used in prediction and determination of states that generate different neural activations as well as mental working conditions. This paper addresses two applications of HMM; one to determine the optimal sequence of states for two neural states: Active (AC) and Inactive (IA) for the three emission (observations) which are for No Working (NW), Waiting (WT) and Working (W) conditions of human beings. Another is for the determination of optimal sequence of intentionality i.e. Believe (B), Desire (D), and Intention (I) as the states and three observational sequences: NW, WT and W. The computational results are encouraging and useful.

Keywords: hiden markov model, believe desire intention, neural activation, simulation

Procedia PDF Downloads 351
24881 Establishing Sequence Stratigraphic Framework and Hydrocarbon Potential of the Late Cretaceous Strata: A Case Study from Central Indus Basin, Pakistan

Authors: Bilal Wadood, Suleman Khan, Sajjad Ahmed

Abstract:

The Late Cretaceous strata (Mughal Kot Formation) exposed in Central Indus Basin, Pakistan is evaluated for establishing sequence stratigraphic framework and potential of hydrocarbon accumulation. The petrographic studies and SEM analysis were carried out to infer the hydrocarbon potential of the rock unit. The petrographic details disclosed 4 microfacies including Pelagic Mudstone, OrbitoidalWackestone, Quartz Arenite, and Quartz Wacke. The lowermost part of the rock unit consists of OrbitoidalWackestone which shows deposition in the middle shelf environment. The Quartz Arenite and Quartz Wacke suggest deposition on the deep slope settings while the Pelagic Mudstone microfacies point toward deposition in the distal deep marine settings. Based on the facies stacking patterns and cyclicity in the chronostratigraphic context, the strata is divided into two 3rd order cycles. One complete sequence i.e Transgressive system tract (TST), Highstand system tract (HST) and Lowstand system tract (LST) are again replaced by another Transgressive system tract and Highstant system tract with no markers of sequence boundary. The LST sands are sandwiched between TST and HST shales but no potential porosity/permeability values have been determined. Microfacies and SEM studies revealed very fewer chances for hydrocarbon accumulation and overall reservoir potential is characterized as low.

Keywords: cycle, deposition, microfacies, reservoir

Procedia PDF Downloads 120
24880 A Novel Approach to Asynchronous State Machine Modeling on Multisim for Avoiding Function Hazards

Authors: Parisi L., Hamili D., Azlan N.

Abstract:

The aim of this study was to design and simulate a particular type of Asynchronous State Machine (ASM), namely a ‘traffic light controller’ (TLC), operated at a frequency of 0.5 Hz. The design task involved two main stages: firstly, designing a 4-bit binary counter using J-K flip flops as the timing signal and subsequently, attaining the digital logic by deploying ASM design process. The TLC was designed such that it showed a sequence of three different colours, i.e. red, yellow and green, corresponding to set thresholds by deploying the least number of AND, OR and NOT gates possible. The software Multisim was deployed to design such circuit and simulate it for circuit troubleshooting in order for it to display the output sequence of the three different colours on the traffic light in the correct order. A clock signal, an asynchronous 4-bit binary counter that was designed through the use of J-K flip flops along with an ASM were used to complete this sequence, which was programmed to be repeated indefinitely. Eventually, the circuit was debugged and optimized, thus displaying the correct waveforms of the three outputs through the logic analyzer. However, hazards occurred when the frequency was increased to 10 MHz. This was attributed to delays in the feedback being too high.

Keywords: asynchronous state machine, traffic light controller, circuit design, digital electronics

Procedia PDF Downloads 404
24879 Structure, Bioinformatics Analysis and Substrate Specificity of a 6-Phospho-β-Glucosidase Glycoside Hydrolase 1 Enzyme from Bacillus licheniformis

Authors: Wayde Veldman, Ozlem T. Bishop, Igor Polikarpov

Abstract:

In bacteria, mono and disaccharides are phosphorylated during uptake into the cell via the widely used phosphoenolpyruvate (PEP)-dependent phosphotransferase transport system. As an initial step in the phosphorylated disaccharide metabolism pathway, certain glycoside hydrolase family 1 (GH1) enzymes play a crucial role in releasing phosphorylated and non-phosphorylated monosaccharides. However, structural determinants for the specificity of these enzymes still need to be clarified. GH1 enzymes are known to have a wide array of functions. According to the CAZy database, there are twenty-one different enzymatic activities in the GH1 family. Here, the structure and substrate specificity of a GH1 enzyme from Bacillus licheniformis, hereafter known as BlBglH, was investigated. The sequence of the enzyme BlBglH was compared to the sequences of other characterized GH1 enzymes using sequence alignment, sequence identity calculations, phylogenetic analysis, and motif discovery. Through these various analyses, BlBglH was found to have sequence features characteristic of the 6-phospho-β-glucosidase activity enzymes. Additionally, motif and structure comparisons of the three most commonly studied GH1 enzyme-activities revealed a shared loop amongst the different structures that consist of different sequence motifs – this loop is thought to guide specific substrates (depending on activity) towards the active-site. To further affirm BlBglH enzyme activity, molecular docking and molecular dynamics simulations were performed. Docking was carried out using 6-phospho-β-glucosidase enzyme-activity positive (p-Nitrophenyl-beta-D-glucoside-6-phosphate) and negative (p-Nitrophenyl-beta-D-galactoside-6-phosphate) control ligands, followed by 400 ns molecular dynamics simulations. The positive-control ligand maintained favourable interactions within the active site until the end of the simulation. The negative-control ligand was observed exiting the enzyme at 287 ns. Binding free energy calculations showed that the positive-control complex had a substantially more favourable binding energy compared to the negative-control complex. Jointly, the findings of this study suggest that the BlBglH enzyme possesses 6-phospho-β-glucosidase enzymatic activity.

Keywords: 6-P-β-glucosidase, glycoside hydrolase 1, molecular dynamics, sequence analysis, substrate specificity

Procedia PDF Downloads 107
24878 Wayfinding Strategies in an Unfamiliar Homogenous Environment

Authors: Ahemd Sameer, Braj Bhushan

Abstract:

The objective of our study was to compare wayfinding strategies to remember route while navigation in an unfamiliar homogenous environment. Two videos developed using free ware Trimble Sketchup© each having nine identical turns (3 right, 3 left, 3 straight) with no distinguishing feature at any turn. Thirt-two male post-graduate students of IIT Kanpur participated in the study. The experiment was conducted in three phases. In the first phase participant generated a list of personally known items to be used as landmarks. In the second phase participant saw the first video and was required to remember the sequence of turns. In the second video participant was required to imagine a landmark from the list generated in the first phase at each turn and associate the turn with it. In both the task the participant was asked to recall the sequence of turns as it appeared in the video. In the third phase, which was 20 minutes after the second phase, participants again recalled the sequence of turns. Results showed that performance in the first condition i.e. without use of landmarks was better than imaginary landmark condition. The difference, however, became significant when the participant were tested again about 30 minutes later though performance was still better in no-landmark condition. The finding is surprising given the past research in memory and is explained in terms of cognitive factors such as mental workload.

Keywords: Wayfinding, Landmark, Homogenous Environment, Memory

Procedia PDF Downloads 428
24877 Geoelectrical Investigation Around Bomo Area, Kaduna State, Nigeria

Authors: B. S. Jatau, Baba Adama, S. I. Fadele

Abstract:

Electrical resistivity investigation was carried out around Bomo area, Zaria, Kaduna state in order to study the subsurface geologic layer with a view of determining the depth to the bedrock and thickness of the geologic layers. Vertical Electrical Sounding (VES) using Schlumberger array was carried out at fifteen (15) VES stations. ABEM terrameter (SAS 300) was used for the data acquisition. The field data obtained have been analyzed using computer software (IPI2win) which gives an automatic interpretation of the apparent resistivity. The VES results revealed heterogeneous nature of the subsurface geological sequence. The geologic sequence beneath the study area is composed of hard pan top soil (clayey and sandy-lateritic), weathered layer, partly weathered or fractured basement and fresh basement. The resistivity value for the topsoil layer varies from 40Ωm to 450Ωm with thickness ranging from 1.25 to 7.5 m. The weathered basement has resistivity values ranging from 50Ωm to 593Ωm and thickness between 1.37 and 20.1 m. The fractured basement has resistivity values ranging from 218Ωm to 520Ωm and thickness of between 12.9 and 26.3 m. The fresh basement (bedrock) has resistivity values ranging from 1215Ωm to 2150Ωm with infinite depth. However, the depth of the earth’s surface to the bedrock surface varies between 2.63 and 34.99 m. The study further stressed the importance of the findings in civil engineering structures and groundwater prospecting.

Keywords: electrical resistivity, CERT (CT), vertical electrical sounding (VES), top soil (TP), weathered basement (WB), partly weathered basement (PWB), fresh basement (FB)

Procedia PDF Downloads 307
24876 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease

Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena

Abstract:

Advancement in sequence data generation technologies is churning out voluminous omics data and posing a massive challenge to annotate the biological functional features. With so much data available, the use of machine learning methods and tools to make novel inferences has become obvious. Machine learning methods have been successfully applied to a lot of disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested to develop novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, it plan to employ novel machine learning approaches to aid the understanding of how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome and mutations in which can affect a phenotype of interest.

Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics

Procedia PDF Downloads 65
24875 Fatigue-Induced Debonding Propagation in FM300 Adhesive

Authors: Reza Hedayati, Meysam Jahanbakhshi

Abstract:

Fracture Mechanics is used to predict debonding propagation in adhesive joint between aluminum and composite plates. Three types of loadings and two types of glass-epoxy composite sequences: [0/90]2s and [0/45/-45/90]s are considered for the composite plate and their results are compared. It was seen that generally the cases with stacking sequence of [0/45/-45/90]s have much shorter lives than cases with [0/90]2s. It was also seen that in cases with λ=0 the ends of the debonding front propagates forward more than its middle, while in cases with λ=0.5 or λ=1 it is vice versa. Moreover, regardless of value of λ, the difference between the debonding propagations of the ends and the middle of the debonding front is very close in cases λ=0.5 and λ=1. Another main conclusion was the non-dimensionalized debonding front profile is almost independent of sequence type or the applied load value.

Keywords: adhesive joint, debonding, fracture, LEFM, APDL

Procedia PDF Downloads 334