Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 145

Search results for: bioinformatics

145 The Development and Provision of a Knowledge Management Ecosystem, Optimized for Genomics

Authors: Matthew I. Bellgard

Abstract:

The field of bioinformatics has made, and continues to make, substantial progress and contributions to life science research and development. However, this paper contends that a systems approach integrates bioinformatics activities for any project in a defined manner. The application of critical control points in this bioinformatics systems approach may be useful to identify and evaluate points in a pathway where specified activity risk can be reduced, monitored and quality enhanced.

Keywords: bioinformatics, food security, personalized medicine, systems approach

Procedia PDF Downloads 297
144 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas

Abstract:

Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms on other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for the more challenging task of hierarchical classification, and evaluates it on a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes a meta-learning approach for recommending the best hierarchical classification algorithm for a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.
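To make the notion of a label hierarchy concrete, the following minimal sketch shows how a specific class label implies all of its ancestors. The toy hierarchy and the function names are hypothetical illustrations and are not taken from the paper.

```python
# Hypothetical toy hierarchy: child label -> parent label (None marks the root).
PARENT = {
    "root": None,
    "enzyme": "root",
    "transporter": "root",
    "kinase": "enzyme",
    "ammonium_transporter": "transporter",
}


def ancestors(label, parent=PARENT):
    """Return the label followed by all of its ancestors, most specific first."""
    path = []
    while label is not None:
        path.append(label)
        label = parent[label]
    return path


def is_consistent(predicted, parent=PARENT):
    """A predicted label set is hierarchy-consistent if it contains
    every ancestor of each label it includes."""
    predicted = set(predicted)
    return all(set(ancestors(lab, parent)) <= predicted for lab in predicted)


if __name__ == "__main__":
    print(ancestors("kinase"))
    print(is_consistent({"kinase", "root"}))
```

A consistency check of this kind is one simple way to see why hierarchical classification is harder than flat classification: a prediction has to be correct along a whole path, not just at one label.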

Keywords: algorithm recommendation, meta-learning, bioinformatics, hierarchical classification

Procedia PDF Downloads 126
143 Solanum tuberosum Ammonium Transporter Gene: Some Bioinformatics Insights

Authors: A. T. Adetunji, F. B. Lewu, R. Mundembe

Abstract:

Plants require nitrogen (N) to support desired production levels. Nitrogen is available to plants in the form of nitrate or ammonium, which are transported into the cell with the aid of various transport proteins. Ammonium transporters (AMTs) play a role in the uptake of ammonium, the form in which nitrogen is preferentially absorbed by plants. Solanum tuberosum AMT1 (StAMT1) was characterized using molecular biology and bioinformatics methods. Nucleotide database sequences were used to design AMT1-specific primers which were used to amplify the AMT1 internal regions. Nucleotide sequencing, alignment and phylogenetic analysis assigned StAMT1 to the AMT1 family. The deduced amino acid sequences showed that StAMT1 is 92%, 83% and 76% similar to Solanum lycopersicum LeAMT1.1, Lotus japonicus LjAMT1.1 and Solanum lycopersicum LeAMT1.2 respectively. StAMT1 fragments were shown to correspond to the 5th - 10th trans-membrane domains. Residue StAMT1 D15 is predicted to be essential for ammonium transport, while mutations of StAMT1 S76A may further enhance ammonium transport.

Keywords: ammonium transporter, bioinformatics, nitrogen, primers, Solanum tuberosum

Procedia PDF Downloads 124
142 Intellectual Property Protection of CRISPR Related Technologies

Authors: Zheng Miao, Dennis Fernandez

Abstract:

CRISPR research has the potential to completely transform life science, agriculture, livestock and the health care industry. The intellectual property derived from this research has attracted significant attention in academia as well as in the biopharmaceutical industry, culminating in an urgent need for strategic IP protection. We review the basic concepts and key competitors of CRISPR technologies as well as the principal strategies for intellectual property protection. Further, we elaborate on prosecution issues related to CRISPR patents as well as possible solutions to various patent laws, interferences and litigation. Finally, we address how the bioinformatics of the CRISPR technology invites an inquiry into issues of privacy and a host of ethical concerns.

Keywords: bioinformatics, CRISPR, biotechnology, intellectual property

Procedia PDF Downloads 127
141 Isolate-Specific Variations among Clinical Isolates of Brucella Identified by Whole-Genome Sequencing, Bioinformatics and Comparative Genomics

Authors: Abu S. Mustafa, Mohammad W. Khan, Faraz Shaheed Khan, Nazima Habibi

Abstract:

Brucellosis is a zoonotic disease of worldwide prevalence. There are at least four species and several strains of Brucella that cause human disease. Brucella genomes have very limited variation across strains, which hinders strain identification using classical molecular techniques, including PCR and 16S rDNA sequencing. The aim of this study was to perform whole-genome sequencing of clinical isolates of Brucella and to carry out bioinformatics and comparative genomics analyses to determine the existence of genetic differences across the isolates of a single Brucella species and strain. The draft sequence data were generated from 15 clinical isolates of Brucella melitensis (biovar 2 strain 63/9) using the MiSeq next-generation sequencing platform. The generated reads were used for further assembly and analysis. All analyses were performed on a bioinformatics workstation (8-core i7 processor, 8 GB RAM, Bio-Linux operating system). FastQC was used to determine the quality of reads, and low-quality reads were trimmed or eliminated using Fastx_trimmer. Assembly was done using the Velvet and ABySS software packages. The assembled contigs were ordered with Mauve. The online server RAST was employed to annotate the contig assemblies. Annotated genomes were compared using the Mauve and ACT tools. The QC score for the DNA sequence data generated by MiSeq was higher than 30 for 80% of reads, with more than 100x coverage, which suggested that the data could be used for further analysis. However, when analyzed by FastQC, the read quality of four samples was not good enough for creating a complete draft genome, so the remaining 11 samples were used for further analysis. The comparative genome analyses showed that, despite sharing the same gene sets, single nucleotide polymorphisms and insertions/deletions existed across different genomes, which provided a variable extent of diversity to these bacteria.
In conclusion, next-generation sequencing, bioinformatics, and comparative genome analysis can be utilized to find variations (point mutations, insertions and deletions) across different genomes of Brucella within a single strain. This information could be useful in surveillance and epidemiological studies. The study was supported by Kuwait University Research Sector grants MI04/15 and SRUL02/13.
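The quality criterion mentioned in the abstract (a score above 30 for 80% of reads) can be illustrated with a short sketch. It assumes Phred+33 encoded FASTQ quality strings and interprets the criterion as a per-read mean score; both are assumptions, since the abstract does not state how the figure was computed, and the function names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode one FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_line]


def fraction_q30(quality_lines, threshold=30):
    """Fraction of reads whose mean Phred score is at least `threshold`.

    Assumption: the criterion is applied to the per-read mean score.
    """
    passing = 0
    for line in quality_lines:
        scores = phred_scores(line)
        if sum(scores) / len(scores) >= threshold:
            passing += 1
    return passing / len(quality_lines)


if __name__ == "__main__":
    # Toy quality strings: 'I' encodes Q40, '?' encodes Q30, '5' encodes Q20.
    toy = ["IIII", "5555", "????", "IIII", "IIII"]
    print(fraction_q30(toy))
```

In practice this summary is what tools such as FastQC report; the sketch only shows the arithmetic behind a Q30 fraction.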

Keywords: brucella, bioinformatics, comparative genomics, whole genome sequencing

Procedia PDF Downloads 252
140 Characterization of Solanum tuberosum Ammonium Transporter Gene Using Bioinformatics Approach

Authors: Adewole Tomiwa Adetunji, Francis Bayo Lewu, Richard Mundembe

Abstract:

Plants require nitrogen (N) to support desired production levels. There is a need for a better understanding of N transport mechanisms in order to improve N assimilation by plant roots. Nitrogen is available to plants in the form of nitrate or ammonium, which are transported into the cell with the aid of various transport proteins. Ammonium transporters (AMTs) play a role in the uptake of ammonium, the form in which N is preferentially absorbed by plants. Solanum tuberosum AMT1 (StAMT1) was amplified, sequenced and characterized using molecular biology and bioinformatics methods. Nucleotide database sequences were used to design AMT1-specific primers (976 base pairs), which include forward primer 5’-GCCATCGCCGCCGCCGG-3’ and reverse primer 5’-GGGTCAGATCCATACCCGC-3’. These primers were used to amplify the Solanum tuberosum AMT1 internal regions. Nucleotide sequencing, alignment and phylogenetic analysis assigned StAMT1 to the AMT1 family because of the clade and the high similarity it shared with other plant AMT1 genes. The deduced amino acid sequences showed that StAMT1 is 92%, 83% and 76% similar to Solanum lycopersicum LeAMT1.1, Lotus japonicus LjAMT1.1, and Solanum lycopersicum LeAMT1.2, respectively. StAMT1 fragments were shown to correspond to the 5th-10th trans-membrane domains. Residue StAMT1 D15 is predicted to be essential for ammonium transport, while mutations of StAMT1 S76A may further enhance ammonium transport.
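As a small illustration of routine primer checks, the sketch below computes the length, GC fraction and reverse complement of the two primer sequences quoted in the abstract. The helper names are hypothetical, and the abstract does not say which checks the authors actually performed.

```python
# Primer sequences as quoted in the abstract (5' to 3').
FORWARD = "GCCATCGCCGCCGCCGG"
REVERSE = "GGGTCAGATCCATACCCGC"

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase A/C/G/T)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), round(gc_fraction(primer), 2),
              reverse_complement(primer))
```

The reverse complement of the reverse primer is the sequence that would appear on the same strand as the forward primer, which is what one searches for when locating the amplicon in a reference sequence.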

Keywords: ammonium transporter, bioinformatics, nitrogen, primers, Solanum tuberosum

Procedia PDF Downloads 115
139 Bioinformatics Approach to Identify Physicochemical and Structural Properties Associated with Successful Cell-free Protein Synthesis

Authors: Alexander A. Tokmakov

Abstract:

Cell-free protein synthesis is widely used to synthesize recombinant proteins. It allows genome-scale expression of various polypeptides under strictly controlled, uniform conditions. However, only a minor fraction of all proteins can be successfully expressed in the protein synthesis systems that are currently used. The factors determining expression success are poorly understood. At present, a vast volume of data has accumulated in cell-free expression databases. This makes possible a comprehensive bioinformatics analysis and the identification of multiple features associated with successful cell-free expression. Here, we describe an approach aimed at identifying multiple physicochemical and structural properties of amino acid sequences associated with protein solubility and aggregation, and we highlight major correlations obtained using this approach. The developed method includes: categorical assessment of the protein expression data, calculation and prediction of multiple properties of expressed amino acid sequences, correlation of the individual properties with the expression scores, and evaluation of the statistical significance of the observed correlations. Using this approach, we revealed a number of statistically significant correlations between calculated and predicted features of protein sequences and their amenability to cell-free expression. It was found that some of the features, such as protein pI, hydrophobicity, and the presence of signal sequences, are mostly related to protein solubility, whereas others, such as protein length, number of disulfide bonds, and content of secondary structure, mainly affect the expression propensity. We also demonstrated that the amenability of polypeptide sequences to cell-free expression correlates with the presence of multiple sites of post-translational modification. The correlations revealed in this study provide a plethora of important insights into protein folding and the rationalization of protein production.
The developed bioinformatics approach can be of practical use for predicting expression success and optimizing cell-free protein synthesis.
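The correlation and significance steps listed in the abstract can be sketched as follows. This is a minimal illustration using the Pearson coefficient and a permutation test; the abstract does not specify which correlation measure or significance test was used, so both choices, the function names and the toy values are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed Pearson correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy values: a sequence property and an expression score.
    prop = [0, 1, 2, 3, 4, 5, 6, 7]
    score = [0, 2, 4, 6, 8, 10, 12, 14]
    print(pearson(prop, score), permutation_p_value(prop, score))
```

A permutation test is used here because it makes no distributional assumption about categorical expression scores; a parametric test would be an equally plausible choice.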

Keywords: bioinformatics analysis, cell-free protein synthesis, expression success, optimization, recombinant proteins

Procedia PDF Downloads 306
138 Uncovering Anti-Hypertensive Obesity Targets and Mechanisms of Metformin, an Anti-Diabetic Medication

Authors: Lu Yang, Keng Po Lai

Abstract:

Metformin, a well-known clinical drug against diabetes, has been found to have potential anti-diabetic and anti-obesity benefits, as reported in a growing body of evidence. However, current clinical and experimental investigations have not revealed the detailed mechanisms of metformin against obesity and hypertension. We have used a bioinformatics strategy, including network pharmacology and molecular docking methodology, to uncover the key targets and pathways of bioactive compounds against clinical disorders such as cancers and coronavirus disease. Thus, in this report, the in-silico approach was utilized to identify the hub targets, pharmacological functions, and mechanisms of metformin against obesity and hypertension. The network analysis identified 154 differentially expressed genes of obesity and hypertension, 21 interaction genes, and 6 hub genes of metformin treating hypertensive obesity. The molecular docking findings indicated the potent binding capability of metformin with the key proteins, including interleukin 6 (IL-6) and chemokine (C-C motif) ligand 2 (CCL2), in hypertensive obesity. The anti-hypertensive-obesity action exerted by metformin involved metabolic regulation and the inflammatory reaction. The anti-hypertensive-obesity mechanisms of metformin were also revealed, including regulation of inflammatory and immunological signaling pathways for metabolic homeostasis in tissue and microenvironmental amelioration of blood pressure. In conclusion, our bioinformatics findings have demonstrated the detailed hub and pharmacological targets, biological functions, and signaling pathways of metformin treating hypertensive obesity.
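Hub genes in a network pharmacology analysis are commonly the most highly connected nodes of an interaction network. The sketch below ranks nodes by degree as one simple way to pick hubs; the abstract does not state which centrality measure was used, and the edge list and gene placeholders (G1 to G5) are purely hypothetical.

```python
from collections import Counter


def hub_nodes(edges, top_n=2):
    """Rank nodes of an undirected network by degree and return the top ones.

    `edges` is an iterable of (node_a, node_b) pairs.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name so the order is deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical interaction edges between placeholder genes.
    toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
                 ("G2", "G3"), ("G4", "G5")]
    print(hub_nodes(toy_edges))
```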

Keywords: metformin, obesity, hypertension, bioinformatics findings

Procedia PDF Downloads 24
137 Prediction and Identification of a Permissive Epitope Insertion Site for St Toxoid in cfaB from Enterotoxigenic Escherichia coli

Authors: N. Zeinalzadeh, Mahdi Sadeghi

Abstract:

Enterotoxigenic Escherichia coli (ETEC) is the most common cause of non-inflammatory diarrhea in developing countries, resulting in approximately 20% of all diarrheal episodes in children in these areas. ST is one of the most important virulence factors, and CFA/I is one of the frequent colonization factors that contribute to the process of ETEC infection. ST and CfaB (the CFA/I subunit) are among the vaccine candidates against ETEC. However, because of its small size, ST is not a good immunogen in its natural form. To increase its immunogenic potential, we explored candidate positions for ST insertion in the CfaB sequence. After bioinformatics analysis, one of the candidate positions was selected, and the chimeric gene (cfaB*st) sequence was synthesized and expressed in E. coli BL21 (DE3). The chimeric recombinant protein was purified with Ni-NTA columns and characterized by western blot analysis. Residues 74-75 of the CfaB sequence could be a good candidate position for the insertion of ST and other epitopes.

Keywords: bioinformatics, CFA/I, enterotoxigenic E. coli, ST toxoid

Procedia PDF Downloads 322
136 Identification and Characterization of Small Peptides Encoded by Small Open Reading Frames using Mass Spectrometry and Bioinformatics

Authors: Su Mon Saw, Joe Rothnagel

Abstract:

Short open reading frames (sORFs) located in the 5’UTR of mRNAs are known as uORFs. Characterization of uORF-encoded peptides (uPEPs), a subset of short open reading frame encoded peptides (sPEPs), and of their translational regulation leads to an understanding of the causes of genetic disease and of proteome complexity, and to the development of treatments. The presence of uORF-encoded peptides within the cellular proteome can be detected by LC-MS/MS. The ability of a uORF to be translated into a uPEP, together with the successful identification of that uPEP, will allow characterization of the uPEP’s structure, function, subcellular localization, evolutionary maintenance (conservation in human and other species) and abundance in cells. It is hypothesized that a subset of sORFs are translatable and that their encoded sPEPs are functional and endogenously expressed, contributing to the complexity of the eukaryotic cellular proteome. This project aimed to investigate whether sORFs encode functional peptides. Liquid chromatography-mass spectrometry (LC-MS) and bioinformatics were thus employed. Because sPEPs are probably of low abundance and small in size, efficient strategies that enrich small proteins and deplete the sub-proteome of large and abundant proteins are crucial for identifying sPEPs. Low molecular weight proteins were extracted using SDS-PAGE from Human Embryonic Kidney (HEK293) cells and Strong Cation Exchange Chromatography (SCX) from the secreted fraction of HEK293 cells. Extracted proteins were digested with trypsin into peptides, which were detected by LC-MS/MS. The MS/MS data obtained were searched against Swiss-Prot using MASCOT version 2.4 to filter out known proteins, and all unmatched spectra were re-searched against the human RefSeq database. ProteinPilot v5.0.1 was used to identify sPEPs by searching against the human RefSeq, Vanderperre and Human Alternative Open Reading Frame (HaltORF) databases. Potential sPEPs were analyzed by bioinformatics.
Since SDS-PAGE could not separate proteins <20 kDa, this approach could not identify sPEPs. All MASCOT-identified peptide fragments were parts of the main open reading frame (mORF), as shown by ORF Finder and blastp searches. No sPEP was detected, and the existence of sPEPs could not be confirmed in this study. Thirteen sORFs reported in previous studies as translated in HEK293 cells by mass spectrometry were characterized by bioinformatics. The sPEPs identified in previous studies were <100 amino acids and <15 kDa. Bioinformatics results showed that sORFs are translated to sPEPs and contribute to proteome complexity. The uPEP translated from the uORF of SLC35A4 was strongly conserved in human and mouse, while the uPEP translated from the uORF of MKKS was strongly conserved in human and Rhesus monkey. Cross-species conserved uORFs in association with protein translation strongly suggest evolutionary maintenance of the coding sequence and indicate probable functional expression of the peptides encoded within these uORFs. Translation of sORFs was confirmed by mass spectrometry in those previous studies, and the sPEPs were characterized with bioinformatics.
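The idea of scanning a transcript for short open reading frames can be sketched as follows. This is a simplified forward-strand scan for ATG-initiated ORFs encoding fewer than 100 amino acids, the size bound quoted in the abstract; it is not the ORF Finder algorithm, and the function name and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_codons=100):
    """Return (start, end, orf) for each ATG-initiated ORF on the forward
    strand that encodes fewer than `max_codons` amino acids.

    `start` and `end` are 0-based, end-exclusive, and `orf` includes the stop codon.
    """
    seq = seq.upper()
    found = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3  # coding codons, stop excluded
                if n_codons < max_codons:
                    found.append((start, pos + 3, seq[start:pos + 3]))
                break
    return found


if __name__ == "__main__":
    print(find_sorfs("CCATGAAATGACC"))
```

Start codons without an in-frame stop are skipped, so the second ATG in the toy sequence yields no ORF.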

Keywords: bioinformatics, HEK293 cells, liquid chromatography-mass spectrometry, ProteinPilot, Strong Cation Exchange Chromatography, SDS-PAGE, sPEPs

Procedia PDF Downloads 93
135 Knowledge Engineering Based Smart Healthcare Solution

Authors: Rhaed Khiati, Muhammad Hanif

Abstract:

In the past decade, smart healthcare systems have been on the rise, especially with the evolution of hospitals and their increasing reliance on bioinformatics and software specializing in healthcare. Doctors rely on technology more than ever, something that in the past would have been looked down upon, as technology has become imperative in reducing overall costs and improving the quality of patient care. With patient-doctor interactions becoming more necessary and more complicated than ever, systems must be developed while taking into account costs, patient comfort, and patient data, among other things. In this work, we propose a smart hospital bed, which combines the complexity and big data usage of traditional healthcare systems with the comfort found in soft beds, while taking into account concerns such as data confidentiality, security, and maintaining service-level agreements (SLAs). This research work potentially provides users, namely patients and doctors, with seamless interaction with their respective nurses, as well as faster access to up-to-date personal data, including prescriptions and the severity of the condition, in contrast to previous research in the area, which lacks consideration of such provisions.

Keywords: big data, smart healthcare, distributed systems, bioinformatics

Procedia PDF Downloads 82
134 Bioinformatics and Molecular Biological Characterization of a Hypothetical Protein SAV1226 as a Potential Drug Target for Methicillin/Vancomycin-Staphylococcus aureus Infections

Authors: Nichole Haag, Kimberly Velk, Tyler McCune, Chun Wu

Abstract:

Methicillin/multiple-resistant Staphylococcus aureus (MRSA) are infectious bacteria that are resistant to common antibiotics. A previous in silico study in our group identified a hypothetical protein, SAV1226, as one of the potential drug targets. In this study, we report the bioinformatics characterization, as well as the cloning, expression, purification and kinetic assays, of hypothetical protein SAV1226 from the methicillin/vancomycin-resistant Staphylococcus aureus Mu50 strain. MALDI-TOF/MS analysis revealed a low degree of structural similarity with known proteins. Kinetic assays demonstrated that hypothetical protein SAV1226 is neither a domain of an ATP-dependent dihydroxyacetone kinase nor of a phosphotransferase system (PTS) dihydroxyacetone kinase, suggesting that the function of hypothetical protein SAV1226 might be misannotated in public databases such as UniProt and InterProScan 5.

Keywords: Methicillin-resistant Staphylococcus aureus, dihydroxyacetone kinase, essential genes, drug target, phosphoryl group donor

Procedia PDF Downloads 295
133 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated a high ability to discriminate between various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist of multiple steps, including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time-consuming, and rely on a large number of parameters that often introduce variability and affect the estimation of the microbiome elements. Training deep neural networks directly on raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings, which create a meaningful numerical representation of DNA sequences while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier that predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of each genome on the prediction.
Using two public real-life datasets as well as a simulated one, we demonstrate that this original approach reaches high performance, comparable with state-of-the-art methods applied directly to data processed through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that, with further dedication, DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
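The first of the four steps, building a k-mer vocabulary from reads, can be sketched as below. This is a generic illustration and not the metagenome2vec implementation; the function names, the frequency-based indexing and the toy reads are assumptions. The resulting integer indices are what an embedding layer would typically consume.

```python
from collections import Counter


def kmers(read, k):
    """Return all overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_vocabulary(reads, k):
    """Map each distinct k-mer to an integer index.

    Indices follow descending frequency, with ties broken alphabetically,
    so the mapping is deterministic.
    """
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    ordered = sorted(counts, key=lambda kmer: (-counts[kmer], kmer))
    return {kmer: index for index, kmer in enumerate(ordered)}


def encode(read, vocabulary, k):
    """Turn a read into a list of k-mer indices (unknown k-mers are skipped)."""
    return [vocabulary[kmer] for kmer in kmers(read, k) if kmer in vocabulary]


if __name__ == "__main__":
    vocab = build_vocabulary(["ACGT", "CGTA"], k=3)
    print(vocab)
    print(encode("ACGTA", vocab, k=3))
```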

Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine

Procedia PDF Downloads 18
132 A Systems Approach to Targeting Cyclooxygenase: Genomics, Bioinformatics and Metabolomics Analysis of COX-1 -/- and COX-2-/- Lung Fibroblasts Providing Indication of Sterile Inflammation

Authors: Abul B. M. M. K. Islam, Mandar Dave, Roderick V. Jensen, Ashok R. Amin

Abstract:

A systems approach was applied to characterize differentially expressed transcripts, bioinformatics pathways, and proteins and prostaglandins (PGs) from lung fibroblasts procured from wild-type (WT), COX-1-/- and COX-2-/- mice to understand system-level control mechanisms. Bioinformatics analysis showed that COX-2 and COX-1 ablated cells induced COX-1 and COX-2 specific signatures, respectively, which significantly overlapped with an 'IL-1β induced inflammatory signature'. This defined novel cross-talk signals that orchestrated coordinated activation of pathways of sterile inflammation sensed by cellular stress. The overlapping signals showed significant over-representation of shared pathways for interferon γ and immune responses, T cell functions, NOD, and toll-like receptor signaling. Gene Ontology Biological Process (GOBP) and pathway enrichment analysis specifically showed an increase in mRNA expression associated with: (a) organ development and homeostasis in COX-1-/- cells and (b) oxidative stress and response, spliceosome and proteasome activity, and mTOR and p53 signaling in COX-2-/- cells. COX-1 and COX-2 showed signs of functional pathways committed to cell cycle and DNA replication at the genomics level. As compared to WT, metabolomics analysis revealed a significant increase in COX-1 mRNA and synthesis of basal levels of eicosanoids (PGE2, PGD2, TXB2, LTB4, PGF1α, and PGF2α) in COX-2 ablated cells and an increase in synthesis of PGE2 and PGF1α in COX-1 null cells. There was a compensation of PGE2 and PGF1α in COX-1-/- and COX-2-/- cells. Collectively, these results support a broader, differential and collaborative regulation of both COX-1 and COX-2 pathways at the metabolic, signaling, and genomics levels in cellular homeostasis and sterile inflammation induced by cellular stress.

Keywords: cyclooxygenases, inflammation, lung fibroblasts, systemic

Procedia PDF Downloads 185
131 Syntax and Words as Evolutionary Characters in Comparative Linguistics

Authors: Nancy Retzlaff, Sarah J. Berkemer, Trudie Strauss

Abstract:

In the last couple of decades, the digitalization of all kinds of data was probably one of the major advances in all fields of study. This paves the way for analysing these data even when they come from disciplines where there was no initial computational necessity to do so. Linguistics, in particular, has a rather manual tradition. Still, when considering studies that involve the history of language families, it is hard to overlook the striking similarities to bioinformatics (phylogenetic) approaches. Alignments of words are a fairly well studied example of an application of bioinformatics methods to historical linguistics. In this paper we will consider not only alignments of strings, i.e., words in this case, but also alignments of syntax trees of selected Indo-European languages. Based on initial, crude alignments, a sophisticated scoring model is trained on both letters and syntactic features. The aim is to gain a better understanding of which features in two languages are related, i.e., most likely to have the same root. Initially, all words in two languages are pre-aligned with a basic scoring model that primarily selects consonants and adjusts them before fitting in the vowels. Mixture models are subsequently used to filter ‘good’ alignments depending on the alignment length and the number of inserted gaps. Using these selected word alignments, it is possible to perform tree alignments of the given syntax trees and consequently find sentences that correspond rather well to each other across languages. The syntax alignments are then filtered for meaningful scores; ‘good’ scores contain evolutionary information and are therefore used to train the sophisticated scoring model. Further iterations of alignments and training steps are performed until the scoring model saturates, i.e., barely changes anymore.
A better evaluation of the trained scoring model and its function in containing evolutionary meaningful information will be given. An assessment of sentence alignment compared to possible phrase structure will also be provided. The method described here may have its flaws because of limited prior information. This, however, may offer a good starting point to study languages where only little prior knowledge is available and a detailed, unbiased study is needed.
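Word alignment of the kind borrowed from bioinformatics can be illustrated with a plain Needleman-Wunsch global alignment. The sketch uses a uniform match/mismatch/gap scoring scheme, which is an assumption made for brevity; it is not the consonant-first scoring model or the trained model described in the abstract, and the word pair is only an illustrative example.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment of two words.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # English "night" and German "Nacht", lowercased for comparison.
    print(align("night", "nacht"))
```

A learned scoring model would replace the fixed `match`, `mismatch` and `gap` values with letter-pair specific scores, which is where the iterative training described in the abstract would come in.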

Keywords: alignments, bioinformatics, comparative linguistics, historical linguistics, statistical methods

Procedia PDF Downloads 48
130 Identification of Disease Causing DNA Motifs in Human DNA Using Clustering Approach

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Studying DNA (deoxyribonucleic acid) sequences is useful for understanding biological processes and is applied in fields such as diagnostic and forensic research. DNA carries the hereditary information in humans and almost all other organisms, and it is passed on to subsequent generations. Early detection of defective DNA sequences may lead to many developments in the field of bioinformatics. Currently, various tedious techniques are used to identify defective DNA. The proposed work analyzes and identifies the cancer-causing DNA motif in a given sequence. Initially, the human DNA sequence is separated into k-mers using a k-mer separation rule. The separated k-mers are clustered using a Self-Organizing Map (SOM). Using the Levenshtein distance measure, the cancer-associated DNA motif is identified from the k-mer clusters. Experimental results of this work indicate the presence or absence of the cancer-causing DNA motif. If the cancer-associated DNA motif is found, the input is declared a cancer-causing DNA sequence; otherwise, the input human DNA is declared a normal sequence. Finally, the elapsed time for finding the cancer-causing DNA motif using cluster formation is calculated and compared with that of the normal process of finding the motif. Locating the cancer-associated motif is easier with the cluster formation process than without it. The proposed work will be an initial aid for research on genetic diseases.
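The k-mer separation and Levenshtein matching steps can be sketched as follows. The SOM clustering stage is deliberately left out, so this shows only a brute-force search over all k-mers; the motif and sequence are hypothetical toy strings, not real disease-associated motifs.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,        # deletion
                               current[j - 1] + 1,     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def closest_kmer(sequence, motif):
    """Split `sequence` into overlapping k-mers (k = len(motif)) and return
    the k-mer with the smallest Levenshtein distance to `motif`, plus that distance."""
    k = len(motif)
    candidates = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    best = min(candidates, key=lambda kmer: levenshtein(kmer, motif))
    return best, levenshtein(best, motif)


if __name__ == "__main__":
    print(closest_kmer("TTACGATT", "ACGT"))
```

Clustering the k-mers first, as the abstract describes, would restrict this search to the cluster nearest the motif instead of scanning every k-mer.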

Keywords: bioinformatics, cancer motif, DNA, k-mers, Levenshtein distance, SOM

Procedia PDF Downloads 83
129 Estimation of Transition and Emission Probabilities

Authors: Aakansha Gupta, Neha Vadnere, Tapasvi Soni, M. Anbarsi

Abstract:

Protein secondary structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine and biotechnology. Some aspects of protein function and genome analysis can be predicted by secondary structure prediction. This is used to help annotate sequences, classify proteins, identify domains, and recognize functional motifs. In this paper, we represent protein secondary structure as a mathematical model. To extract and predict the protein secondary structure from the primary structure, we require a set of parameters. Any constants appearing in the model are specified by these parameters, which also provide a mechanism for efficient and accurate use of data. There are many algorithms for estimating these model parameters, the most popular of which is the Expectation Maximization (EM) algorithm. These model parameters are estimated from protein datasets such as RS126 using the Bayesian probabilistic method (the dataset being categorical). This paper can then be extended to compare the efficiency of the EM algorithm with that of other algorithms for estimating the model parameters, which will in turn lead to an efficient component for protein secondary structure prediction. Further, this paper provides scope to use these parameters for predicting the secondary structure of proteins using machine learning techniques such as neural networks and fuzzy logic. The ultimate objective will be to obtain greater accuracy than previously achieved.
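Transition and emission probabilities of a hidden Markov model can be estimated most simply by counting when the state labels are known. The sketch below uses that supervised maximum-likelihood estimate as a stand-in for the EM procedure named in the abstract, which is needed when states are unobserved. The state alphabet (H for helix, E for strand, C for coil), the function name and the toy data are assumptions.

```python
from collections import Counter, defaultdict


def estimate_hmm(pairs):
    """Estimate transition and emission probabilities by counting.

    `pairs` is a list of (residues, states) strings of equal length, where
    each state character labels the residue at the same position.
    Returns (transitions, emissions) as nested dicts of probabilities.
    """
    transition_counts = defaultdict(Counter)
    emission_counts = defaultdict(Counter)
    for residues, states in pairs:
        for residue, state in zip(residues, states):
            emission_counts[state][residue] += 1
        for prev_state, next_state in zip(states, states[1:]):
            transition_counts[prev_state][next_state] += 1

    def normalise(table):
        # Convert each row of counts into a probability distribution.
        result = {}
        for state, counts in table.items():
            total = sum(counts.values())
            result[state] = {key: value / total for key, value in counts.items()}
        return result

    return normalise(transition_counts), normalise(emission_counts)


if __name__ == "__main__":
    # Toy labelled example: three residues with their secondary structure states.
    transitions, emissions = estimate_hmm([("AAG", "HHC")])
    print(transitions)
    print(emissions)
```

No smoothing is applied, so unseen transitions and emissions get no entry at all; a real estimator would usually add pseudocounts.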

Keywords: model parameters, expectation maximization algorithm, protein secondary structure prediction, bioinformatics

Procedia PDF Downloads 367
128 The Use of Network Tool for Brain Signal Data Analysis: A Case Study with Blind and Sighted Individuals

Authors: Cleiton Pons Ferreira, Diana Francisca Adamatti

Abstract:

Advancements in computer technology have made it possible to obtain information for research in biology and neuroscience. To transform the data from such studies, networks have long been used to represent important biological processes, and the use of these tools has shifted from purely illustrative and didactic to more analytic, including interaction analysis and hypothesis formulation. Many studies have involved this application, but not directly for the interpretation of data obtained from brain functions, which calls for new perspectives of development in neuroinformatics using existing tool models already disseminated in bioinformatics. This study includes an analysis of neurological data from electroencephalogram (EEG) signals using Cytoscape, an open source software tool for visualizing complex networks in biological databases. The data were obtained from a comparative case study developed at the University of Rio Grande (FURG), using the EEG signals from a Brain Computer Interface (BCI) with 32 electrodes fitted to a blind and a sighted individual during the execution of an activity that stimulated spatial ability. This study intends to present results that lead to better ways to use and adapt techniques that support the treatment of brain signal data, in order to improve understanding and learning in neuroscience.

Keywords: neuroinformatics, bioinformatics, network tools, brain mapping

Procedia PDF Downloads 39
127 Intra-miR-ExploreR, a Novel Bioinformatics Platform for Integrated Discovery of MiRNA:mRNA Gene Regulatory Networks

Authors: Surajit Bhattacharya, Daniel Veltri, Atit A. Patel, Daniel N. Cox

Abstract:

miRNAs have emerged as key post-transcriptional regulators of gene expression; however, identification of biologically relevant target genes for this epigenetic regulatory mechanism remains a significant challenge. To address this knowledge gap, we have developed a novel tool in R, Intra-miR-ExploreR, that facilitates integrated discovery of miRNA targets by incorporating target databases and novel target prediction algorithms, using statistical methods including Pearson and Distance Correlation on microarray data, to arrive at high-confidence intragenic miRNA target predictions. We have explored the efficacy of this tool using Drosophila melanogaster as a model organism for bioinformatics analyses and functional validation. A number of putative targets were obtained, which were also validated using qRT-PCR analysis. Additional features of the tool include downloadable text files containing GO analysis from DAVID and PubMed links to literature related to the gene sets. Moreover, we are constructing interaction maps of intragenic miRNAs, using both microarray and RNA-seq data, focusing on neural tissues to uncover the regulatory codes through which these molecules regulate gene expression to direct cellular development.
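The difference between the two statistics named above can be illustrated with a short sketch. This is not code from Intra-miR-ExploreR (which is written in R); it is a standard-library Python illustration of the textbook definitions, and the expression values are invented. Pearson correlation measures linear association only, while the sample distance correlation also responds to non-linear dependence.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _centred_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centred (row, column, grand mean)."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals the column means (d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation, a value between 0 and 1."""
    n = len(x)
    a, b = _centred_distances(x), _centred_distances(y)
    cells = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in cells) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in cells) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in cells) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# Invented expression profiles: the target depends on the miRNA quadratically.
mirna = [-2, -1, 0, 1, 2]
target = [v * v for v in mirna]
linear_score = pearson(mirna, target)
dependence_score = distance_correlation(mirna, target)
```

For this symmetric, purely quadratic relationship the Pearson coefficient is zero while the distance correlation is clearly positive, which is one reason a tool might report both.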

Keywords: miRNA, miRNA:mRNA target prediction, statistical methods, miRNA:mRNA interaction network

Procedia PDF Downloads 293
126 Bioinformatic Study of Follicle Stimulating Hormone Receptor (FSHR) Gene in Different Buffalo Breeds

Authors: Hamid Mustafa, Adeela Ajmal, Kim EuiSoo, Noor-ul-Ain

Abstract:

Worldwide, buffalo production is considered one of the most important components of the food industry. Efficient buffalo production is related to the reproductive performance of this species. Lack of knowledge of reproductive efficiency and its related genes in buffalo is a major constraint on sustainable buffalo production. In this study, we performed bioinformatics analyses of the Follicle Stimulating Hormone Receptor (FSHR) gene and explored the possible relationship of this gene among different buffalo breeds and with other farm animals. We also determined the evolutionary pattern of this gene among these species. We investigated CDS lengths, stop codon variation, homology search, signal peptide, isoelectric point, tertiary structure, motifs and phylogenetic tree. The results indicate four different motifs in this gene: Activin-recp, GS motif, STYKc protein kinase and transmembrane. The results also indicate that this gene has a very close relationship with that of cattle, bison, sheep and goat. Multiple alignment (MA) showed high conservation of the motifs, which indicates the constancy of this gene during evolution. The results of this study can be applied to better understand and characterize FSHR gene structure in different farm animals, which would be helpful for efficient breeding plans in animal production.

Keywords: buffalo, FSHR gene, bioinformatics, production

Procedia PDF Downloads 433
125 Molecular Portraits: The Role of Posttranslational Modification in Cancer Metastasis

Authors: Navkiran Kaur, Apoorva Mathur, Abhishree Agarwal, Sakshi Gupta, Tuhin Rashmi

Abstract:

Aim: Breast cancer is the most common cancer in women worldwide, and resistance to current therapeutics, often concurrently, is an increasing clinical challenge. Glycosylation of proteins is one of the most important post-translational modifications. It is widely known that aberrant glycosylation has been implicated in many different diseases due to changes associated with biological function and protein folding. Alterations in cell surface glycosylation can promote invasive behavior of tumor cells, which ultimately leads to the progression of cancer. In breast cancer, there is increasing evidence pertaining to the role of glycosylation in tumor formation and metastasis. In the present study, an attempt has been made to study the disease-associated sialoglycoproteins in breast cancer using bioinformatics tools. The sequences were retrieved from the UniProt database. A database in the form of a Word document was compiled from FASTA sequences of breast cancer genes. Glycosylation was studied using the YinOYang tool on ExPASy, and differential gene expression and protein analysis were carried out in the context of breast cancer metastasis. The number of residues predicted above the O-GlcNAc threshold was recorded for each sequence, and sequences containing 50 or more aberrant glycosylation sites were detected. We found that there is a significant change in the expression profile of glycosylation patterns of various proteins associated with breast cancer. Differentially and aberrantly glycosylated proteins in breast cancer cells, relative to non-neoplastic cells, are an important factor in the overall progression and development of cancer.

Keywords: breast cancer, bioinformatics, cancer, metastasis, glycosylation

Procedia PDF Downloads 184
124 Finding the Longest Common Subsequence in Normal DNA and Disease Affected Human DNA Using Self Organizing Map

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Bioinformatics is an active research area that combines biology and computer science. Finding the longest common subsequence (LCSS) is one of the major challenges in various bioinformatics applications. Computing the LCSS plays a vital role in biomedicine and is an essential task in DNA sequence analysis in genetics, including a wide range of disease-diagnosis steps. The objective of the proposed system is to find the longest common subsequence present in normal and various disease-affected human DNA sequences using a Self Organizing Map (SOM) and LCSS. The human DNA sequences are collected from the National Center for Biotechnology Information (NCBI) database. Initially, each human DNA sequence is separated into k-mers using a k-mer separation rule. Mean and median values are calculated from each separated k-mer. These calculated values are fed as input to the Self Organizing Map for clustering. The resulting clusters are then given to the Longest Common Subsequence (LCSS) algorithm to find the common subsequences present in every cluster. It returns n(n-1)/2 subsequences for each cluster, where n is the number of k-mers in that cluster. Experimental outcomes of the proposed system produce the possible longest common subsequences of normal and disease-affected DNA data. Thus, the proposed system is a good initial aid for finding disease-causing sequences. Finally, performance analysis is carried out for different DNA sequences. The obtained values show that the retrieval of LCSS is done in a shorter time than with the existing system.
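The k-mer and pairwise LCSS steps can be sketched as follows. This is an illustrative sketch only: the abstract does not define its "k-mer separation rule", so overlapping sliding windows are assumed here, the SOM clustering step is omitted, and the input sequence is made up. The LCS routine is the classic dynamic-programming algorithm, not necessarily the variant the authors used.

```python
from itertools import combinations


def kmers(seq, k):
    """Overlapping k-mers of seq (assumed rule: sliding window, step 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def lcs(a, b):
    """Longest common subsequence of two strings by dynamic programming."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def pairwise_lcs(cluster):
    """LCS for every unordered pair in a cluster: n(n-1)/2 results."""
    return [(a, b, lcs(a, b)) for a, b in combinations(cluster, 2)]


# Hypothetical cluster: all 4-mers of a short made-up sequence.
cluster = kmers("ACGTAC", 4)
results = pairwise_lcs(cluster)
```

A cluster of n k-mers yields n(n-1)/2 pairs, which matches the count quoted in the abstract. Each pair costs time proportional to the product of the two string lengths.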

Keywords: clustering, k-mers, longest common subsequence, SOM

Procedia PDF Downloads 185
123 Isotherm Study for Phenol Removal onto GAC

Authors: Lallan Singh Yadav, Bijay Kumar Mishra, Manoj Kumar Mahapatra, Arvind Kumar

Abstract:

Adsorption data for phenol removal onto granular activated carbon (GAC) were fitted to the Langmuir and Freundlich isotherms. The adsorption capacity for phenol was estimated to be 16.12 mg/g at an initial pH of 5.7. The thermodynamics of the adsorption process were also determined in the present work.
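For reference, the two isotherm models named above are usually written in the following standard textbook forms. The symbols are not defined in the abstract and are assumptions here: q_e is the equilibrium uptake (mg/g), C_e the equilibrium concentration (mg/L), q_m the monolayer capacity, K_L the Langmuir constant, and K_F and n the Freundlich constants.

```latex
% Langmuir isotherm and its common linearised form
q_e = \frac{q_m K_L C_e}{1 + K_L C_e},
\qquad
\frac{C_e}{q_e} = \frac{1}{q_m K_L} + \frac{C_e}{q_m}

% Freundlich isotherm and its logarithmic form
q_e = K_F \, C_e^{1/n},
\qquad
\ln q_e = \ln K_F + \frac{1}{n} \ln C_e
```

In the linearised Langmuir form, a plot of C_e/q_e against C_e has slope 1/q_m, so a reported capacity such as 16.12 mg/g would correspond to q_m if it was obtained from a Langmuir fit. The abstract does not say which model produced that figure.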

Keywords: adsorption, phenol, granular activated carbon, bioinformatics, biomedicine

Procedia PDF Downloads 398
122 Meanings and Concepts of Standardization in Systems Medicine

Authors: Imme Petersen, Wiebke Sick, Regine Kollek

Abstract:

In systems medicine, high-throughput technologies produce large amounts of data on different biological and pathological processes, including (disturbed) gene expressions, metabolic pathways and signaling. The large volume of data of different types, stored in separate databases and often located at different geographical sites, has posed new challenges regarding data handling and processing. Tools based on bioinformatics have been developed to resolve the emerging problems of systematizing, standardizing and integrating the various data. However, the heterogeneity of data gathered at different levels of biological complexity is still a major challenge in data analysis. To build multilayer disease modules, large and heterogeneous data of disease-related information (e.g., genotype, phenotype, environmental factors) are correlated. Therefore, a great deal of attention in systems medicine has been paid to data standardization, primarily to retrieve and combine large, heterogeneous datasets into standardized and incorporated forms and structures. However, this data-centred concept of standardization in systems medicine contrasts with the debate in science and technology studies (STS) on standardization, which rather emphasizes the dynamics, contexts and negotiations of standard operating procedures. Based on empirical work on research consortia in Germany that explore the molecular profile of diseases to establish systems medical approaches in the clinic, we trace how standardized data are processed and shaped by bioinformatics tools, how scientists using such data in research perceive such standard operating procedures, and which consequences for knowledge production (e.g. modeling) arise from them. Hence, different concepts and meanings of standardization are explored to gain deeper insight into standard operating procedures, not only in systems medicine but also beyond.

Keywords: data, science and technology studies (STS), standardization, systems medicine

Procedia PDF Downloads 232
121 Nonlinear Waves in Two-Layer Systems with Heat Release/Consumption at the Interface

Authors: Ilya Simanovskii

Abstract:

Nonlinear convective flows developed under the joint action of buoyant and thermo-capillary effects in a two-layer system with periodic boundary conditions on the lateral walls have been investigated. The influence of interfacial heat release on oscillatory regimes has been studied. Computational regions of different lengths have been considered. It is shown that the development of oscillatory instability can lead to the appearance of different nonsteady flows.

Keywords: interface, instabilities, two-layer systems, bioinformatics, biomedicine

Procedia PDF Downloads 298
120 Whole Exome Sequencing Data Analysis of Rare Diseases: Non-Coding Variants and Copy Number Variations

Authors: S. Fahiminiya, J. Nadaf, F. Rauch, L. Jerome-Majewska, J. Majewski

Abstract:

Background: Sequencing of the protein-coding regions of the human genome (Whole Exome Sequencing; WES) has demonstrated great success in identifying causal mutations for several rare genetic disorders in humans. Most WES studies have focused on rare variants in coding exons and splice sites, where missense substitutions lead to alteration of the protein product. Although focusing on this category of variants has unraveled many inherited genetic diseases in recent years, a subset of them has remained unresolved. Here, we present the results of our WES studies in which analyzing only rare variants in coding regions was not conclusive, but further investigation revealed the involvement of non-coding variants and copy number variations (CNVs) in the etiology of the diseases. Methods: Whole exome sequencing was performed using our standard protocols at the Genome Quebec Innovation Center, Montreal, Canada. All bioinformatics analyses were done using an in-house WES pipeline. Results: To date, we have successfully identified several disease-causing mutations within gene coding regions (e.g. SCARF2: Van den Ende-Gupta syndrome and SNAP29: 22q11.2 deletion syndrome) using WES. In addition, we showed that variants in non-coding regions and CNVs are also important and should not be ignored or filtered out during bioinformatics analysis of WES data. For instance, in patients with osteogenesis imperfecta type V and in patients with glucocorticoid deficiency, we identified variants in the 5'UTR, resulting in the production of longer or truncated non-functional proteins. Furthermore, CNVs were identified as the main cause of disease in patients with metaphyseal dysplasia with maxillary hypoplasia and brachydactyly and in patients with osteogenesis imperfecta type VII. Conclusions: Our study highlights the importance of considering non-coding variants and CNVs during interpretation of WES data, as they can be the only cause of the disease under investigation.

Keywords: whole exome sequencing data, non-coding variants, copy number variations, rare diseases

Procedia PDF Downloads 302
119 Characterizing and Developing the Clinical Grade Microbiome Assay with a Robust Bioinformatics Pipeline for Supporting Precision Medicine Driven Clinical Development

Authors: Danyi Wang, Andrew Schriefer, Dennis O'Rourke, Brajendra Kumar, Yang Liu, Fei Zhong, Juergen Scheuenpflug, Zheng Feng

Abstract:

Purpose: It has been recognized that the microbiome plays critical roles in disease pathogenesis, including cancer, autoimmune disease, and multiple sclerosis. To develop a clinical-grade assay for exploring microbiome-derived clinical biomarkers across disease areas, a two-phase approach is implemented: 1) identification of the optimal sample preparation reagents using pre-mixed bacteria and healthy donor stool samples coupled with the proprietary Sigma-Aldrich® bioinformatics solution; 2) exploratory analysis of patient samples for enabling precision medicine. Study Procedure: In the phase 1 study, we first compared the 16S sequencing results of two ATCC® microbiome standards (MSA-2002 and MSA-2003) across five different extraction kits (Kits A, B, C, D and E). Both microbiome standard samples were extracted in triplicate with all extraction kits. Following isolation, DNA quantity was determined by Qubit assay. DNA quality was assessed to determine purity and to confirm that the extracted DNA was of high molecular weight. Bacterial 16S ribosomal ribonucleic acid (rRNA) amplicons were generated via amplification of the V3/V4 hypervariable region of the 16S rRNA. Sequencing was performed using a 2x300 bp paired-end configuration on the Illumina MiSeq. Fastq files were analyzed using the Sigma-Aldrich® Microbiome Platform. The Microbiome Platform is a cloud-based service that offers best-in-class 16S-seq and WGS analysis pipelines and databases. The Platform and its methods have been extensively benchmarked using microbiome standards generated internally by MilliporeSigma and other external providers. Data Summary: The DNA yield from extraction kits D and E was below the limit of detection (100 pg/µl) of the Qubit assay, as both kits are intended for samples with low bacterial counts. The pre-mixed bacterial pellets at high concentrations, with an input of 2 x 10^6 cells for MSA-2002 and 1 x 10^6 cells for MSA-2003, were not compatible with these kits. Among the remaining three extraction kits, kit A produced the greatest yield whereas kit B provided the least (Kit-A/MSA-2002: 174.25 ± 34.98; Kit-A/MSA-2003: 179.89 ± 30.18; Kit-B/MSA-2002: 27.86 ± 9.35; Kit-B/MSA-2003: 23.14 ± 6.39; Kit-C/MSA-2002: 55.19 ± 10.18; Kit-C/MSA-2003: 35.80 ± 11.41 (Mean ± SD)). The PCoA 3D visualization of the Weighted UniFrac beta diversity shows that kits A and C cluster closely together while kit B appears as an outlier. The kit A sequencing samples cluster more closely together than those of the other two kits. The taxonomic profiles of kit B have lower recall when compared to the known mixture profiles, indicating that kit B was inefficient at detecting some of the bacteria. Conclusion: Our data demonstrated that the DNA extraction method impacts DNA concentration, purity, and the microbial communities detected by next-generation sequencing analysis. Further comparison of microbiome analysis performance using healthy donor stool samples is underway; in addition, colorectal cancer patient samples will be acquired to further explore clinical utility. Collectively, our comprehensive qualification approach, including the evaluation of optimal DNA extraction conditions, the inclusion of positive controls, and the implementation of a robust qualified bioinformatics pipeline, assures accurate characterization of the microbiota in a complex matrix for deciphering the deep biology and enabling precision medicine.
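The "recall" comparison against a known mixture can be read as the fraction of expected taxa that a kit's profile actually detects. The sketch below shows only that calculation and is not the Microbiome Platform's implementation. The taxon labels and both detected sets are placeholders invented for illustration, not results from the study.

```python
def recall(detected, expected):
    """Fraction of the expected taxa that appear in the detected profile."""
    expected = set(expected)
    if not expected:
        raise ValueError("the expected mixture must contain at least one taxon")
    return len(expected & set(detected)) / len(expected)


# Placeholder labels standing in for the members of a mock community.
known_mixture = {"taxon_1", "taxon_2", "taxon_3", "taxon_4"}

# Invented detection results for two hypothetical extraction kits.
kit_x_profile = {"taxon_1", "taxon_2", "taxon_3", "taxon_4", "taxon_9"}
kit_y_profile = {"taxon_1", "taxon_3"}

kit_x_recall = recall(kit_x_profile, known_mixture)
kit_y_recall = recall(kit_y_profile, known_mixture)
```

An extra taxon such as `taxon_9` does not lower recall. It would lower precision, which is a separate measure not mentioned in the abstract.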

Keywords: 16S rRNA sequencing, analytical validation, bioinformatics pipeline, metagenomics

Procedia PDF Downloads 24
118 ELISA Based hTSH Assessment Using Two Sensitive and Specific Anti-hTSH Polyclonal Antibodies

Authors: Maysam Mard-Soltani, Mohamad Javad Rasaee, Saeed Khalili, Abdol Karim Sheikhi, Mehdi Hedayati

Abstract:

Production of specific antibody responses against hTSH is a cumbersome process due to the high identity between hTSH and the other members of the glycoprotein hormone family (FSH, LH and HCG) and the high identity between human TSH and that of the host animals used for antibody production. Therefore, two polyclonal antibodies were purified against two recombinant proteins. Four possible ELISA tests were designed based on these antibodies. These ELISA tests were checked against hTSH and other glycoprotein hormones, and their sensitivity and specificity were assessed. Bioinformatics tools were used to analyze the immunological properties. After selection of the immunogenic region of the hTSH protein, the C-terminal of B hTSH was selected and applied. Two recombinant genes containing these fragments (first: two repeats of the C-terminal of B hTSH; second: tetanus toxin + B hTSH C-terminal) were designed and sub-cloned into the pET32a expression vector. Standard methods were used for protein expression, purification, and verification. Thereafter, white New Zealand rabbits were immunized and their sera were used for antibody titration, purification and characterization. Then, four ELISA tests based on the two antibodies were employed to assess hTSH and other glycoprotein hormones. The results of these assessments were compared with standard amounts. The obtained results indicated that the desired antigens were successfully designed, sub-cloned, expressed, confirmed and used for in vivo immunization. The raised antibodies were capable of specific and sensitive hTSH detection, while cross reactivity with the other members of the glycoprotein hormone family was minimal. Among the four designed tests, the test in which the antibody against the first protein was used as the capture antibody and the antibody against the second protein was used as the detector antibody did not show any hook effect up to 50 mIU/L. Both proteins are able to induce highly sensitive and specific antibody responses against hTSH. One combination of these antibodies has the highest sensitivity and specificity in hTSH detection.

Keywords: hTSH, bioinformatics, protein expression, cross reactivity

Procedia PDF Downloads 68
117 Microbial Bioproduction with Design of Metabolism and Enzyme Engineering

Authors: Tomokazu Shirai, Akihiko Kondo

Abstract:

Metabolic engineering and synthetic biology technologies are essential for effective microbial bioproduction. It is especially important to develop in silico tools for designing metabolic pathways that produce unnatural, valuable chemicals, such as fossil-derived materials for fuels or plastics. Here, we demonstrate two in silico tools for designing novel metabolic pathways: BioProV and HyMeP. Furthermore, we succeeded in creating an artificial metabolic pathway by enzyme engineering.

Keywords: bioinformatics, metabolic engineering, synthetic biology, genome scale model

Procedia PDF Downloads 210
116 Integrative Omics-Portrayal Disentangles Molecular Heterogeneity and Progression Mechanisms of Cancer

Authors: Binder Hans

Abstract:

Cancer is no longer seen as solely a genetic disease in which genetic defects such as mutations and copy number variations affect gene regulation and eventually lead to aberrant cell functioning that can be monitored by transcriptome analysis. It has become obvious that epigenetic alterations represent a further important layer of (de-)regulation of gene activity. For example, aberrant DNA methylation is a hallmark of many cancer types, and methylation patterns have been used successfully to subtype cancer heterogeneity. Hence, unraveling the interplay between different omics levels such as genome, transcriptome and epigenome is indispensable for a mechanistic understanding of the molecular deregulation causing complex diseases such as cancer. This objective requires powerful downstream integrative bioinformatics methods as an essential prerequisite to discover the whole-genome mutational, transcriptome and epigenome landscapes of cancer specimens and to discover cancer genesis, progression and heterogeneity. Basic challenges and tasks arise ‘beyond sequencing’ because of the large size of the data, their complexity, the need to search for hidden structures in the data, for knowledge mining to discover biological function, and for systems biology conceptual models to deduce developmental interrelations between different cancer states. These tasks are tightly related to cancer biology as an (epi-)genetic disease giving rise to aberrant genomic regulation under micro-environmental control and clonal evolution, which leads to heterogeneous cellular states. Machine learning algorithms such as self organizing maps (SOM) represent one interesting option to tackle these bioinformatics tasks. The SOM method enables the recognition of complex patterns in large-scale data generated by high-throughput omics technologies. It portrays molecular phenotypes by generating individualized, easy-to-interpret images of the data landscape in combination with comprehensive analysis options. Our image-based, reductionist machine learning methods provide one interesting perspective on how to deal with massive data in the study of complex diseases such as gliomas, melanomas and colon cancer at the molecular level. As an important new challenge, we address the combined portrayal of different omics data such as genome-wide genomic, transcriptomic and methylomic data. The integrative-omics portrayal approach is based on the joint training of the data, and it provides separate personalized data portraits for each patient and data type, which can be analyzed by visual inspection as one option. The new method enables an integrative genome-wide view of the omics data types and the underlying regulatory modes. It is applied to high- and low-grade gliomas and to melanomas, where it disentangles transversal and longitudinal molecular heterogeneity in terms of distinct molecular subtypes and progression paths with prognostic impact.

Keywords: integrative bioinformatics, machine learning, molecular mechanisms of cancer, gliomas and melanomas

Procedia PDF Downloads 68