Search results for: bio-informatics
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 203

203 The Development and Provision of a Knowledge Management Ecosystem, Optimized for Genomics

Authors: Matthew I. Bellgard

Abstract:

The field of bioinformatics has made, and continues to make, substantial progress and contributions to life science research and development. However, this paper contends that a systems approach integrates bioinformatics activities for any project in a defined manner. The application of critical control points in this bioinformatics systems approach may be useful to identify and evaluate points in a pathway where specified activity risk can be reduced, monitored and quality enhanced.

Keywords: bioinformatics, food security, personalized medicine, systems approach

Procedia PDF Downloads 422
202 Evaluation and Assessment of Bioinformatics Methods and Their Applications

Authors: Fatemeh Nokhodchi Bonab

Abstract:

Bioinformatics, in its broad sense, involves the application of computer processes to solve biological problems. A wide range of computational tools is needed to process, effectively and efficiently, the large amounts of data generated by recent technological innovations in biology and medicine. A number of computational tools have been developed or adapted to deal with the experimental riches of complex and multivariate data and to move from data collection to information or knowledge. These bioinformatics tools are being evaluated and applied in various medical areas, including early detection, risk assessment, classification, and prognosis of cancer. The goal of these efforts is to develop and identify bioinformatics methods with optimal sensitivity, specificity, and predictive capabilities. The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science. Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules on a large scale. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems.

Keywords: methods, applications, transcriptional regulatory systems, techniques

Procedia PDF Downloads 127
201 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas

Abstract:

Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms on other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for the more challenging task of hierarchical classification, and evaluates it on a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes a meta-learning approach for recommending the best hierarchical classification algorithm for a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.
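
As a rough illustration of what "class labels organised into a hierarchy" means in practice, the sketch below stores a toy hierarchy as a child-to-parent map and expands a predicted label with all of its ancestors. The labels and the helper names (`PARENT`, `ancestors`, `expand`) are invented for this example and are not taken from the paper.

```python
# Toy class hierarchy: child label -> parent label (None marks the root).
# These labels are invented for illustration; they are not from the paper.
PARENT = {
    "root": None,
    "enzyme": "root",
    "transporter": "root",
    "kinase": "enzyme",
    "protease": "enzyme",
}


def ancestors(label):
    """Return the ancestors of a label, nearest first, excluding the label itself."""
    chain = []
    parent = PARENT[label]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def expand(labels):
    """Close a set of labels under the ancestor relation.

    In a hierarchy, predicting a specific class implies all of its
    more generic ancestor classes as well.
    """
    closed = set(labels)
    for label in labels:
        closed.update(ancestors(label))
    return closed


if __name__ == "__main__":
    print(ancestors("kinase"))
    print(sorted(expand({"kinase"})))
```

A flat classifier would treat "kinase" and "enzyme" as unrelated labels; the ancestor closure is what distinguishes the hierarchical setting.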

Keywords: algorithm recommendation, meta-learning, bioinformatics, hierarchical classification

Procedia PDF Downloads 314
200 Solanum tuberosum Ammonium Transporter Gene: Some Bioinformatics Insights

Authors: A. T. Adetunji, F. B. Lewu, R. Mundembe

Abstract:

Plants require nitrogen (N) to support desired production levels. Nitrogen is available to plants in the form of nitrate or ammonium, which are transported into the cell with the aid of various transport proteins. Ammonium transporters (AMTs) play a role in the uptake of ammonium, the form in which nitrogen is preferentially absorbed by plants. Solanum tuberosum AMT1 (StAMT1) was characterized using molecular biology and bioinformatics methods. Nucleotide database sequences were used to design AMT1-specific primers which were used to amplify the AMT1 internal regions. Nucleotide sequencing, alignment and phylogenetic analysis assigned StAMT1 to the AMT1 family. The deduced amino acid sequences showed that StAMT1 is 92%, 83% and 76% similar to Solanum lycopersicum LeAMT1.1, Lotus japonicus LjAMT1.1 and Solanum lycopersicum LeAMT1.2 respectively. StAMT1 fragments were shown to correspond to the 5th - 10th trans-membrane domains. Residue StAMT1 D15 is predicted to be essential for ammonium transport, while mutations of StAMT1 S76A may further enhance ammonium transport.

Keywords: ammonium transporter, bioinformatics, nitrogen, primers, Solanum tuberosum

Procedia PDF Downloads 248
199 Characterization of (GRAS37) Gibberellin Acid Insensitive (GAI), Repressor (RGA), and Scarecrow (SCR) Gene by Using Bioinformatics Tools

Authors: Yusra Tariq

Abstract:

The GRAS37 gene is presently known in tomatoes, which are a source of healthy substances such as ascorbic acid, polyphenols, carotenoids and nutrients. It has a significant impact on the growth and development of humans. The GRAS37 gene belongs to a group of plant transcription factors that play significant roles in responses to various abiotic stresses (such as drought, salinity, thermal stress, and light stress), which can strongly affect growth. Tomatoes are very sensitive to temperature, and their growth or production is optimal in a temperature range from 21 °C to 29.5 °C during the daytime and from 18.5 °C to 21 °C during the night. This protein acts as a positive regulator of the salt stress response and abscisic acid signaling. This study summarizes the structure, characterized by molecular formula and protein-binding domains using bioinformatics tools such as the ExPASy Translate tool, ExPASy ProtParam, Swiss-Prot, InterProScan and Clustal W, and the regulatory roles of GRAS gene components, as well as their responses to both biotic and abiotic stresses.

Keywords: GRAS37, gene, bioinformatics, tool

Procedia PDF Downloads 53
198 Intellectual Property Protection of CRISPR Related Technologies

Authors: Zheng Miao, Dennis Fernandez

Abstract:

CRISPR research has the potential to completely transform life science, agriculture, livestock and the health care industry. The intellectual property derived from this research has attracted significant attention in academia as well as in the biopharmaceutical industry, culminating in an urgent need for strategic IP protection. We review the rudimentary concepts and key competitors of CRISPR technologies as well as the paramount strategies for intellectual property protection. Further, we elaborate on prosecution issues related to CRISPR patents as well as possible solutions to various patent laws, interferences and litigation. Finally, we address how the bioinformatics of CRISPR technology raises issues of privacy and a host of ethical concerns.

Keywords: bioinformatics, CRISPR, biotechnology, intellectual property

Procedia PDF Downloads 252
197 Isolate-Specific Variations among Clinical Isolates of Brucella Identified by Whole-Genome Sequencing, Bioinformatics and Comparative Genomics

Authors: Abu S. Mustafa, Mohammad W. Khan, Faraz Shaheed Khan, Nazima Habibi

Abstract:

Brucellosis is a zoonotic disease of worldwide prevalence. There are at least four species and several strains of Brucella that cause human disease. Brucella genomes have very limited variation across strains, which hinders strain identification using classical molecular techniques, including PCR and 16S rDNA sequencing. The aim of this study was to perform whole genome sequencing of clinical isolates of Brucella, followed by bioinformatics and comparative genomics analyses, to determine whether genetic differences exist across the isolates of a single Brucella species and strain. The draft sequence data were generated from 15 clinical isolates of Brucella melitensis (biovar 2 strain 63/9) using the MiSeq next generation sequencing platform. The generated reads were used for further assembly and analysis. All analyses were performed on a bioinformatics workstation (8-core i7 processor, 8 GB RAM, Bio-Linux operating system). FastQC was used to determine the quality of reads, and low quality reads were trimmed or eliminated using Fastx_trimmer. Assembly was done using the Velvet and ABySS software. The assembled contigs were ordered with Mauve. The online server RAST was employed to annotate the contig assemblies. Annotated genomes were compared using the Mauve and ACT tools. The QC score for the DNA sequence data generated by MiSeq was higher than 30 for 80% of reads, with more than 100x coverage, which suggested that the data could be used for further analysis. However, when analyzed by FastQC, the quality of reads from four samples was not good enough to create a complete genome draft, so the remaining 11 samples were used for further analysis. The comparative genome analyses showed that, despite sharing the same gene sets, the genomes differed by single nucleotide polymorphisms and insertions/deletions, which provided a variable extent of diversity to these bacteria. 
In conclusion, next generation sequencing, bioinformatics, and comparative genome analysis can be used to find variations (point mutations, insertions and deletions) across different genomes of Brucella within a single strain. This information could be useful in surveillance and epidemiological studies. The study was supported by Kuwait University Research Sector grants MI04/15 and SRUL02/13.
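
The "QC score higher than 30 for 80% of reads" criterion can be illustrated with a short sketch that decodes FASTQ quality strings and reports the fraction of reads whose mean Phred score exceeds a threshold. This is only an approximation of what tools such as FastQC report: it assumes Phred+33 encoding, uses a per-read mean as the summary statistic, and the quality strings below are made up for the example.

```python
def phred_scores(quality_line, offset=33):
    """Decode one FASTQ quality string into integer Phred scores.

    offset=33 assumes the Phred+33 (Sanger-style) encoding.
    """
    return [ord(ch) - offset for ch in quality_line]


def fraction_reads_above(quality_lines, threshold=30):
    """Fraction of reads whose mean Phred score is strictly above the threshold."""
    if not quality_lines:
        return 0.0
    passing = 0
    for line in quality_lines:
        scores = phred_scores(line)
        # Skip empty quality strings rather than dividing by zero.
        if scores and sum(scores) / len(scores) > threshold:
            passing += 1
    return passing / len(quality_lines)


if __name__ == "__main__":
    # Made-up quality strings: 'I' decodes to Q40, '?' to Q30, '5' to Q20, '#' to Q2.
    toy_qualities = ["IIIIIIII", "IIII####", "????????", "55555555", "IIIIIIIH"]
    print(fraction_reads_above(toy_qualities))
```

A per-base or per-position summary is equally common; the per-read mean is used here only because the abstract states its criterion in terms of reads.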

Keywords: brucella, bioinformatics, comparative genomics, whole genome sequencing

Procedia PDF Downloads 382
196 Characterization of Solanum tuberosum Ammonium Transporter Gene Using Bioinformatics Approach

Authors: Adewole Tomiwa Adetunji, Francis Bayo Lewu, Richard Mundembe

Abstract:

Plants require nitrogen (N) to support desired production levels. A better understanding of the N transport mechanism is needed in order to improve N assimilation by plant roots. Nitrogen is available to plants in the form of nitrate or ammonium, which are transported into the cell with the aid of various transport proteins. Ammonium transporters (AMTs) play a role in the uptake of ammonium, the form in which N is preferentially absorbed by plants. Solanum tuberosum AMT1 (StAMT1) was amplified, sequenced and characterized using molecular biology and bioinformatics methods. Nucleotide database sequences were used to design 976 base pairs AMT1-specific primers, which include the forward primer 5’-GCCATCGCCGCCGCCGG-3’ and the reverse primer 5’-GGGTCAGATCCATACCCGC-3’. These primers were used to amplify the Solanum tuberosum AMT1 internal regions. Nucleotide sequencing, alignment and phylogenetic analysis assigned StAMT1 to the AMT1 family on the basis of the clade and the high similarity it shared with other plant AMT1 genes. The deduced amino acid sequences showed that StAMT1 is 92%, 83% and 76% similar to Solanum lycopersicum LeAMT1.1, Lotus japonicus LjAMT1.1, and Solanum lycopersicum LeAMT1.2, respectively. StAMT1 fragments were shown to correspond to the 5th-10th trans-membrane domains. Residue StAMT1 D15 is predicted to be essential for ammonium transport, while mutations of StAMT1 S76A may further enhance ammonium transport.
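
For readers who want to inspect the two primers quoted in the abstract, the following sketch computes their lengths, GC fractions and reverse complements. The primer sequences are copied from the abstract; the helper names are this example's own, and nothing here reproduces the authors' primer-design procedure.

```python
# Primer sequences as quoted in the abstract (5' to 3').
FORWARD = "GCCATCGCCGCCGCCGG"
REVERSE = "GGGTCAGATCCATACCCGC"

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), round(gc_fraction(primer), 3), reverse_complement(primer))
```

The reverse complement of the reverse primer is the sequence one would expect to find on the same strand as the forward primer, at the far end of the amplified region.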

Keywords: ammonium transporter, bioinformatics, nitrogen, primers, Solanum tuberosum

Procedia PDF Downloads 226
195 Bioinformatics Approach to Identify Physicochemical and Structural Properties Associated with Successful Cell-free Protein Synthesis

Authors: Alexander A. Tokmakov

Abstract:

Cell-free protein synthesis is widely used to synthesize recombinant proteins. It allows genome-scale expression of various polypeptides under strictly controlled, uniform conditions. However, only a minor fraction of all proteins can be successfully expressed in the protein synthesis systems currently in use. The factors determining expression success are poorly understood. A vast volume of data has now accumulated in cell-free expression databases. This makes possible a comprehensive bioinformatics analysis and the identification of multiple features associated with successful cell-free expression. Here, we describe an approach aimed at identifying multiple physicochemical and structural properties of amino acid sequences associated with protein solubility and aggregation, and highlight major correlations obtained using this approach. The developed method includes: categorical assessment of the protein expression data, calculation and prediction of multiple properties of expressed amino acid sequences, correlation of the individual properties with the expression scores, and evaluation of the statistical significance of the observed correlations. Using this approach, we revealed a number of statistically significant correlations between calculated and predicted features of protein sequences and their amenability to cell-free expression. It was found that some of the features, such as protein pI, hydrophobicity, presence of signal sequences, etc., are mostly related to protein solubility, whereas others, such as protein length, number of disulfide bonds, content of secondary structure, etc., mainly affect the expression propensity. We also demonstrated that the amenability of polypeptide sequences to cell-free expression correlates with the presence of multiple sites of post-translational modification. The correlations revealed in this study provide a plethora of important insights into protein folding and the rationalization of protein production. 
The developed bioinformatics approach can be of practical use for predicting expression success and optimizing cell-free protein synthesis.
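
The "calculate a property, then correlate it with expression scores" step can be sketched as follows. The example computes mean hydropathy (GRAVY) with the Kyte-Doolittle scale and a plain Pearson correlation; the choice of scale and of Pearson correlation is an assumption of this sketch, not something the abstract specifies, and the sequences and scores are made-up placeholders rather than real data.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    return mean(KYTE_DOOLITTLE[aa] for aa in sequence.upper())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Placeholder sequences and categorical expression scores (hypothetical values).
    toy_sequences = ["MKKDEERS", "MALLVVIA", "MSTNQGHP", "MIVLFFAA"]
    toy_scores = [3, 1, 2, 0]
    hydropathy = [gravy(seq) for seq in toy_sequences]
    print(pearson(hydropathy, toy_scores))
```

With real data, a rank-based statistic and a significance test would be needed before drawing any conclusion; the sketch only shows the shape of the computation.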

Keywords: bioinformatics analysis, cell-free protein synthesis, expression success, optimization, recombinant proteins

Procedia PDF Downloads 419
194 Uncovering Anti-Hypertensive Obesity Targets and Mechanisms of Metformin, an Anti-Diabetic Medication

Authors: Lu Yang, Keng Po Lai

Abstract:

Metformin, a well-known clinical drug against diabetes, has potential anti-diabetic and anti-obesity benefits, as reported in a growing body of evidence. However, current clinical and experimental investigations have not revealed the detailed mechanisms of metformin against obesity and hypertension. We have previously used a bioinformatics strategy, including network pharmacology and molecular docking, to uncover the key targets and pathways of bioactive compounds against clinical disorders such as cancers and coronavirus disease. Thus, in this report, the in-silico approach was used to identify the hub targets, pharmacological function, and mechanism of metformin against obesity and hypertension. The network analysis identified 154 differentially expressed genes of obesity and hypertension, 21 interaction genes, and 6 hub genes of metformin treating hypertensive obesity. The molecular docking findings indicated a potent binding capability of metformin with the key proteins, including interleukin 6 (IL-6) and chemokine (C-C motif) ligand 2 (CCL2), in hypertensive obesity. The anti-hypertensive obesity action exerted by metformin involved metabolic regulation and the inflammatory reaction. The anti-hypertensive obesity mechanisms of metformin were also revealed, including regulation of inflammatory and immunological signaling pathways for metabolic homeostasis in tissue and microenvironmental amelioration of blood pressure. In conclusion, our bioinformatics findings have demonstrated the detailed hub and pharmacological targets, biological functions, and signaling pathways of metformin treating hypertensive obesity.
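
Key ("hub") genes in an interaction network are often picked by how many partners they have. The sketch below ranks genes by node degree in a tiny undirected network. Degree is only one of several centrality measures and the abstract does not say which was used; the edge list is invented, with IL6 and CCL2 borrowed from the abstract purely as familiar names and `GENE_A` to `GENE_C` as placeholders.

```python
from collections import defaultdict

# Invented toy interaction network (undirected edges); not data from the study.
EDGES = [
    ("IL6", "CCL2"),
    ("IL6", "GENE_A"),
    ("IL6", "GENE_B"),
    ("CCL2", "GENE_A"),
    ("GENE_B", "GENE_C"),
]


def degree_table(edges):
    """Count how many edges touch each gene."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def top_hubs(edges, n):
    """Return the n genes with the highest degree; ties are broken alphabetically."""
    degree = degree_table(edges)
    return sorted(degree, key=lambda gene: (-degree[gene], gene))[:n]


if __name__ == "__main__":
    print(degree_table(EDGES))
    print(top_hubs(EDGES, 2))
```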

Keywords: metformin, obesity, hypertension, bioinformatics findings

Procedia PDF Downloads 122
193 Prediction and Identification of a Permissive Epitope Insertion Site for ST Toxoid in cfaB from Enterotoxigenic Escherichia coli

Authors: N. Zeinalzadeh, Mahdi Sadeghi

Abstract:

Enterotoxigenic Escherichia coli (ETEC) is the most common cause of non-inflammatory diarrhea in developing countries, accounting for approximately 20% of all diarrheal episodes in children in these areas. ST is one of the most important virulence factors, and CFA/I is one of the most frequent colonization factors that contribute to the process of ETEC infection. ST and CfaB (the CFA/I subunit) are among the vaccine candidates against ETEC. However, because of its small size, ST is not a good immunogen in its natural form. To increase its immunogenic potential, we explored candidate positions for ST insertion in the CfaB sequence. After bioinformatics analysis, one of the candidate positions was selected, and the chimeric gene (cfaB*st) sequence was synthesized and expressed in E. coli BL21 (DE3). The chimeric recombinant protein was purified with Ni-NTA columns and characterized by western blot analysis. Residues 74-75 of the CfaB sequence could be a good candidate position for the insertion of ST and other epitopes.

Keywords: bioinformatics, CFA/I, enterotoxigenic E. coli, ST toxoid

Procedia PDF Downloads 448
192 Identification and Characterization of Small Peptides Encoded by Small Open Reading Frames using Mass Spectrometry and Bioinformatics

Authors: Su Mon Saw, Joe Rothnagel

Abstract:

Short open reading frames (sORFs) located in the 5’UTR of mRNAs are known as upstream ORFs (uORFs). Characterization of uORF-encoded peptides (uPEPs), a subset of short open reading frame-encoded peptides (sPEPs), and of their translational regulation leads to a better understanding of the causes of genetic disease and of proteome complexity, and to the development of treatments. The existence of uORFs within the cellular proteome could be detected by LC-MS/MS. Demonstrating that uORFs are translated into uPEPs, and identifying those uPEPs, will allow characterization of their structures, functions, subcellular localization, evolutionary maintenance (conservation in human and other species) and abundance in cells. It is hypothesized that a subset of sORFs is translatable and that the encoded sPEPs are functional and endogenously expressed, contributing to the complexity of the eukaryotic cellular proteome. This project aimed to investigate whether sORFs encode functional peptides. Liquid chromatography-mass spectrometry (LC-MS) and bioinformatics were thus employed. Because sPEPs are probably low in abundance and small in size, efficient strategies that enrich small proteins and deplete the sub-proteome of large and abundant proteins are crucial for identifying them. Low molecular weight proteins were extracted using SDS-PAGE from Human Embryonic Kidney (HEK293) cells and using Strong Cation Exchange Chromatography (SCX) from secreted HEK293 cells. Extracted proteins were digested by trypsin into peptides, which were detected by LC-MS/MS. The MS/MS data obtained were searched against Swiss-Prot using MASCOT version 2.4 to filter out known proteins, and all unmatched spectra were re-searched against the human RefSeq database. ProteinPilot v5.0.1 was used to identify sPEPs by searching against the human RefSeq, Vanderperre and Human Alternative Open Reading Frame (HaltORF) databases. Potential sPEPs were analyzed by bioinformatics. 
Since SDS-PAGE could not separate proteins <20 kDa, this approach could not identify sPEPs. ORF Finder and blastp searches showed that all MASCOT-identified peptide fragments were parts of main open reading frames (mORFs). No sPEP was detected, so the existence of sPEPs could not be confirmed in this study. Thirteen sORFs shown by mass spectrometry to be translated in HEK293 cells in previous studies were characterized by bioinformatics. The sPEPs identified in previous studies were <100 amino acids and <15 kDa. Bioinformatics results showed that sORFs are translated to sPEPs and contribute to proteome complexity. The uPEP translated from the uORF of SLC35A4 was strongly conserved in human and mouse, while the uPEP translated from the uORF of MKKS was strongly conserved in human and Rhesus monkey. Cross-species conserved uORFs in association with protein translation strongly suggest evolutionary maintenance of coding sequence and indicate probable functional expression of the peptides encoded within these uORFs. Translation of sORFs was confirmed by mass spectrometry, and sPEPs were characterized with bioinformatics.

Keywords: bioinformatics, HEK293 cells, liquid chromatography-mass spectrometry, ProteinPilot, Strong Cation Exchange Chromatography, SDS-PAGE, sPEPs

Procedia PDF Downloads 188
191 Knowledge Engineering Based Smart Healthcare Solution

Authors: Rhaed Khiati, Muhammad Hanif

Abstract:

In the past decade, smart healthcare systems have been on the rise, especially with the evolution of hospitals and their increasing reliance on bioinformatics and software specializing in healthcare. Doctors have become more reliant on technology than ever, something that in the past would have been looked down upon, as technology has become imperative in reducing overall costs and improving the quality of patient care. With patient-doctor interactions becoming more necessary and more complicated than ever, systems must be developed while taking into account costs, patient comfort, and patient data, among other things. In this work, we propose a smart hospital bed, which combines the complexity and big data usage of traditional healthcare systems with the comfort found in soft beds, while taking into account concerns such as data confidentiality, security, and compliance with service level agreements (SLAs). This research work potentially provides users, namely patients and doctors, with seamless interaction with their respective nurses, as well as faster access to up-to-date personal data, including prescriptions and the severity of the condition, in contrast to previous research in the area, which lacks consideration of such provisions.

Keywords: big data, smart healthcare, distributed systems, bioinformatics

Procedia PDF Downloads 198
190 Hsa-miR-192-5p, and Hsa-miR-129-5p Prominent Biomarkers in Regulation Glioblastoma Cancer Stem Cells Genes Microenvironment

Authors: Rasha Ahmadi

Abstract:

Glioblastoma is one of the most frequent brain malignancies, with a high mortality rate and limited survival. Despite different treatments and surgery, glioblastoma cancer stem cells may recur as a subsequent tumor. For this reason, it is crucial to research the markers associated with glioblastoma stem cells and specifically their microenvironment. In this study, using bioinformatics analysis, we analyzed and nominated genes in the microenvironment pathways of glioblastoma stem cells. An appropriate dataset was selected for analysis from the GEO database. This dataset comprised gene expression patterns in stem cells derived from glioblastoma patients. Gene clusters were divided into high and low expression. Enrichment databases such as Enrichr, STRING, and GEPIA were used to analyze the data. Finally, we extracted the candidate genes: 2700 high-expression and 1100 low-expression genes implicated in the metabolic pathways of glioblastoma cancer progression. Cellular senescence, MAPK, TNF, hypoxia, zymosterol biosynthesis, and phosphatidylinositol metabolism pathways were substantially expressed, and the metabolic pathways were downregulated. After assessing the association between protein networks, the MSMP, SOX2, FGD4, and CNTNAP3 genes with high expression and the DMKN and SBSN genes with low expression were selected. All of these genes were observed in the survival curve, with a survival of fewer than 10 percent over around 15 months. hsa-mir-192-5p, hsa-mir-129-5p, hsa-mir-215-5p, hsa-mir-335-5p, and hsa-mir-340-5p played key roles in the glioblastoma cancer stem cell microenvironment. Through integrated bioinformatics studies of gene expression profile data, we introduced critical genes that can play an important role in targeting genes involved in the energy and microenvironment of glioblastoma cancer stem cells. 
This study indicated that hsa-mir-192-5p and hsa-mir-129-5p are appropriate candidates for this purpose.

Keywords: Glioblastoma, Cancer Stem Cells, Biomarker Discovery, Gene Expression Profiles, Bioinformatics Analysis, Tumor Microenvironment

Procedia PDF Downloads 144
189 Bioinformatics and Molecular Biological Characterization of a Hypothetical Protein SAV1226 as a Potential Drug Target for Methicillin/Vancomycin-Staphylococcus aureus Infections

Authors: Nichole Haag, Kimberly Velk, Tyler McCune, Chun Wu

Abstract:

Methicillin/multiple-resistant Staphylococcus aureus (MRSA) are infectious bacteria that are resistant to common antibiotics. A previous in silico study in our group has identified a hypothetical protein SAV1226 as one of the potential drug targets. In this study, we reported the bioinformatics characterization, as well as cloning, expression, purification and kinetic assays of hypothetical protein SAV1226 from methicillin/vancomycin-resistant Staphylococcus aureus Mu50 strain. MALDI-TOF/MS analysis revealed a low degree of structural similarity with known proteins. Kinetic assays demonstrated that hypothetical protein SAV1226 is neither a domain of an ATP dependent dihydroxyacetone kinase nor of a phosphotransferase system (PTS) dihydroxyacetone kinase, suggesting that the function of hypothetical protein SAV1226 might be misannotated on public databases such as UniProt and InterProScan 5.

Keywords: Methicillin-resistant Staphylococcus aureus, dihydroxyacetone kinase, essential genes, drug target, phosphoryl group donor

Procedia PDF Downloads 407
188 Bioinformatics Approach to Support Genetic Research in Autism in Mali

Authors: M. Kouyate, M. Sangare, S. Samake, S. Keita, H. G. Kim, D. H. Geschwind

Abstract:

Background & Objectives: Human genetic studies can be expensive, even unaffordable, in developing countries, partly due to the sequencing costs. Our aim is to pilot the use of bioinformatics tools to guide scientifically valid, locally relevant, and economically sound autism genetic research in Mali. Methods: The following databases, NCBI, HGMD, and LSDB, were used to identify hot point mutations. Phenotype, transmission pattern, theoretical protein expression in the brain, and the impact of the mutation on the 3D structure of the protein were used to prioritize selected autism genes. We used the protein database, Modeller, and Clustal W. Results: We found Mef2c (Gly27Ala/Leu38Gln), Pten (Thr131Ile), Prodh (Leu289Met), Nme1 (Ser120Gly), and Dhcr7 (Pro227Thr/Glu224Lys). These mutations were associated with endonucleases BseRI, NspI, PfrJS2IV, BspGI, BsaBI, and SpoDI, respectively. The Gly27Ala/Leu38Gln mutations impacted the 3D structure of the Mef2c protein. Mef2c protein sequences across species showed a high percentage of similarity with a highly conserved MADS domain. Discussion: Mef2c, Pten, Prodh, Nme1, and Dhcr7 gene mutation frequencies in the Malian population will be very informative. PCR coupled with restriction enzyme digestion can be used to screen the targeted gene mutations. Sanger sequencing will be used for confirmation only. This will considerably cut down the sequencing cost for gene-to-gene mutation screening. The knowledge of the 3D structure and potential impact of the mutations on Mef2c protein informed the protein family and altered function (e.g., Leu38Gln). Conclusion & Future Work: Bioinformatics will positively impact autism research in Mali. Our approach can be applied to other neuropsychiatric disorders.
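
The idea behind "PCR coupled with restriction enzyme digestion" is that a point mutation can create or destroy an enzyme's recognition site, so the digestion pattern reveals the genotype. The sketch below checks for a recognition site written in IUPAC codes. It uses RCATGY, the pattern commonly listed for NspI; the two short sequences are invented for illustration and are not taken from any of the genes named in the abstract.

```python
import re

# IUPAC nucleotide codes needed for this example, mapped to regex fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# Recognition pattern commonly listed for NspI (treat as an assumption to verify).
NSPI_SITE = "RCATGY"


def site_regex(pattern):
    """Compile an IUPAC recognition pattern into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in pattern.upper()))


def has_site(sequence, pattern):
    """True if the recognition pattern occurs anywhere on the given strand.

    RCATGY is palindromic, so scanning one strand is enough here;
    a non-palindromic site would also need the reverse complement.
    """
    return site_regex(pattern).search(sequence.upper()) is not None


if __name__ == "__main__":
    # Invented amplicon fragments differing by a single base.
    reference = "TTGACATGTAA"  # contains ACATGT, which matches RCATGY
    variant = "TTGACATGAAA"    # ACATGA: the last position is no longer C or T
    print(has_site(reference, NSPI_SITE), has_site(variant, NSPI_SITE))
```

In this toy case the reference fragment would be cut and the variant would not, which is the kind of difference a gel after digestion can reveal.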

Keywords: bioinformatics, endonucleases, autism, Sanger sequencing, point mutations

Procedia PDF Downloads 83
187 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist of multiple steps, including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming, and rely on a large number of parameters that often introduce variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings, which create a meaningful numerical representation of DNA sequences while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of each genome on the prediction. 
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reaches high performance, comparable with state-of-the-art methods applied directly to data processed through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
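
The first step of such a pipeline, building a k-mer vocabulary from raw reads, can be sketched in a few lines. This is a generic illustration of k-mer tokenization, not the metagenome2vec implementation: the function names, the tie-breaking rule and the toy reads are all assumptions of this example, and the embedding-learning steps are omitted.

```python
from collections import Counter


def kmers(read, k):
    """All overlapping substrings of length k in a read, in order."""
    read = read.upper()
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_vocabulary(reads, k):
    """Map each k-mer seen in the reads to an integer id.

    k-mers containing anything other than A, C, G, T (for example N) are skipped.
    Ids are assigned by decreasing frequency, ties broken alphabetically.
    """
    counts = Counter()
    for read in reads:
        counts.update(km for km in kmers(read, k) if set(km) <= set("ACGT"))
    ordered = sorted(counts, key=lambda km: (-counts[km], km))
    return {km: index for index, km in enumerate(ordered)}


def encode(read, vocabulary, k):
    """Turn a read into a list of k-mer ids, dropping k-mers not in the vocabulary."""
    return [vocabulary[km] for km in kmers(read, k) if km in vocabulary]


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA"]  # made-up reads
    vocab = build_vocabulary(toy_reads, k=2)
    print(vocab)
    print(encode("ACGT", vocab, k=2))
```

The integer sequences produced by `encode` are the kind of input on which a word-embedding model could then be trained, treating k-mers as words and reads as sentences.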

Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine

Procedia PDF Downloads 125
186 A Systems Approach to Targeting Cyclooxygenase: Genomics, Bioinformatics and Metabolomics Analysis of COX-1 -/- and COX-2-/- Lung Fibroblasts Providing Indication of Sterile Inflammation

Authors: Abul B. M. M. K. Islam, Mandar Dave, Roderick V. Jensen, Ashok R. Amin

Abstract:

A systems approach was applied to characterize differentially expressed transcripts, bioinformatics pathways, and proteins and prostaglandins (PGs) from lung fibroblasts procured from wild-type (WT), COX-1-/- and COX-2-/- mice to understand the system-level control mechanism. Bioinformatics analysis of COX-2 and COX-1 ablated cells induced COX-1 and COX-2 specific signatures, respectively, which significantly overlapped with an 'IL-1β induced inflammatory signature'. This defined novel cross-talk signals that orchestrated coordinated activation of pathways of sterile inflammation sensed by cellular stress. The overlapping signals showed significant over-representation of shared pathways for interferon γ and immune responses, T cell functions, NOD, and toll-like receptor signaling. Gene Ontology Biological Process (GOBP) and pathway enrichment analysis specifically showed an increase in mRNA expression associated with: (a) organ development and homeostasis in COX-1-/- cells and (b) oxidative stress and response, spliceosome and proteasome activity, and mTOR and p53 signaling in COX-2-/- cells. COX-1 and COX-2 showed signs of functional pathways committed to cell cycle and DNA replication at the genomics level. As compared to WT, metabolomics analysis revealed a significant increase in COX-1 mRNA and in the synthesis of basal levels of eicosanoids (PGE2, PGD2, TXB2, LTB4, PGF1α, and PGF2α) in COX-2 ablated cells, and an increase in the synthesis of PGE2 and PGF1α in COX-1 null cells. There was a compensation of PGE2 and PGF1α in COX-1-/- and COX-2-/- cells. Collectively, these results support a broader, differential and collaborative regulation of both COX-1 and COX-2 pathways at the metabolic, signaling, and genomics levels in cellular homeostasis and sterile inflammation induced by cellular stress.

Keywords: cyclooxygenases, inflammation, lung fibroblasts, systemic

Procedia PDF Downloads 292
185 A Single Cell Omics Experiments as Tool for Benchmarking Bioinformatics Oncology Data Analysis Tools

Authors: Maddalena Arigoni, Maria Luisa Ratto, Raffaele A. Calogero, Luca Alessandri

Abstract:

The presence of tumor heterogeneity, where distinct cancer cells exhibit diverse morphological and phenotypic profiles, including gene expression, metabolism, and proliferation, poses challenges for molecular prognostic markers and patient classification for targeted therapies. Understanding the causes and progression of cancer requires research efforts aimed at characterizing heterogeneity, which can be facilitated by evolving single-cell sequencing technologies. However, analyzing single-cell data necessitates computational methods that often lack objective validation. Therefore, the establishment of benchmarking datasets is necessary to provide a controlled environment for validating bioinformatics tools in the field of single-cell oncology. Benchmarking bioinformatics tools for single-cell experiments can be costly due to the high expense involved. Therefore, datasets used for benchmarking are typically sourced from publicly available experiments, which often lack a comprehensive cell annotation. This limitation can affect the accuracy and effectiveness of such experiments as benchmarking tools. To address this issue, we introduce omics benchmark experiments designed to evaluate bioinformatics tools to depict the heterogeneity in single-cell tumor experiments. We conducted single-cell RNA sequencing on six lung cancer tumor cell lines that display resistant clones upon treatment of EGFR mutated tumors and are characterized by driver genes, namely ROS1, ALK, HER2, MET, KRAS, and BRAF. These driver genes are associated with downstream networks controlled by EGFR mutations, such as JAK-STAT, PI3K-AKT-mTOR, and MEK-ERK. The experiment also featured an EGFR-mutated cell line. Using 10XGenomics platform with cellplex technology, we analyzed the seven cell lines together with a pseudo-immunological microenvironment consisting of PBMC cells labeled with the Biolegend TotalSeq™-B Human Universal Cocktail (CITEseq). 
This technology allowed for independent labeling of each cell line and single-cell analysis of the pooled seven cell lines and the pseudo-microenvironment. The data generated from the aforementioned experiments are available as part of an online tool, which allows users to define cell heterogeneity and generates count tables as an output. The tool provides the cell line derivation for each cell and cell annotations for the pseudo-microenvironment based on CITEseq data by an experienced immunologist. Additionally, we created a range of pseudo-tumor tissues using different ratios of the aforementioned cells embedded in matrigel. These tissues were analyzed using 10XGenomics (FFPE samples) and Curio Bioscience (fresh frozen samples) platforms for spatial transcriptomics, further expanding the scope of our benchmark experiments. The benchmark experiments we conducted provide a unique opportunity to evaluate the performance of bioinformatics tools for detecting and characterizing tumor heterogeneity at the single-cell level. Overall, our experiments provide a controlled and standardized environment for assessing the accuracy and robustness of bioinformatics tools for studying tumor heterogeneity at the single-cell level, which can ultimately lead to more precise and effective cancer diagnosis and treatment.

Keywords: single cell omics, benchmark, spatial transcriptomics, CITEseq

Procedia PDF Downloads 117
184 Syntax and Words as Evolutionary Characters in Comparative Linguistics

Authors: Nancy Retzlaff, Sarah J. Berkemer, Trudie Strauss

Abstract:

In the last couple of decades, the advent of digitalization of any kind of data was probably one of the major advances in all fields of study. This paves the way for also analysing these data even though they might come from disciplines where there was no initial computational necessity to do so. Especially in linguistics, one can find a rather manual tradition. Still when considering studies that involve the history of language families it is hard to overlook the striking similarities to bioinformatics (phylogenetic) approaches. Alignments of words are such a fairly well studied example of an application of bioinformatics methods to historical linguistics. In this paper we will not only consider alignments of strings, i.e., words in this case, but also alignments of syntax trees of selected Indo-European languages. Based on initial, crude alignments, a sophisticated scoring model is trained on both letters and syntactic features. The aim is to gain a better understanding on which features in two languages are related, i.e., most likely to have the same root. Initially, all words in two languages are pre-aligned with a basic scoring model that primarily selects consonants and adjusts them before fitting in the vowels. Mixture models are subsequently used to filter ‘good’ alignments depending on the alignment length and the number of inserted gaps. Using these selected word alignments it is possible to perform tree alignments of the given syntax trees and consequently find sentences that correspond rather well to each other across languages. The syntax alignments are then filtered for meaningful scores—’good’ scores contain evolutionary information and are therefore used to train the sophisticated scoring model. Further iterations of alignments and training steps are performed until the scoring model saturates, i.e., barely changes anymore. 
A better evaluation of the trained scoring model and its function in containing evolutionary meaningful information will be given. An assessment of sentence alignment compared to possible phrase structure will also be provided. The method described here may have its flaws because of limited prior information. This, however, may offer a good starting point to study languages where only little prior knowledge is available and a detailed, unbiased study is needed.

Keywords: alignments, bioinformatics, comparative linguistics, historical linguistics, statistical methods

Procedia PDF Downloads 154
183 Differentially Expressed Genes in Atopic Dermatitis: Bioinformatics Analysis Of Pooled Microarray Gene Expression Datasets In Gene Expression Omnibus

Authors: Danna Jia, Bin Li

Abstract:

Background: Atopic dermatitis (AD) is a chronic and refractory inflammatory skin disease characterized by relapsing eczematous and pruritic skin lesions. The global prevalence of AD ranges from 1% to 20%, and its incidence rates are increasing. It affects individuals from infancy to adulthood, significantly impacting their daily lives and social activities. Despite its major health burden, the precise mechanisms underlying AD remain unknown. Understanding the genetic differences associated with AD is crucial for advancing diagnosis and targeted treatment development. This study aims to identify candidate genes of AD by using bioinformatics analysis. Methods: We conducted a comprehensive analysis of four pooled transcriptomic datasets (GSE16161, GSE32924, GSE130588, and GSE120721) obtained from the Gene Expression Omnibus (GEO) database. Differential gene expression analysis was performed using the R statistical language. The differentially expressed genes (DEGs) between AD patients and normal individuals were functionally analyzed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Furthermore, a protein-protein interaction (PPI) network was constructed to identify candidate genes. Results: Among the patient-level gene expression datasets, we identified 114 shared DEGs, consisting of 53 upregulated genes and 61 downregulated genes. Functional analysis using GO and KEGG revealed that the DEGs were mainly associated with the negative regulation of transcription from RNA polymerase II promoter, membrane-related functions, protein binding, and the Human papillomavirus infection pathway. Through the PPI network analysis, we identified eight core genes: CD44, STAT1, HMMR, AURKA, MKI67, and SMARCA4. Conclusion: This study elucidates key genes associated with AD, providing potential targets for diagnosis and treatment. The identified genes have the potential to contribute to the understanding and management of AD. 
The bioinformatics analysis conducted in this study offers new insights and directions for further research on AD. Future studies can focus on validating the functional roles of these genes and exploring their therapeutic potential in AD. While these findings will require further verification as achieved with experiments involving in vivo and in vitro models, these results provided some initial insights into dysfunctional inflammatory and immune responses associated with AD. Such information offers the potential to develop novel therapeutic targets for use in preventing and treating AD.

Keywords: atopic dermatitis, bioinformatics, biomarkers, genes

Procedia PDF Downloads 82
182 Identification of Disease Causing DNA Motifs in Human DNA Using Clustering Approach

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Studying DNA (deoxyribonucleic acid) sequences is useful for understanding biological processes and is applied in fields such as diagnostic and forensic research. DNA carries the hereditary information of humans and almost all other organisms and is passed on from generation to generation. Early detection of defective DNA sequences may lead to many developments in the field of bioinformatics. At present, various tedious techniques are used to identify defective DNA. The proposed work analyzes a given sequence and identifies cancer-causing DNA motifs in it. Initially, the human DNA sequence is separated into k-mers using a k-mer separation rule. The separated k-mers are clustered using a Self Organizing Map (SOM). Using the Levenshtein distance measure, the cancer-associated DNA motif is identified from the k-mer clusters. The experimental results indicate the presence or absence of a cancer-causing DNA motif. If the cancer-associated DNA motif is found, the input is declared a cancer-causing DNA sequence; otherwise, the input human DNA is declared a normal sequence. Finally, the elapsed time for finding the cancer-causing DNA motif with cluster formation is calculated and compared with that of the normal process of finding the motif. Locating the cancer-associated motif is easier with cluster formation than without it. The proposed work is intended as an initial aid for research on genetic diseases.
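The k-mer separation and Levenshtein matching steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define its "k-mer separation rule", so overlapping windows are assumed here, the SOM clustering step is omitted, and the function names `kmers`, `levenshtein` and `closest_kmer` are hypothetical.

```python
def kmers(sequence, k):
    """Return all overlapping substrings of length k (assumed separation rule)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def closest_kmer(sequence, motif):
    """Return (k-mer, distance) for the k-mer nearest to the motif.

    Raises ValueError if the sequence is shorter than the motif.
    """
    candidates = kmers(sequence, len(motif))
    best = min(candidates, key=lambda kmer: levenshtein(kmer, motif))
    return best, levenshtein(best, motif)
```

A distance of 0 from `closest_kmer` means the motif occurs exactly in the sequence; any cut-off for calling a near match is a choice the abstract does not specify.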

Keywords: bioinformatics, cancer motif, DNA, k-mers, Levenshtein distance, SOM

Procedia PDF Downloads 188
181 Comparison of Rumen Microbial Analysis Pipelines Based on 16S rRNA Gene Sequencing

Authors: Xiaoxing Ye

Abstract:

To investigate complex rumen microbial communities, 16S ribosomal RNA (rRNA) sequencing is widely used. Here, we evaluated the impact of bioinformatics pipelines on the observation of OTUs and taxonomic classification of 750 cattle rumen microbial samples by comparing three commonly used pipelines (LotuS, UPARSE, and QIIME) with Usearch. In LotuS-based analyses, 189 archaeal and 3894 bacterial OTUs were observed. The observed OTUs for the Usearch analysis were significantly larger than the LotuS results. We discovered 1495 OTUs for archaea and 92665 OTUs for bacteria using Usearch analysis. In addition, taxonomic assignments were made for the rumen microbial samples. All pipelines had consistent taxonomic annotations from the phylum to the genus level. A difference in relative abundance was calculated for all microbial levels, including Bacteroidetes (QIIME: 72.2%, Usearch: 74.09%), Firmicutes (QIIME: 18.3%, Usearch: 20.20%) for the bacterial phylum, Methanobacteriales (QIIME: 64.2%, Usearch: 45.7%) for the archaeal class, Methanobacteriaceae (QIIME: 35%, Usearch: 45.7%) and Methanomassiliicoccaceae (QIIME: 35%, Usearch: 31.13%) for archaeal family. However, the most prevalent archaeal class varied between these two annotation pipelines. The Thermoplasmata was the top class according to the QIIME annotation, whereas Methanobacteria was the top class according to Usearch.

Keywords: cattle rumen, rumen microbial, 16S rRNA gene sequencing, bioinformatics pipeline

Procedia PDF Downloads 88
180 Estimation of Transition and Emission Probabilities

Authors: Aakansha Gupta, Neha Vadnere, Tapasvi Soni, M. Anbarsi

Abstract:

Protein secondary structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine and biotechnology. Some aspects of protein function and genome analysis can be predicted from secondary structure. Such predictions are used to help annotate sequences, classify proteins, identify domains, and recognize functional motifs. In this paper, we represent protein secondary structure as a mathematical model. To extract and predict the protein secondary structure from the primary structure, we require a set of parameters. Any constants appearing in the model are specified by these parameters, which also provide a mechanism for efficient and accurate use of data. Many algorithms exist for estimating these model parameters, the most popular of which is the Expectation Maximization (EM) algorithm. The model parameters are estimated from protein datasets such as RS126 using a Bayesian probabilistic method (the data set being categorical). This paper can then be extended to compare the efficiency of the EM algorithm with that of other algorithms for estimating the model parameters, which will in turn lead to an efficient component for protein secondary structure prediction. Further, this paper provides scope to use these parameters for predicting the secondary structure of proteins using machine learning techniques such as neural networks and fuzzy logic. The ultimate objective is to obtain greater accuracy than previously achieved.
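To make the two parameter sets in the title concrete, the sketch below estimates transition and emission probabilities by counting over sequences whose secondary-structure labels are already known. This is a simpler stand-in for the EM procedure named in the abstract, not a reproduction of it: EM would be needed when the labels are hidden. The function name `estimate_hmm`, the one-letter state labels and the pseudocount smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def estimate_hmm(pairs, pseudocount=1.0):
    """Estimate transition and emission probabilities by smoothed counting.

    pairs: list of (residues, labels) strings of equal length,
           e.g. ("MKV", "CHH") with labels such as H, E, C.
    Returns (transitions, emissions) as nested dicts of probabilities.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    states, symbols = set(), set()
    for residues, labels in pairs:
        if len(residues) != len(labels):
            raise ValueError("sequence and labels differ in length")
        states.update(labels)
        symbols.update(residues)
        for residue, state in zip(residues, labels):
            emit_counts[state][residue] += 1   # state emits residue
        for prev, nxt in zip(labels, labels[1:]):
            trans_counts[prev][nxt] += 1       # prev state followed by nxt

    def normalise(counts, outcomes):
        # Add the pseudocount to every outcome so each row sums to 1
        # and unseen events keep a small non-zero probability.
        table = {}
        for state in sorted(states):
            total = sum(counts[state].values()) + pseudocount * len(outcomes)
            table[state] = {o: (counts[state][o] + pseudocount) / total
                            for o in sorted(outcomes)}
        return table

    return normalise(trans_counts, states), normalise(emit_counts, symbols)
```

Each row of the returned tables is a probability distribution over next states or over residues, which is the form an HMM-based predictor would consume.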

Keywords: model parameters, expectation maximization algorithm, protein secondary structure prediction, bioinformatics

Procedia PDF Downloads 480
179 The Use of Network Tool for Brain Signal Data Analysis: A Case Study with Blind and Sighted Individuals

Authors: Cleiton Pons Ferreira, Diana Francisca Adamatti

Abstract:

Advances in computer technology have made it possible to obtain information for research in biology and neuroscience. To transform the data from such research, networks have long been used to represent important biological processes, and the use of these tools has shifted from purely illustrative and didactic to more analytic, including interaction analysis and hypothesis formulation. Many studies have involved this application, but not directly for the interpretation of data obtained from brain functions, which calls for new perspectives of development in neuroinformatics using existing models of tools already disseminated in bioinformatics. This study includes an analysis of neurological data from electroencephalogram (EEG) signals using Cytoscape, an open-source software tool for visualizing complex networks in biological databases. The data were obtained from a comparative case study carried out at the University of Rio Grande (FURG), using EEG signals from a Brain Computer Interface (BCI) with 32 electrodes fitted to a blind and a sighted individual during an activity that stimulated spatial ability. This study intends to present results that lead to better ways to use and adapt techniques that support the treatment of brain signal data, in order to improve understanding and learning in neuroscience.

Keywords: neuroinformatics, bioinformatics, network tools, brain mapping

Procedia PDF Downloads 182
178 Intra-miR-ExploreR, a Novel Bioinformatics Platform for Integrated Discovery of MiRNA:mRNA Gene Regulatory Networks

Authors: Surajit Bhattacharya, Daniel Veltri, Atit A. Patel, Daniel N. Cox

Abstract:

miRNAs have emerged as key post-transcriptional regulators of gene expression; however, identification of biologically relevant target genes for this epigenetic regulatory mechanism remains a significant challenge. To address this knowledge gap, we have developed a novel tool in R, Intra-miR-ExploreR, that facilitates integrated discovery of miRNA targets by incorporating target databases and novel target prediction algorithms, using statistical methods including Pearson and Distance Correlation on microarray data, to arrive at high-confidence intragenic miRNA target predictions. We have explored the efficacy of this tool using Drosophila melanogaster as a model organism for bioinformatics analyses and functional validation. A number of putative targets were obtained, which were also validated using qRT-PCR analysis. Additional features of the tool include downloadable text files containing GO analysis from DAVID and PubMed links to literature related to gene sets. Moreover, we are constructing interaction maps of intragenic miRNAs, using both microarray and RNA-seq data, focusing on neural tissues to uncover regulatory codes via which these molecules regulate gene expression to direct cellular development.
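The correlation step mentioned above can be illustrated with a short Pearson-based ranking. This is a hedged sketch in Python, not the R tool's code: the function names `pearson` and `rank_targets` are hypothetical, distance correlation and the target databases are left out, and listing the most negatively correlated genes first is an assumption about how candidates might be prioritised.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def rank_targets(mirna_profile, gene_profiles):
    """Return (gene, correlation) pairs sorted from most negative to most positive.

    gene_profiles: dict mapping a gene name to its expression profile,
    measured over the same samples as mirna_profile.
    """
    scores = {gene: pearson(mirna_profile, profile)
              for gene, profile in gene_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

For example, `rank_targets([1, 2, 3], {"geneA": [3, 2, 1], "geneB": [1, 2, 3]})` lists `geneA` first because its expression falls as the miRNA rises.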

Keywords: miRNA, miRNA:mRNA target prediction, statistical methods, miRNA:mRNA interaction network

Procedia PDF Downloads 509
177 Bioinformatic Study of Follicle Stimulating Hormone Receptor (FSHR) Gene in Different Buffalo Breeds

Authors: Hamid Mustafa, Adeela Ajmal, Kim EuiSoo, Noor-ul-Ain

Abstract:

Worldwide, buffalo production is considered one of the most important components of the food industry. Efficient buffalo production is related to the reproductive performance of this species. Lack of knowledge of reproductive efficiency and its related genes in buffalo is a major constraint on sustainable buffalo production. In this study, we performed bioinformatics analyses of the Follicle Stimulating Hormone Receptor (FSHR) gene and explored the possible relationship of this gene among different buffalo breeds and with other farm animals. We also determined the evolutionary pattern of this gene among these species. We investigated CDS lengths, stop codon variation, homology, signal peptides, isoelectric point, tertiary structure, motifs and the phylogenetic tree. The results indicate four different motifs in this gene: Activin-recp, GS motif, STYKc Protein kinase and transmembrane. The results also indicate that this gene has a very close relationship with cattle, bison, sheep and goat. Multiple alignment (MA) showed high conservation of the motifs, which indicates the constancy of this gene during evolution. The results of this study can be applied to better understand and characterize the FSHR gene structure in different farm animals, which would be helpful for efficient breeding plans for animal production.

Keywords: buffalo, FSHR gene, bioinformatics, production

Procedia PDF Downloads 532
176 Molecular Portraits: The Role of Posttranslational Modification in Cancer Metastasis

Authors: Navkiran Kaur, Apoorva Mathur, Abhishree Agarwal, Sakshi Gupta, Tuhin Rashmi

Abstract:

Aim: Breast cancer is the most common cancer in women worldwide, and resistance to current therapeutics, often to several concurrently, is an increasing clinical challenge. Glycosylation of proteins is one of the most important post-translational modifications. Aberrant glycosylation has been implicated in many different diseases because of associated changes in biological function and protein folding. Alterations in cell surface glycosylation can promote invasive behavior of tumor cells, which ultimately leads to the progression of cancer. In breast cancer, there is increasing evidence for the role of glycosylation in tumor formation and metastasis. In the present study, an attempt has been made to study disease-associated sialoglycoproteins in breast cancer using bioinformatics tools. Sequences were retrieved from the UniProt database. A database, in the form of a Word document, was compiled from FASTA sequences of breast cancer genes. Glycosylation was studied using the YinOYang tool on ExPASy, and differential gene expression and protein analysis was done in the context of breast cancer metastasis. For each individual sequence, the number of residues exceeding the predicted O-GlcNAc threshold was detected and recorded where 50 or more aberrant glycosylation sites were present. We found a significant change in the expression profile of glycosylation patterns of various proteins associated with breast cancer. Differentially and aberrantly glycosylated proteins in breast cancer cells, relative to non-neoplastic cells, are an important factor in the overall progression and development of cancer.

Keywords: breast cancer, bioinformatics, cancer, metastasis, glycosylation

Procedia PDF Downloads 294
175 Finding the Longest Common Subsequence in Normal DNA and Disease Affected Human DNA Using Self Organizing Map

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Bioinformatics is an active research area that combines biology and computer science. Finding the longest common subsequence (LCSS) is one of the major challenges in various bioinformatics applications. The computation of the LCSS plays a vital role in biomedicine and is an essential task in DNA sequence analysis in genetics, where it supports a wide range of disease-diagnosis steps. The objective of the proposed system is to find the longest common subsequence present in normal and various disease-affected human DNA sequences using a Self Organizing Map (SOM) and LCSS. The human DNA sequences are collected from the National Center for Biotechnology Information (NCBI) database. Initially, each human DNA sequence is separated into k-mers using a k-mer separation rule. Mean and median values are calculated from each separated k-mer. These calculated values are fed as input to the Self Organizing Map for clustering. The resulting clusters are then given to the Longest Common Subsequence (LCSS) algorithm to find the common subsequences present in every cluster. It returns n(n-1)/2 subsequences for each cluster, where n is the number of k-mers in that cluster. The experimental outcomes of the proposed system give the possible longest common subsequences of normal and disease-affected DNA data. The proposed system is thus intended as an initial aid for finding disease-causing sequences. Finally, performance analysis is carried out for different DNA sequences. The obtained values show that the retrieval of the LCSS takes less time than in the existing system.
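The pairwise LCSS step can be sketched with the standard dynamic-programming algorithm. The code below is an illustration under stated assumptions rather than the authors' system: the SOM clustering is omitted, a cluster is simply a list of k-mer strings, and the names `lcs` and `pairwise_lcs` are hypothetical. Comparing every unordered pair in a cluster of n k-mers gives the n(n-1)/2 results mentioned in the abstract.

```python
from itertools import combinations


def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back from the bottom-right corner to recover the subsequence.
    result = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


def pairwise_lcs(cluster):
    """Return (first, second, lcs) for every unordered pair of k-mers."""
    return [(x, y, lcs(x, y)) for x, y in combinations(cluster, 2)]
```

The table costs time and memory proportional to the product of the two string lengths, which stays small when the inputs are short k-mers.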

Keywords: clustering, k-mers, longest common subsequence, SOM

Procedia PDF Downloads 267
174 Easymodel: Web-based Bioinformatics Software for Protein Modeling Based on Modeller

Authors: Alireza Dantism

Abstract:

Presently, describing the function of a protein sequence is one of the most common problems in biology. Usually, this problem can be facilitated by studying the three-dimensional structure of proteins. In the absence of a protein structure, comparative modeling often provides a useful three-dimensional model of the protein that is dependent on at least one known protein structure. Comparative modeling predicts the three-dimensional structure of a given protein sequence (target) mainly based on its alignment with one or more proteins of known structure (templates). Comparative modeling consists of five main steps: (1) identifying similarity between the target sequence and at least one known template structure; (2) aligning the target sequence with the template(s); (3) building a model based on the alignment with the selected template(s); (4) predicting model errors; and (5) optimizing the built model. There are many computer programs and web servers that automate the comparative modeling process. One of the most important advantages of these servers is that they make comparative modeling available to both experts and non-experts, who can easily do their own modeling without programming knowledge. Some experts, however, prefer to use their programming knowledge and do the modeling manually, because doing so lets them maximize the accuracy of their models. In this study, a web-based tool has been designed to predict the tertiary structure of proteins using the PHP and Python programming languages. This tool is called EasyModel. Based on the user's inputs, EasyModel receives the unknown sequence of interest (the target in this study) and, among other inputs, a protein sequence file (the template) that shares a percentage of similarity with the target sequence; it then predicts the tertiary structure of the unknown sequence and presents the results as graphs and constructed protein files.

Keywords: structural bioinformatics, protein tertiary structure prediction, modeling, comparative modeling, modeller

Procedia PDF Downloads 97