Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 27916

Search results for: bioinformatics analysis

27916 The Development and Provision of a Knowledge Management Ecosystem, Optimized for Genomics

Abstract:

The field of bioinformatics has made, and continues to make, substantial progress and contributions to life science research and development. However, this paper contends that a systems approach integrates bioinformatics activities for any project in a defined manner. The application of critical control points in this bioinformatics systems approach may be useful to identify and evaluate points in a pathway where specified activity risk can be reduced, monitored and quality enhanced.

Keywords: bioinformatics, food security, personalized medicine, systems approach

Procedia PDF Downloads 422

27915 Evaluation and Assessment of Bioinformatics Methods and Their Applications

Authors: Fatemeh Nokhodchi Bonab

Abstract:

Bioinformatics, in its broad sense, involves application of computer processes to solve biological problems. A wide range of computational tools are needed to effectively and efficiently process large amounts of data being generated as a result of recent technological innovations in biology and medicine. A number of computational tools have been developed or adapted to deal with the experimental riches of complex and multivariate data and transition from data collection to information or knowledge. These bioinformatics tools are being evaluated and applied in various medical areas including early detection, risk assessment, classification, and prognosis of cancer. The goal of these efforts is to develop and identify bioinformatics methods with optimal sensitivity, specificity, and predictive capabilities. The recent flood of data from genome sequences and functional genomics has given rise to new field, bioinformatics, which combines elements of biology and computer science. Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems.

Keywords: methods, applications, transcriptional regulatory systems, techniques

Procedia PDF Downloads 127

27914 Isolate-Specific Variations among Clinical Isolates of Brucella Identified by Whole-Genome Sequencing, Bioinformatics and Comparative Genomics

Authors: Abu S. Mustafa, Mohammad W. Khan, Faraz Shaheed Khan, Nazima Habibi

Abstract:

Brucellosis is a zoonotic disease of worldwide prevalence. There are at least four species and several strains of Brucella that cause human disease. Brucella genomes have very limited variation across strains, which hinder strain identification using classical molecular techniques, including PCR and 16 S rDNA sequencing. The aim of this study was to perform whole genome sequencing of clinical isolates of Brucella and perform bioinformatics and comparative genomics analyses to determine the existence of genetic differences across the isolates of a single Brucella species and strain. The draft sequence data were generated from 15 clinical isolates of Brucella melitensis (biovar 2 strain 63/9) using MiSeq next generation sequencing platform. The generated reads were used for further assembly and analysis. All the analysis was performed using Bioinformatics work station (8 core i7 processor, 8GB RAM with Bio-Linux operating system). FastQC was used to determine the quality of reads and low quality reads were trimmed or eliminated using Fastx_trimmer. Assembly was done by using Velvet and ABySS softwares. The ordering of assembled contigs was performed by Mauve. An online server RAST was employed to annotate the contigs assembly. Annotated genomes were compared using Mauve and ACT tools. The QC score for DNA sequence data, generated by MiSeq, was higher than 30 for 80% of reads with more than 100x coverage, which suggested that data could be utilized for further analysis. However when analyzed by FastQC, quality of four reads was not good enough for creating a complete genome draft so remaining 11 samples were used for further analysis. The comparative genome analyses showed that despite sharing same gene sets, single nucleotide polymorphisms and insertions/deletions existed across different genomes, which provided a variable extent of diversity to these bacteria. In conclusion, the next generation sequencing, bioinformatics, and comparative genome analysis can be utilized to find variations (point mutations, insertions and deletions) across different genomes of Brucella within a single strain. This information could be useful in surveillance and epidemiological studies supported by Kuwait University Research Sector grants MI04/15 and SRUL02/13.

Keywords: brucella, bioinformatics, comparative genomics, whole genome sequencing

Procedia PDF Downloads 382

27913 Solanum tuberosum Ammonium Transporter Gene: Some Bioinformatics Insights

Authors: A. T. Adetunji, F. B. Lewu, R. Mundembe

Abstract:

Plants require nitrogen (N) to support desired production levels. Nitrogen is available to plants in the form of nitrate or ammonium, which are transported into the cell with the aid of various transport proteins. Ammonium transporters (AMTs) play a role in the uptake of ammonium, the form in which nitrogen is preferentially absorbed by plants. Solanum tuberosum AMT1 (StAMT1) was characterized using molecular biology and bioinformatics methods. Nucleotide database sequences were used to design AMT1-specific primers which were used to amplify the AMT1 internal regions. Nucleotide sequencing, alignment and phylogenetic analysis assigned StAMT1 to the AMT1 family. The deduced amino acid sequences showed that StAMT1 is 92%, 83% and 76% similar to Solanum lycopersicum LeAMT1.1, Lotus japonicus LjAMT1.1 and Solanum lycopersicum LeAMT1.2 respectively. StAMT1 fragments were shown to correspond to the 5th - 10th trans-membrane domains. Residue StAMT1 D15 is predicted to be essential for ammonium transport, while mutations of StAMT1 S76A may further enhance ammonium transport.

Keywords: ammonium transporter, bioinformatics, nitrogen, primers, Solanum tuberosum

Procedia PDF Downloads 248

27912 Prediction and Identification of a Permissive Epitope Insertion Site for St Toxoid in cfaB from Enterotoxigenic Escherichia coli

Authors: N. Zeinalzadeh, Mahdi Sadeghi

Abstract:

Enterotoxigenic Escherichia coli (ETEC) is the most common cause of non-inflammatory diarrhea in the developing countries, resulting in approximately 20% of all diarrheal episodes in children in these areas. ST is one of the most important virulence factors and CFA/I is one of the frequent colonization factors that help to process of ETEC infection. ST and CfaB (CFA/I subunit) are among vaccine candidates against ETEC. So, ST because of its small size is not a good immunogenic in the natural form. However to increase its immunogenic potential, here we explored candidate positions for ST insertion in CfaB sequence. After bioinformatics analysis, one of the candidate positions was selected and the chimeric gene (cfaB*st) sequence was synthesized and expressed in E. coli BL21 (DE3). The chimeric recombinant protein was purified with Ni-NTA columns and characterized with western blot analysis. The residue 74-75 of CfaB sequence could be a good candidate position for ST and other epitopes insertion.

Keywords: bioinformatics, CFA/I, enterotoxigenic E. coli, ST toxoid

Procedia PDF Downloads 448

27911 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas

Abstract:

Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms in other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for more challenging task of hierarchical classification, and evaluates it in a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes meta-learning approach for recommending the best hierarchical classification algorithm to a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.

Keywords: algorithm recommendation, meta-learning, bioinformatics, hierarchical classification

Procedia PDF Downloads 314

27910 Uncovering Anti-Hypertensive Obesity Targets and Mechanisms of Metformin, an Anti-Diabetic Medication

Authors: Lu Yang, Keng Po Lai

Abstract:

Metformin, a well-known clinical drug against diabetes, is found with potential anti-diabetic and anti-obese benefits, as reported in increasing evidences. However, the current clinical and experimental investigations are not to reveal the detailed mechanisms of metformin-anti-obesity/hypertension. We have used the bioinformatics strategy, including network pharmacology and molecular docking methodology, to uncover the key targets and pathways of bioactive compounds against clinical disorders, such as cancers, coronavirus disease. Thus, in this report, the in-silico approach was utilized to identify the hug targets, pharmacological function, and mechanism of metformin against obesity and hypertension. The networking analysis identified 154 differentially expressed genes of obesity and hypertension, 21 interaction genes, and 6 hug genes of metformin treating hypertensive obesity. As a result, the molecular docking findings indicated the potent binding capability of metformin with the key proteins, including interleukin 6 (IL-6) and chemokine (C-C motif) Ligand 2 (CCL2), in hypertensive obesity. The metformin-exerted anti-hypertensive obesity action involved in metabolic regulation, inflammatory reaction. And the anti-hypertensive obesity mechanisms of metformin were revealed, including regulation of inflammatory and immunological signaling pathways for metabolic homeostasis in tissue and microenvironmental melioration in blood pressure. In conclusion, our identified findings with bioinformatics analysis have demonstrated the detailed hug and pharmacological targets, biological functions, and signaling pathways of metformin treating hypertensive obesity.

Keywords: metformin, obesity, hypertension, bioinformatics findings

Procedia PDF Downloads 122

27909 Bioinformatics Approach to Identify Physicochemical and Structural Properties Associated with Successful Cell-free Protein Synthesis

Authors: Alexander A. Tokmakov

Abstract:

Cell-free protein synthesis is widely used to synthesize recombinant proteins. It allows genome-scale expression of various polypeptides under strictly controlled uniform conditions. However, only a minor fraction of all proteins can be successfully expressed in the systems of protein synthesis that are currently used. The factors determining expression success are poorly understood. At present, the vast volume of data is accumulated in cell-free expression databases. It makes possible comprehensive bioinformatics analysis and identification of multiple features associated with successful cell-free expression. Here, we describe an approach aimed at identification of multiple physicochemical and structural properties of amino acid sequences associated with protein solubility and aggregation and highlight major correlations obtained using this approach. The developed method includes: categorical assessment of the protein expression data, calculation and prediction of multiple properties of expressed amino acid sequences, correlation of the individual properties with the expression scores, and evaluation of statistical significance of the observed correlations. Using this approach, we revealed a number of statistically significant correlations between calculated and predicted features of protein sequences and their amenability to cell-free expression. It was found that some of the features, such as protein pI, hydrophobicity, presence of signal sequences, etc., are mostly related to protein solubility, whereas the others, such as protein length, number of disulfide bonds, content of secondary structure, etc., affect mainly the expression propensity. We also demonstrated that amenability of polypeptide sequences to cell-free expression correlates with the presence of multiple sites of post-translational modifications. The correlations revealed in this study provide a plethora of important insights into protein folding and rationalization of protein production. The developed bioinformatics approach can be of practical use for predicting expression success and optimizing cell-free protein synthesis.

Keywords: bioinformatics analysis, cell-free protein synthesis, expression success, optimization, recombinant proteins

Procedia PDF Downloads 419

27908 Hsa-miR-192-5p, and Hsa-miR-129-5p Prominent Biomarkers in Regulation Glioblastoma Cancer Stem Cells Genes Microenvironment

Authors: Rasha Ahmadi

Abstract:

Glioblastoma is one of the most frequent brain malignancies, having a high mortality rate and limited survival in individuals with this malignancy. Despite different treatments and surgery, recurrence of glioblastoma cancer stem cells may arise as a subsequent tumor. For this reason, it is crucial to research the markers associated with glioblastoma stem cells and specifically their microenvironment. In this study, using bioinformatics analysis, we analyzed and nominated genes in the microenvironment pathways of glioblastoma stem cells. In this study, an appropriate database was selected for analysis by referring to the GEO database. This dataset comprised gene expression patterns in stem cells derived from glioblastoma patients. Gene clusters were divided as high and low expression. Enrichment databases such as Enrichr, STRING, and GEPIA were utilized to analyze the data appropriately. Finally, we extracted the potential genes 2700 high-expression and 1100 low-expression genes are implicated in the metabolic pathways of glioblastoma cancer progression. Cellular senescence, MAPK, TNF, hypoxia, zimosterol biosynthesis, and phosphatidylinositol metabolism pathways were substantially expressed and the metabolic pathways were downregulated. After assessing the association between protein networks, MSMP, SOX2, FGD4 ,and CNTNAP3 genes with high expression and DMKN and SBSN genes with low were selected. All of these genes were observed in the survival curve, with a survival of fewer than 10 percent over around 15 months. hsa-mir-192-5p, hsa-mir-129-5p, hsa-mir-215-5p, hsa-mir-335-5p, and hsa-mir-340-5p played key function in glioblastoma cancer stem cells microenviroments. We introduced critical genes through integrated and regular bioinformatics studies by assessing the amount of gene expression profile data that can play an important role in targeting genes involved in the energy and microenvironment of glioblastoma cancer stem cells. Have. This study indicated that hsa-mir-192-5p, and hsa-mir-129-5p are appropriate candidates for this.

Keywords: Glioblastoma, Cancer Stem Cells, Biomarker Discovery, Gene Expression Profiles, Bioinformatics Analysis, Tumor Microenvironment

Procedia PDF Downloads 144

27907 Differentially Expressed Genes in Atopic Dermatitis: Bioinformatics Analysis Of Pooled Microarray Gene Expression Datasets In Gene Expression Omnibus

Authors: Danna Jia, Bin Li

Abstract:

Background: Atopic dermatitis (AD) is a chronic and refractory inflammatory skin disease characterized by relapsing eczematous and pruritic skin lesions. The global prevalence of AD ranges from 1~ 20%, and its incidence rates are increasing. It affects individuals from infancy to adulthood, significantly impacting their daily lives and social activities. Despite its major health burden, the precise mechanisms underlying AD remain unknown. Understanding the genetic differences associated with AD is crucial for advancing diagnosis and targeted treatment development. This study aims to identify candidate genes of AD by using bioinformatics analysis. Methods: We conducted a comprehensive analysis of four pooled transcriptomic datasets (GSE16161, GSE32924, GSE130588, and GSE120721) obtained from the Gene Expression Omnibus (GEO) database. Differential gene expression analysis was performed using the R statistical language. The differentially expressed genes (DEGs) between AD patients and normal individuals were functionally analyzed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Furthermore, a protein-protein interaction (PPI) network was constructed to identify candidate genes. Results: Among the patient-level gene expression datasets, we identified 114 shared DEGs, consisting of 53 upregulated genes and 61 downregulated genes. Functional analysis using GO and KEGG revealed that the DEGs were mainly associated with the negative regulation of transcription from RNA polymerase II promoter, membrane-related functions, protein binding, and the Human papillomavirus infection pathway. Through the PPI network analysis, we identified eight core genes: CD44, STAT1, HMMR, AURKA, MKI67, and SMARCA4. Conclusion: This study elucidates key genes associated with AD, providing potential targets for diagnosis and treatment. The identified genes have the potential to contribute to the understanding and management of AD. The bioinformatics analysis conducted in this study offers new insights and directions for further research on AD. Future studies can focus on validating the functional roles of these genes and exploring their therapeutic potential in AD. While these findings will require further verification as achieved with experiments involving in vivo and in vitro models, these results provided some initial insights into dysfunctional inflammatory and immune responses associated with AD. Such information offers the potential to develop novel therapeutic targets for use in preventing and treating AD.

Keywords: atopic dermatitis, bioinformatics, biomarkers, genes

Procedia PDF Downloads 82

27906 A Systems Approach to Targeting Cyclooxygenase: Genomics, Bioinformatics and Metabolomics Analysis of COX-1 -/- and COX-2-/- Lung Fibroblasts Providing Indication of Sterile Inflammation

Authors: Abul B. M. M. K. Islam, Mandar Dave, Roderick V. Jensen, Ashok R. Amin

Abstract:

A systems approach was applied to characterize differentially expressed transcripts, bioinformatics pathways, and proteins and prostaglandins (PGs) from lung fibroblasts procured from wild-type (WT), COX-1-/- and COX-2-/- mice to understand system level control mechanism. Bioinformatics analysis of COX-2 and COX-1 ablated cells induced COX-1 and COX-2 specific signature respectively, which significantly overlapped with an 'IL-1β induced inflammatory signature'. This defined novel cross-talk signals that orchestrated coordinated activation of pathways of sterile inflammation sensed by cellular stress. The overlapping signals showed significant over-representation of shared pathways for interferon y and immune responses, T cell functions, NOD, and toll-like receptor signaling. Gene Ontology Biological Process (GOBP) and pathway enrichment analysis specifically showed an increase in mRNA expression associated with: (a) organ development and homeostasis in COX-1-/- cells and (b) oxidative stress and response, spliceosomes and proteasomes activity, mTOR and p53 signaling in COX-2-/- cells. COX-1 and COX-2 showed signs of functional pathways committed to cell cycle and DNA replication at the genomics level. As compared to WT, metabolomics analysis revealed a significant increase in COX-1 mRNA and synthesis of basal levels of eicosanoids (PGE2, PGD2, TXB2, LTB4, PGF1α, and PGF2α) in COX-2 ablated cells and increase in synthesis of PGE2, and PGF1α in COX-1 null cells. There was a compensation of PGE2 and PGF1α in COX-1-/- and COX-2-/- cells. Collectively, these results support a broader, differential and collaborative regulation of both COX-1 and COX-2 pathways at the metabolic, signaling, and genomics levels in cellular homeostasis and sterile inflammation induced by cellular stress.

Keywords: cyclooxygenases, inflammation, lung fibroblasts, systemic

Procedia PDF Downloads 292

27905 Comparison of Rumen Microbial Analysis Pipelines Based on 16s rRNA Gene Sequencing

Authors: Xiaoxing Ye

Abstract:

To investigate complex rumen microbial communities, 16S ribosomal RNA (rRNA) sequencing is widely used. Here, we evaluated the impact of bioinformatics pipelines on the observation of OTUs and taxonomic classification of 750 cattle rumen microbial samples by comparing three commonly used pipelines (LotuS, UPARSE, and QIIME) with Usearch. In LotuS-based analyses, 189 archaeal and 3894 bacterial OTUs were observed. The observed OTUs for the Usearch analysis were significantly larger than the LotuS results. We discovered 1495 OTUs for archaea and 92665 OTUs for bacteria using Usearch analysis. In addition, taxonomic assignments were made for the rumen microbial samples. All pipelines had consistent taxonomic annotations from the phylum to the genus level. A difference in relative abundance was calculated for all microbial levels, including Bacteroidetes (QIIME: 72.2%, Usearch: 74.09%), Firmicutes (QIIME: 18.3%, Usearch: 20.20%) for the bacterial phylum, Methanobacteriales (QIIME: 64.2%, Usearch: 45.7%) for the archaeal class, Methanobacteriaceae (QIIME: 35%, Usearch: 45.7%) and Methanomassiliicoccaceae (QIIME: 35%, Usearch: 31.13%) for archaeal family. However, the most prevalent archaeal class varied between these two annotation pipelines. The Thermoplasmata was the top class according to the QIIME annotation, whereas Methanobacteria was the top class according to Usearch.

Keywords: cattle rumen, rumen microbial, 16S rRNA gene sequencing, bioinformatics pipeline

Procedia PDF Downloads 88

27904 BingleSeq: A User-Friendly R Package for Single-Cell RNA-Seq Data Analysis

Authors: Quan Gu, Daniel Dimitrov

Abstract:

BingleSeq was developed as a shiny-based, intuitive, and comprehensive application that enables the analysis of single-Cell RNA-Sequencing count data. This was achieved via incorporating three state-of-the-art software packages for each type of RNA sequencing analysis, alongside functional annotation analysis and a way to assess the overlap of differential expression method results. At its current state, the functionality implemented within BingleSeq is comparable to that of other applications, also developed with the purpose of lowering the entry requirements to RNA Sequencing analyses. BingleSeq is available on GitHub and will be submitted to R/Bioconductor.

Keywords: bioinformatics, functional annotation analysis, single-cell RNA-sequencing, transcriptomics

Procedia PDF Downloads 204

27903 The Use of Network Tool for Brain Signal Data Analysis: A Case Study with Blind and Sighted Individuals

Authors: Cleiton Pons Ferreira, Diana Francisca Adamatti

Abstract:

Advancements in computers technology have allowed to obtain information for research in biology and neuroscience. In order to transform the data from these surveys, networks have long been used to represent important biological processes, changing the use of this tools from purely illustrative and didactic to more analytic, even including interaction analysis and hypothesis formulation. Many studies have involved this application, but not directly for interpretation of data obtained from brain functions, asking for new perspectives of development in neuroinformatics using existent models of tools already disseminated by the bioinformatics. This study includes an analysis of neurological data through electroencephalogram (EEG) signals, using the Cytoscape, an open source software tool for visualizing complex networks in biological databases. The data were obtained from a comparative case study developed in a research from the University of Rio Grande (FURG), using the EEG signals from a Brain Computer Interface (BCI) with 32 eletrodes prepared in the brain of a blind and a sighted individuals during the execution of an activity that stimulated the spatial ability. This study intends to present results that lead to better ways for use and adapt techniques that support the data treatment of brain signals for elevate the understanding and learning in neuroscience.

Keywords: neuroinformatics, bioinformatics, network tools, brain mapping

Procedia PDF Downloads 182

27902 Characterization of Solanum tuberosum Ammonium Transporter Gene Using Bioinformatics Approach

Authors: Adewole Tomiwa Adetunji, Francis Bayo Lewu, Richard Mundembe

Abstract:

Plants require nitrogen (N) to support desired production levels. There is a need for better understanding of N transport mechanism in order to improve N assimilation by plant root. Nitrogen is available to plants in the form of nitrate or ammonium, which are transported into the cell with the aid of various transport proteins. Ammonium transporters (AMTs) play a role in the uptake of ammonium, the form in which N is preferentially absorbed by plants. Solanum tuberosum AMT1 (StAMT1) was amplified, sequenced and characterized using molecular biology and bioinformatics methods. Nucleotide database sequences were used to design 976 base pairs AMT1-specific primers which include forward primer 5’- GCCATCGCCGCCGCCGG-3’ and reverse primer 5’-GGGTCAGATCCATACCCGC-3’. These primers were used to amplify the Solanum tuberosum AMT1 internal regions. Nucleotide sequencing, alignment and phylogenetic analysis assigned StAMT1 to the AMT1 family due to the clade and high similarity it shared with other plant AMT1 genes. The deduced amino acid sequences showed that StAMT1 is 92%, 83% and 76% similar to Solanum lycopersicum LeAMT1.1, Lotus japonicus LjAMT1.1, and Solanum lycopersicum LeAMT1.2 respectively. StAMT1 fragments were shown to correspond to the 5th-10th trans-membrane domains. Residue StAMT1 D15 is predicted to be essential for ammonium transport, while mutations of StAMT1 S76A may further enhance ammonium transport.

Keywords: ammonium transporter, bioinformatics, nitrogen, primers, Solanum tuberosum

Procedia PDF Downloads 226

27901 Characterization of (GRAS37) Gibberellin Acid Insensitive (GAI), Repressor (RGA), and Scarecrow (SCR) Gene by Using Bioinformatics Tools

Authors: Yusra Tariq

Abstract:

The Grass 37 gene is presently known in tomatoes, which are the source of healthy substances such as ascorbic acid, polyphenols, carotenoids and nutrients. It has a significant impact on the growth and development of humans. The GRASS 37 gene is a plant Transcription factor group assuming significant parts in various reactions of different Abiotic stresses such as (drought, salinity, thermal stresses, temperature, and bright waves) which could highly affect the growth. Tomatoes are very sensitive to temperature, and their growth or production occurs optimally in a temperature range from 21 C to 29.5 C during the daytime and from 18.5 C to 21 C during the night. This protein acts as a positive regulator of salt stress response and abscisic acid signaling. This study summarizes the structure characterized by molecular formula and protein-binding domains by different bioinformatics tools such as Expasy translate tool, Expasy Portparam, Swiss Prot and Inter Pro Scan, Clustal W tool regulatory procedure of GRASS gene components, also their reactions to both biotic and Abiotic stresses.

Keywords: GRAS37, gene, bioinformatics, tool

Procedia PDF Downloads 53

27900 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine

Procedia PDF Downloads 125

27899 TAXAPRO, A Streamlined Pipeline to Analyze Shotgun Metagenomes

Authors: Sofia Sehli, Zainab El Ouafi, Casey Eddington, Soumaya Jbara, Kasambula Arthur Shem, Islam El Jaddaoui, Ayorinde Afolayan, Olaitan I. Awe, Allissa Dillman, Hassan Ghazal

Abstract:

The ability to promptly sequence whole genomes at a relatively low cost has revolutionized the way we study the microbiome. Microbiologists are no longer limited to studying what can be grown in a laboratory and instead are given the opportunity to rapidly identify the makeup of microbial communities in a wide variety of environments. Analyzing whole genome sequencing (WGS) data is a complex process that involves multiple moving parts and might be rather unintuitive for scientists that don’t typically work with this type of data. Thus, to help lower the barrier for less-computationally inclined individuals, TAXAPRO was developed at the first Omics Codeathon held virtually by the African Society for Bioinformatics and Computational Biology (ASBCB) in June 2021. TAXAPRO is an advanced metagenomics pipeline that accurately assembles organelle genomes from whole-genome sequencing data. TAXAPRO seamlessly combines WGS analysis tools to create a pipeline that automatically processes raw WGS data and presents organism abundance information in both a tabular and graphical format. TAXAPRO was evaluated using COVID-19 patient gut microbiome data. Analysis performed by TAXAPRO demonstrated a high abundance of Clostridia and Bacteroidia genera and a low abundance of Proteobacteria genera relative to others in the gut microbiome of patients hospitalized with COVID-19, consistent with the original findings derived using a different analysis methodology. This provides crucial evidence that the TAXAPRO workflow dispenses reliable organism abundance information overnight without the hassle of performing the analysis manually.

Keywords: metagenomics, shotgun metagenomic sequence analysis, COVID-19, pipeline, bioinformatics

Procedia PDF Downloads 220

27898 Intellectual Property Protection of CRISPR Related Technologies

Authors: Zheng Miao, Dennis Fernandez

Abstract:

CRISPR research has the potential to completely transform life science, agriculture, live-stock and the health care industry. The Intellectual Property derived from its research has raised significant attention in the academic as well as the biopharmaceutical industry culminating an urgent need for strategic IP protection. We review the rudimentary concepts and key competitors of CRISPR technologies as well as the paramount strategies for intellectual property protection. Further, we elaborate on prosecution issues related to CRISPR patents as well as possible solutions to various patent laws, interferences and litigation. Finally, we address how the bioinformatics of the CRISPR technology begs an inquiry into issues of privacy and a host of ethical concerns.

Keywords: bioinformatics, CRISPR, biotechnology, intellectual property

Procedia PDF Downloads 252

27897 Bioinformatics and Molecular Biological Characterization of a Hypothetical Protein SAV1226 as a Potential Drug Target for Methicillin/Vancomycin-Staphylococcus aureus Infections

Authors: Nichole Haag, Kimberly Velk, Tyler McCune, Chun Wu

Abstract:

Methicillin/multiple-resistant Staphylococcus aureus (MRSA) are infectious bacteria that are resistant to common antibiotics. A previous in silico study in our group has identified a hypothetical protein SAV1226 as one of the potential drug targets. In this study, we reported the bioinformatics characterization, as well as cloning, expression, purification and kinetic assays of hypothetical protein SAV1226 from methicillin/vancomycin-resistant Staphylococcus aureus Mu50 strain. MALDI-TOF/MS analysis revealed a low degree of structural similarity with known proteins. Kinetic assays demonstrated that hypothetical protein SAV1226 is neither a domain of an ATP dependent dihydroxyacetone kinase nor of a phosphotransferase system (PTS) dihydroxyacetone kinase, suggesting that the function of hypothetical protein SAV1226 might be misannotated on public databases such as UniProt and InterProScan 5.

Keywords: Methicillin-resistant Staphylococcus aureus, dihydroxyacetone kinase, essential genes, drug target, phosphoryl group donor

Procedia PDF Downloads 407

27896 Intra-miR-ExploreR, a Novel Bioinformatics Platform for Integrated Discovery of MiRNA:mRNA Gene Regulatory Networks

Authors: Surajit Bhattacharya, Daniel Veltri, Atit A. Patel, Daniel N. Cox

Abstract:

miRNAs have emerged as key post-transcriptional regulators of gene expression, however identification of biologically-relevant target genes for this epigenetic regulatory mechanism remains a significant challenge. To address this knowledge gap, we have developed a novel tool in R, Intra-miR-ExploreR, that facilitates integrated discovery of miRNA targets by incorporating target databases and novel target prediction algorithms, using statistical methods including Pearson and Distance Correlation on microarray data, to arrive at high confidence intragenic miRNA target predictions. We have explored the efficacy of this tool using Drosophila melanogaster as a model organism for bioinformatics analyses and functional validation. A number of putative targets were obtained which were also validated using qRT-PCR analysis. Additional features of the tool include downloadable text files containing GO analysis from DAVID and Pubmed links of literature related to gene sets. Moreover, we are constructing interaction maps of intragenic miRNAs, using both micro array and RNA-seq data, focusing on neural tissues to uncover regulatory codes via which these molecules regulate gene expression to direct cellular development.

Keywords: miRNA, miRNA:mRNA target prediction, statistical methods, miRNA:mRNA interaction network

Procedia PDF Downloads 510

27895 Finding the Longest Common Subsequence in Normal DNA and Disease Affected Human DNA Using Self Organizing Map

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Bioinformatics is an active research area which combines biological matter as well as computer science research. The longest common subsequence (LCSS) is one of the major challenges in various bioinformatics applications. The computation of the LCSS plays a vital role in biomedicine and also it is an essential task in DNA sequence analysis in genetics. It includes wide range of disease diagnosing steps. The objective of this proposed system is to find the longest common subsequence which presents in a normal and various disease affected human DNA sequence using Self Organizing Map (SOM) and LCSS. The human DNA sequence is collected from National Center for Biotechnology Information (NCBI) database. Initially, the human DNA sequence is separated as k-mer using k-mer separation rule. Mean and median values are calculated from each separated k-mer. These calculated values are fed as input to the Self Organizing Map for the purpose of clustering. Then obtained clusters are given to the Longest Common Sub Sequence (LCSS) algorithm for finding common subsequence which presents in every clusters. It returns nx(n-1)/2 subsequence for each cluster where n is number of k-mer in a specific cluster. Experimental outcomes of this proposed system produce the possible number of longest common subsequence of normal and disease affected DNA data. Thus the proposed system will be a good initiative aid for finding disease causing sequence. Finally, performance analysis is carried out for different DNA sequences. The obtained values show that the retrieval of LCSS is done in a shorter time than the existing system.

Keywords: clustering, k-mers, longest common subsequence, SOM

Procedia PDF Downloads 267

27894 A Single Cell Omics Experiments as Tool for Benchmarking Bioinformatics Oncology Data Analysis Tools

Authors: Maddalena Arigoni, Maria Luisa Ratto, Raffaele A. Calogero, Luca Alessandri

Abstract:

The presence of tumor heterogeneity, where distinct cancer cells exhibit diverse morphological and phenotypic profiles, including gene expression, metabolism, and proliferation, poses challenges for molecular prognostic markers and patient classification for targeted therapies. Understanding the causes and progression of cancer requires research efforts aimed at characterizing heterogeneity, which can be facilitated by evolving single-cell sequencing technologies. However, analyzing single-cell data necessitates computational methods that often lack objective validation. Therefore, the establishment of benchmarking datasets is necessary to provide a controlled environment for validating bioinformatics tools in the field of single-cell oncology. Benchmarking bioinformatics tools for single-cell experiments can be costly due to the high expense involved. Therefore, datasets used for benchmarking are typically sourced from publicly available experiments, which often lack a comprehensive cell annotation. This limitation can affect the accuracy and effectiveness of such experiments as benchmarking tools. To address this issue, we introduce omics benchmark experiments designed to evaluate bioinformatics tools to depict the heterogeneity in single-cell tumor experiments. We conducted single-cell RNA sequencing on six lung cancer tumor cell lines that display resistant clones upon treatment of EGFR mutated tumors and are characterized by driver genes, namely ROS1, ALK, HER2, MET, KRAS, and BRAF. These driver genes are associated with downstream networks controlled by EGFR mutations, such as JAK-STAT, PI3K-AKT-mTOR, and MEK-ERK. The experiment also featured an EGFR-mutated cell line. Using 10XGenomics platform with cellplex technology, we analyzed the seven cell lines together with a pseudo-immunological microenvironment consisting of PBMC cells labeled with the Biolegend TotalSeq™-B Human Universal Cocktail (CITEseq). This technology allowed for independent labeling of each cell line and single-cell analysis of the pooled seven cell lines and the pseudo-microenvironment. The data generated from the aforementioned experiments are available as part of an online tool, which allows users to define cell heterogeneity and generates count tables as an output. The tool provides the cell line derivation for each cell and cell annotations for the pseudo-microenvironment based on CITEseq data by an experienced immunologist. Additionally, we created a range of pseudo-tumor tissues using different ratios of the aforementioned cells embedded in matrigel. These tissues were analyzed using 10XGenomics (FFPE samples) and Curio Bioscience (fresh frozen samples) platforms for spatial transcriptomics, further expanding the scope of our benchmark experiments. The benchmark experiments we conducted provide a unique opportunity to evaluate the performance of bioinformatics tools for detecting and characterizing tumor heterogeneity at the single-cell level. Overall, our experiments provide a controlled and standardized environment for assessing the accuracy and robustness of bioinformatics tools for studying tumor heterogeneity at the single-cell level, which can ultimately lead to more precise and effective cancer diagnosis and treatment.

Keywords: single cell omics, benchmark, spatial transcriptomics, CITEseq

Procedia PDF Downloads 117

27893 A Comprehensive Analysis of LACK (Leishmania Homologue of Receptors for Activated C Kinase) in the Context of Visceral Leishmaniasis

Authors: Sukrat Sinha, Abhay Kumar, Shanthy Sundaram

Abstract:

The Leishmania homologue of activated C kinase (LACK) is known T cell epitope from soluble Leishmania antigens (SLA) that confers protection against Leishmania challenge. This antigen has been found to be highly conserved among Leishmania strains. LACK has been shown to be protective against L. donovani challenge. A comprehensive analysis of several LACK sequences was completed. The analysis shows a high level of conservation, lower variability and higher antigenicity in specific portions of the LACK protein. This information provides insights for the potential consideration of LACK as a putative candidate in the context of visceral Leishmaniasis vaccine target.

Keywords: bioinformatics, genome assembly, leishmania activated protein kinase c (lack), next-generation sequencing

Procedia PDF Downloads 338

27892 hsa-miR-1204 and hsa-miR-639 Prominent Role in Tamoxifen's Molecular Mechanisms on the EMT Phenomenon in Breast Cancer Patients

Authors: Mahsa Taghavi

Abstract:

In the treatment of breast cancer, tamoxifen is a regularly prescribed medication. The effect of tamoxifen on breast cancer patients' EMT pathways was studied. In this study to see if it had any effect on the cancer cells' resistance to tamoxifen and to look for specific miRNAs associated with EMT. In this work, we used continuous and integrated bioinformatics analysis to choose the optimal GEO datasets. Once we had sorted the gene expression profile, we looked at the mechanism of signaling, the ontology of genes, and the protein interaction of each gene. In the end, we used the GEPIA database to confirm the candidate genes. after that, I investigated critical miRNAs related to candidate genes. There were two gene expression profiles that were categorized into two distinct groups. Using the expression profile of genes that were lowered in the EMT pathway, the first group was examined. The second group represented the polar opposite of the first. A total of 253 genes from the first group and 302 genes from the second group were found to be common. Several genes in the first category were linked to cell death, focal adhesion, and cellular aging. Two genes in the second group were linked to cell death, focal adhesion, and cellular aging. distinct cell cycle stages were observed. Finally, proteins such as MYLK, SOCS3, and STAT5B from the first group and BIRC5, PLK1, and RAPGAP1 from the second group were selected as potential candidates linked to tamoxifen's influence on the EMT pathway. hsa-miR-1204 and hsa-miR-639 have a very close relationship with the candidates genes according to the node degrees and betweenness index. With this, the action of tamoxifen on the EMT pathway was better understood. It's important to learn more about how tamoxifen's target genes and proteins work so that we can better understand the drug.

Keywords: tamoxifen, breast cancer, bioinformatics analysis, EMT, miRNAs

Procedia PDF Downloads 129

27891 Estimation of Transition and Emission Probabilities

Authors: Aakansha Gupta, Neha Vadnere, Tapasvi Soni, M. Anbarsi

Abstract:

Protein secondary structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine and biotechnology. Some aspects of protein functions and genome analysis can be predicted by secondary structure prediction. This is used to help annotate sequences, classify proteins, identify domains, and recognize functional motifs. In this paper, we represent protein secondary structure as a mathematical model. To extract and predict the protein secondary structure from the primary structure, we require a set of parameters. Any constants appearing in the model are specified by these parameters, which also provide a mechanism for efficient and accurate use of data. To estimate these model parameters there are many algorithms out of which the most popular one is the EM algorithm or called the Expectation Maximization Algorithm. These model parameters are estimated with the use of protein datasets like RS126 by using the Bayesian Probabilistic method (data set being categorical). This paper can then be extended into comparing the efficiency of EM algorithm to the other algorithms for estimating the model parameters, which will in turn lead to an efficient component for the Protein Secondary Structure Prediction. Further this paper provides a scope to use these parameters for predicting secondary structure of proteins using machine learning techniques like neural networks and fuzzy logic. The ultimate objective will be to obtain greater accuracy better than the previously achieved.

Keywords: model parameters, expectation maximization algorithm, protein secondary structure prediction, bioinformatics

Procedia PDF Downloads 480

27890 PYURF and ZED9 Have a Prominent Role in Association with Molecular Pathways in Bortezomib in Myeloma Cells in Acute Myeloid Leukemia

Authors: Atena Sadat Hosseini, Mohammadhossein Habibi

Abstract:

Acute myeloid leukemia (AML) is the most typically diagnosed leukemia. In older adults, AML imposes a dismal outcome. AML originates with a dominant mutation, then adds collaborative, transformative mutations leading to myeloid transformation and clinical/biological heterogeneity. Several chemotherapeutic drugs are used for this cancer. These drugs are naturally associated with several side effects, and finding a more accurate molecular mechanism of these drugs can have a significant impact on the selection and better candidate of drugs for treatment. In this study, we evaluated bortezomibin myeloma cells using bioinformatics analysis and evaluation of RNA-Seq data. Then investigated the molecular pathways proteins- proteins interactions associated with this chemotherapy drug. A total of 658upregulated genes and 548 downregulated genes were sorted.AUF1 (hnRNP D0) binds and destabilizes mRNA, degradation of GLI2 by the proteasome, the role of GTSE1 in G2/M progression after G2 checkpoint, TCF dependent signaling in response to WNT demonstrated in upregulated genes. Besides insulin resistance, AKT phosphorylates targets in the nucleus, cytosine methylation, Longevity regulating pathway, and Signal Transduction of S1P Receptor were related to low expression genes. With respect to this results, HIST2H2AA3, RP11-96O20.4, ZED9, PRDX1, and DOK2, according to node degrees and betweenness elements candidates from upregulated genes. in the opposite side, PYURF, NRSN1, FGF23, UPK3BL, and STAG3 were a prominent role in downregulated genes. Sum up, Using in silico analysis in the present study, we conducted a precise study ofbortezomib molecular mechanisms in myeloma cells. so that we could take further evaluation to discovermolecular cancer therapy. Naturally, more additional experimental and clinical procedures are needed in this survey.

Keywords: myeloma cells, acute myeloid leukemia, bioinformatics analysis, bortezomib

Procedia PDF Downloads 93

27889 Through 7S Model to Promote the Service Innovation Management

Authors: Cheng Fang Hsu

Abstract:

Call center is the core of building customer relationship management system. Under the strong competitive stress, it becomes a new profiting challenge for a successful enterprise. Call center is a department not only to provide customer service but also to bring business profit. This is the qualitative case study in Taiwan bank service industry which goes on deeper exploration, and analysis by business interviews and industrial analysis. This study starts from the establishment, development, and management after the reforming of the case call center. Through SWOT analysis, and industrial analysis, this study adopted 7S model to explain how the call center reforms from service oriented to profit oriented and from cost management to profit management. The results indicated how service innovation management promotes call center to be operated as a market profit competition center. The recommendations are indicated to support the call center on marketing profit by service innovation management.

Keywords: call center, 7S model, service innovation management, bioinformatics

Procedia PDF Downloads 487

27888 Microarray Data Visualization and Preprocessing Using R and Bioconductor

Authors: Ruchi Yadav, Shivani Pandey, Prachi Srivastava

Abstract:

Microarrays provide a rich source of data on the molecular working of cells. Each microarray reports on the abundance of tens of thousands of mRNAs. Virtually every human disease is being studied using microarrays with the hope of finding the molecular mechanisms of disease. Bioinformatics analysis plays an important part of processing the information embedded in large-scale expression profiling studies and for laying the foundation for biological interpretation. A basic, yet challenging task in the analysis of microarray gene expression data is the identification of changes in gene expression that are associated with particular biological conditions. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. One of the most popular platforms for microarray analysis is Bioconductor, an open source and open development software project based on the R programming language. This paper describes specific procedures for conducting quality assessment, visualization and preprocessing of Affymetrix Gene Chip and also details the different bioconductor packages used to analyze affymetrix microarray data and describe the analysis and outcome of each plots.

Keywords: microarray analysis, R language, affymetrix visualization, bioconductor

Procedia PDF Downloads 480

27887 Bioinformatic Study of Follicle Stimulating Hormone Receptor (FSHR) Gene in Different Buffalo Breeds

Authors: Hamid Mustafa, Adeela Ajmal, Kim EuiSoo, Noor-ul-Ain

Abstract:

World wild, buffalo production is considered as most important component of food industry. Efficient buffalo production is related with reproductive performance of this species. Lack of knowledge of reproductive efficiency and its related genes in buffalo species is a major constraint for sustainable buffalo production. In this study, we performed some bioinformatics analysis on Follicle Stimulating Hormone Receptor (FSHR) gene and explored the possible relationship of this gene among different buffalo breeds and with other farm animals. We also found the evolution pattern for this gene among these species. We investigate CDS lengths, Stop codon variation, homology search, signal peptide, isoelectic point, tertiary structure, motifs and phylogenetic tree. The results of this study indicate 4 different motif in this gene, which are Activin-recp, GS motif, STYKc Protein kinase and transmembrane. The results also indicate that this gene has very close relationship with cattle, bison, sheep and goat. Multiple alignment (MA) showed high conservation of motif which indicates constancy of this gene during evolution. The results of this study can be used and applied for better understanding of this gene for better characterization of Follicle Stimulating Hormone Receptor (FSHR) gene structure in different farm animals, which would be helpful for efficient breeding plans for animal’s production.

Keywords: buffalo, FSHR gene, bioinformatics, production

Procedia PDF Downloads 532