Search results for: bioinformatics analysis
27845 Structure, Bioinformatics Analysis and Substrate Specificity of a 6-Phospho-β-Glucosidase Glycoside Hydrolase 1 Enzyme from Bacillus licheniformis
Authors: Wayde Veldman, Ozlem T. Bishop, Igor Polikarpov
Abstract:
In bacteria, mono- and disaccharides are phosphorylated during uptake into the cell via the widely used phosphoenolpyruvate (PEP)-dependent phosphotransferase transport system. As an initial step in the phosphorylated disaccharide metabolism pathway, certain glycoside hydrolase family 1 (GH1) enzymes play a crucial role in releasing phosphorylated and non-phosphorylated monosaccharides. However, the structural determinants of the specificity of these enzymes remain to be clarified. GH1 enzymes are known to have a wide array of functions; according to the CAZy database, there are twenty-one different enzymatic activities in the GH1 family. Here, the structure and substrate specificity of a GH1 enzyme from Bacillus licheniformis, hereafter referred to as BlBglH, were investigated. The sequence of BlBglH was compared to the sequences of other characterized GH1 enzymes using sequence alignment, sequence identity calculations, phylogenetic analysis, and motif discovery. Through these analyses, BlBglH was found to have sequence features characteristic of enzymes with 6-phospho-β-glucosidase activity. Additionally, motif and structure comparisons of the three most commonly studied GH1 enzyme activities revealed a loop shared among the different structures that consists of different sequence motifs; this loop is thought to guide specific substrates (depending on the activity) towards the active site. To further confirm BlBglH enzyme activity, molecular docking and molecular dynamics simulations were performed. Docking was carried out using 6-phospho-β-glucosidase activity positive-control (p-nitrophenyl-β-D-glucoside-6-phosphate) and negative-control (p-nitrophenyl-β-D-galactoside-6-phosphate) ligands, followed by 400 ns molecular dynamics simulations. The positive-control ligand maintained favourable interactions within the active site until the end of the simulation, whereas the negative-control ligand was observed exiting the enzyme at 287 ns.
Binding free energy calculations showed that the positive-control complex had a substantially more favourable binding energy than the negative-control complex. Jointly, the findings of this study suggest that the BlBglH enzyme possesses 6-phospho-β-glucosidase enzymatic activity.
Keywords: 6-P-β-glucosidase, glycoside hydrolase 1, molecular dynamics, sequence analysis, substrate specificity
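The "sequence identity calculations" mentioned above can be illustrated with a minimal sketch that computes percent identity from two sequences that are already aligned. It assumes one common convention (columns where either sequence has a gap are ignored); alignment tools differ in the denominator they use, and the toy fragments below are hypothetical, not BlBglH sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes both strings come from the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (one convention among several)
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments: 6 identical residues out of 7 ungapped columns
print(round(percent_identity("MKT-AYIA", "MKTQAFIA"), 1))  # → 85.7
```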
Procedia PDF Downloads 130
27844 Analysis of Osmotin as Transcription Factor/Cell Signaling Modulator Using Bioinformatic Tools
Authors: Usha Kiran, M. Z. Abdin
Abstract:
Osmotin is an abundant cationic multifunctional protein discovered in cells of tobacco (Nicotiana tabacum L. var Wisconsin 38) adapted to an environment of low osmotic potential. It protects plants from pathogens and has hence been placed in the PRP (pathogenesis-related protein) family. Osmotin-induced proline accumulation has been reported in plants, including transgenic tomato and strawberry, conferring tolerance to both biotic and abiotic stresses. However, the exact mechanism by which osmotin induces proline is not yet known. These observations led us to hypothesize that osmotin-induced proline accumulation could be due to its involvement as a transcription factor and/or cell signaling pathway modulator in proline biosynthesis. The present investigation was therefore undertaken to analyze the osmotin protein as a transcription factor/cell signaling modulator using bioinformatics tools. The results of available online DNA-binding motif search programs revealed that osmotin does not contain DNA-binding motifs. Alignment of the osmotin protein with protein sequences from DATF showed homology in the range of 0-20%, suggesting that it might not contain a DNA-binding motif. To search further for a unique DNA-binding domain, the osmotin 3D structure was superimposed on modeled Arabidopsis transcription factors using Chimera, which also suggested that such a domain is absent. We did, however, find evidence implicating osmotin in cell signaling. Based on these results, we concluded that osmotin is not a transcription factor but regulates proline biosynthesis and accumulation through cell signaling during abiotic stresses.
Keywords: osmotin, cell signaling modulator, bioinformatic tools, protein
Procedia PDF Downloads 272
27843 Glycan Analyzer: Software to Annotate Glycan Structures from Exoglycosidase Experiments
Authors: Ian Walsh, Terry Nguyen-Khuong, Christopher H. Taron, Pauline M. Rudd
Abstract:
Glycoproteins and their covalently bonded glycans play critical roles in the immune system, cell communication, disease and disease prognosis. Ultra-performance liquid chromatography (UPLC) coupled with mass spectrometry is conventionally used to qualitatively and quantitatively characterise glycan structures in a given sample. Exoglycosidases are enzymes that catalyze the sequential removal of monosaccharides from the non-reducing end of glycans. They are naturally specific for a particular type of sugar, its stereochemistry (α or β anomer) and its position of attachment to an adjacent sugar on the glycan. Thus, monitoring peak movements (in both UPLC and MS1) after application of exoglycosidases provides a unique and effective way to annotate sugars in high detail, i.e., differentiating positional and linkage isomers. Manual annotation of an exoglycosidase experiment is difficult and time-consuming: as sample complexity and the number of exoglycosidases increase, the analysis can require manual interpretation of hundreds of peak movements. We have recently implemented pattern recognition software for automated interpretation of UPLC-MS1 exoglycosidase digestions. In this work, we describe the software, estimate how much time it saves and provide examples showing the annotation of positional and linkage isomers in immunoglobulin G, apolipoprotein J, and simple glycan standards.
Keywords: bioinformatics, automated glycan assignment, liquid chromatography, mass spectrometry
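The core idea behind exoglycosidase-based annotation, that the mass lost by a peak after digestion reveals which residues were removed, can be sketched as below. This is not the Glycan Analyzer implementation: the residue masses are standard monoisotopic values, while the neutral-mass inputs, the 0.01 Da tolerance and the hypothetical helper `explain_shift` (which assumes an enzyme removes copies of a single residue type) are simplifying assumptions.

```python
# Monoisotopic residue masses (Da) of common glycan monosaccharides.
RESIDUE_MASSES = {
    "Hex": 162.05282,     # e.g. galactose, mannose
    "HexNAc": 203.07937,  # e.g. GlcNAc
    "dHex": 146.05791,    # e.g. fucose
    "NeuAc": 291.09542,   # sialic acid
}


def explain_shift(mass_before, mass_after, max_count=6, tol=0.01):
    """Return (residue, count) pairs whose total mass matches the observed loss.

    Hypothetical helper: inputs are neutral masses, and the digestion is
    assumed to remove one or more copies of a single residue type.
    """
    loss = mass_before - mass_after
    matches = []
    for name, mass in RESIDUE_MASSES.items():
        for count in range(1, max_count + 1):
            if abs(loss - count * mass) <= tol:
                matches.append((name, count))
    return matches


# A peak losing two hexose residues, as expected after a galactosidase digest
print(explain_shift(1800.0, 1800.0 - 2 * 162.05282))  # → [('Hex', 2)]
```

Real annotation must also track UPLC retention shifts and charge states, which this sketch ignores.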
Procedia PDF Downloads 200
27842 Investigating the Successes of in vitro Embryogenesis
Authors: Zelikha Labbani
Abstract:
In vitro isolated microspore culture is the most powerful androgenic pathway for producing doubled haploid plants in a short time. To divert a microspore toward embryogenesis, a number of factors, different for each species, must concur at the same time and place. Once induced, the microspore undergoes numerous changes at different levels, from overall morphology to gene expression. Induction of microspore embryogenesis implies not only the expression of an embryogenic program, but also a stress-related cellular response and repression of the gametophytic program to revert the microspore to a totipotent status. As haploid single cells, microspores have become a tool for achieving various objectives, particularly in genetic engineering. In this communication, we present the most recent advances in producing haploid embryos via in vitro isolated microspore culture.
Keywords: in vitro isolated microspore culture, success, haploid cells, bioinformatics, biomedicine
Procedia PDF Downloads 475
27841 Microbial Biogeography of Greek Olive Varieties Assessed by Amplicon-Based Metagenomics Analysis
Authors: Lena Payati, Maria Kazou, Effie Tsakalidou
Abstract:
Table olives are among the most popular fermented vegetables worldwide and, along with olive oil, play a crucial role in the world economy. They are highly appreciated by consumers for their characteristic taste and pleasant aromas, and several health and nutritional benefits have been reported as well. Until recently, microbial biogeography, i.e., the study of microbial diversity over time and space, was mainly associated with wine. Nowadays, however, the term 'terroir' has been extended to other crops and food products to link geographical origin and environmental conditions to quality aspects of fermented foods. Taking the above into consideration, the present study focuses on the microbial fingerprinting of the most important olive varieties of Greece using state-of-the-art amplicon-based metagenomics analysis. To this end, in 2019, 61 samples from 38 different olive varieties were collected at the final stage of ripening from 13 geographically widespread regions of Greece. For the metagenomics analysis, total DNA was extracted from the olive samples, and the 16S rRNA gene and the ITS DNA region were sequenced and analyzed using bioinformatics tools to identify bacterial and yeast/fungal diversity, respectively. Furthermore, principal component analysis (PCA) was performed to cluster the data based on the average microbial composition of all samples from each region of origin. When samples were analyzed separately, the composition results showed that most of the bacterial (such as Pantoea, Enterobacter, Rosenbergiella, and Pseudomonas) and yeast/fungal (such as Aureobasidium, Debaryomyces, Candida, and Cladosporium) genera identified were present in all 61 samples. Although interesting differences were observed in the relative abundance of the identified genera, the bacterial genus Pantoea and the yeast/fungal genus Aureobasidium were dominant in 35 and 40 samples, respectively.
Of note, olive samples collected from the same region had a similar fingerprint (genera identified and relative abundance levels) regardless of the variety, indicating a potential association between the relative abundance of certain taxa and the geographical region. When samples were grouped by region of origin, distinct bacterial profiles per region were observed, which was also evident from the PCA. This was not the case for the yeast/fungal profiles, since 10 of the 13 regions grouped together, mainly due to the dominance of the genus Aureobasidium. A second cluster was formed by the islands of Crete and Rhodes, both of which are located in the Southeast Aegean Sea. These two regions clustered together mainly due to the identification of the genus Toxicocladosporium at relatively high abundances. Finally, the Agrinio region was separated from the others, as it showed a completely different microbial fingerprint. However, due to the limited number of olive samples from some regions, a subsequent PCA with more samples from these regions is expected to yield clearer clustering. The present study is part of a larger project, the first of its kind in Greece, with the ultimate goal of analyzing a larger set of olive samples of different varieties and from different regions of Greece in order to establish a reliable microbial biogeography of Greek olives.
Keywords: amplicon-based metagenomics analysis, bacteria, microbial biogeography, olive microbiota, yeasts/fungi
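PCA on per-region average compositions, as described above, can be sketched without external libraries by extracting the first principal component with power iteration on the covariance matrix. This is a stdlib-only stand-in for a full PCA routine (it returns PC1 only), and the abundance table is hypothetical, not data from the study. The sign of a principal component is arbitrary, so only the grouping of scores is meaningful.

```python
import math
import random


def first_principal_component(rows, iters=500, seed=0):
    """Return (PC1 loadings, PC1 scores) for a samples-by-features table.

    Uses power iteration on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all features constant: no variance to explain
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores


# Hypothetical mean relative abundances (%) of three genera in four regions
regions = ["R1", "R2", "R3", "R4"]
table = [
    [60, 30, 10],
    [58, 32, 10],
    [20, 20, 60],
    [22, 18, 60],
]
loadings, scores = first_principal_component(table)
for name, s in zip(regions, scores):
    print(name, round(s, 2))  # R1/R2 and R3/R4 fall on opposite sides of PC1
```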
Procedia PDF Downloads 115
27840 Gut Microbiota in Patients with Opioid Use Disorder: A 12-week Follow up Study
Authors: Sheng-Yu Lee
Abstract:
Aim: Opioid use disorder is often characterized by repetitive drug-seeking and drug-taking behaviors with severe public health consequences. Animal models have shown that opioid-induced perturbations in the gut microbiota are causally related to neuroinflammation, deficits in reward responding, and opioid tolerance. We therefore propose that dysbiosis of the gut microbiota may be associated with the pathogenesis of opioid dependence. In the current study, we explored differences in gut microbiota between patients and healthy controls, and in patients before and after 12 weeks in a methadone treatment program. Methods: Patients with opioid use disorder aged 20 to 65 years were recruited from the methadone maintenance outpatient clinics of 2 medical centers in southern Taiwan. Healthy controls without any family history of major psychiatric disorders (schizophrenia, bipolar disorder and major depressive disorder) were recruited from the community. After initial screening, 15 patients with opioid use disorder joined the study for initial evaluation (Week 0); 12 of them completed the 12-week follow-up while receiving methadone treatment and having ceased heroin use (Week 12). Fecal samples were collected from the patients at baseline and at the end of the 12th week. A one-time fecal sample was collected from the healthy controls. The microbiota of the fecal samples was investigated using 16S rRNA V3V4 amplicon sequencing, followed by bioinformatics and statistical analyses. Results: We found no significant differences in species diversity in opioid-dependent patients between Week 0 and Week 12, nor between patients at either time point and controls. For beta diversity, using principal component analysis, we found no significant differences between patients at Week 0 and Week 12; however, both patient groups differed significantly from controls (P = 0.011).
Furthermore, linear discriminant analysis effect size (LEfSe) analysis was used to identify differentially enriched bacteria between patients with opioid use disorder and healthy controls. Compared to controls, the relative abundances of Lactobacillus (family Lactobacillaceae), Megasphaera hexanoica and Caecibacter massiliensis were increased in patients at Week 0, while those of Atopobiaceae (order Coriobacteriales), Acidaminococcus intestini and Tractidigestivibacter scatoligenes were increased in patients at Week 12. Conclusion: We suggest that the gut microbiome community may be linked to opioid use disorder, and that such differences may persist even after 12 weeks of cessation of opioid use.
Keywords: opioid use disorder, gut microbiota, methadone treatment, follow up study
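The abstract does not say which species-diversity (alpha-diversity) index was used. As an illustration only, the widely used Shannon index, H' = -Σ p_i ln p_i over the relative abundances p_i of each taxon, can be computed from per-taxon read counts as follows; the counts shown are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts for two equally abundant taxa: H' = ln 2
print(round(shannon_index([50, 50]), 4))  # → 0.6931
```

Comparing such per-sample values between Week 0, Week 12 and controls would then require an appropriate statistical test, which is outside this sketch.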
Procedia PDF Downloads 106
27839 Detection and Identification of Antibiotic Resistant Bacteria Using Infra-Red-Microscopy and Advanced Multivariate Analysis
Authors: Uraib Sharaha, Ahmad Salman, Eladio Rodriguez-Diaz, Elad Shufan, Klaris Riesenberg, Irving J. Bigio, Mahmoud Huleihel
Abstract:
Antimicrobial drugs play an important role in controlling illness associated with infectious diseases in animals and humans. However, the increasing resistance of bacteria to a broad spectrum of commonly used antibiotics has become a global health-care problem. Rapid determination of the antimicrobial susceptibility of a clinical isolate is often crucial for optimal antimicrobial therapy of infected patients and in many cases can save lives. Conventional susceptibility-testing methods such as disk diffusion are time-consuming, and other methods, including E-test and genotyping, are relatively expensive. Fourier transform infrared (FTIR) microscopy is a rapid, safe, and low-cost method that has been widely and successfully used in various studies to identify biological samples, including bacteria. Modern infrared (IR) spectrometers with high spectral resolution can capture unprecedented biochemical information from cells at the molecular level. Moreover, IR spectroscopy combined with newly developed bioinformatics analyses has become a powerful technique that enables the detection of structural changes associated with resistance. The main goal of this study is to evaluate the potential of FTIR microscopy in tandem with machine learning algorithms for rapid and reliable identification of bacterial susceptibility to antibiotics within a few minutes. The bacterial samples, which were identified at the species level by MALDI-TOF and examined for their susceptibility by the routine assay (micro-diffusion discs), were obtained from the bacteriology laboratories of Soroka University Medical Center (SUMC). These samples were examined by FTIR microscopy and analyzed by advanced statistical methods.
Our results, based on 550 E. coli samples, were promising and showed that by using infrared spectroscopy together with multivariate analysis, it is possible to classify the tested bacteria as sensitive or resistant with a success rate higher than 85% for eight different antibiotics. Based on these preliminary results, it is worthwhile to continue developing FTIR microscopy as a rapid and reliable method for identifying antibiotic susceptibility.
Keywords: antibiotics, E. coli, FTIR, multivariate analysis, susceptibility
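The abstract does not specify which machine learning algorithm classified the spectra. The sensitive/resistant classification step can be illustrated with a deliberately simple nearest-centroid classifier on vector-normalised spectra. This is a swapped-in technique, not the study's method, and the four-point "spectra" are hypothetical; real FTIR data have hundreds of wavenumber points and usually need preprocessing such as baseline correction.

```python
import math


def normalise(spectrum):
    """Scale a spectrum to unit Euclidean norm (vector normalisation)."""
    norm = math.sqrt(sum(x * x for x in spectrum))
    return [x / norm for x in spectrum] if norm else list(spectrum)


def fit_centroids(spectra, labels):
    """Mean normalised spectrum for each class label."""
    sums, counts = {}, {}
    for spec, label in zip(spectra, labels):
        s = normalise(spec)
        acc = sums.setdefault(label, [0.0] * len(s))
        for i, x in enumerate(s):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def predict(centroids, spectrum):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    s = normalise(spectrum)
    return min(centroids, key=lambda lab: math.dist(s, centroids[lab]))


# Hypothetical absorbances at four wavenumbers for labelled isolates
train = [[1.0, 0.2, 0.1, 0.5], [0.9, 0.25, 0.1, 0.45],
         [0.2, 1.0, 0.6, 0.1], [0.25, 0.9, 0.55, 0.15]]
labels = ["sensitive", "sensitive", "resistant", "resistant"]
centroids = fit_centroids(train, labels)
print(predict(centroids, [0.95, 0.22, 0.12, 0.5]))  # → sensitive
```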
Procedia PDF Downloads 265
27838 Genomics of Aquatic Adaptation
Authors: Agostinho Antunes
Abstract:
The completion of human genome sequencing in 2003 opened a new perspective on the importance of whole-genome sequencing projects, and the genomes of multiple species are currently being completely sequenced, from simple organisms such as bacteria to more complex taxa such as mammals. These voluminous sequencing data, generated across multiple organisms, also provide the framework to better understand the genetic makeup of such species and related ones, allowing exploration of the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, obtained from comparative evolutionary genomic analyses of selected marine animal species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.
Keywords: comparative genomics, adaptive evolution, bioinformatics, phylogenetics, genome mining
Procedia PDF Downloads 533
27837 Antibody Reactivity of Synthetic Peptides Belonging to Proteins Encoded by Genes Located in Mycobacterium tuberculosis-Specific Genomic Regions of Differences
Authors: Abu Salim Mustafa
Abstract:
Comparisons of mycobacterial genomes have identified several Mycobacterium tuberculosis-specific genomic regions that are absent in other mycobacteria and are known as regions of difference. Owing to their M. tuberculosis specificity, peptides encoded by these regions could be useful in the specific diagnosis of tuberculosis. To explore this possibility, overlapping synthetic peptides corresponding to 39 proteins predicted to be encoded by genes in the regions of difference were tested for antibody reactivity with sera from tuberculosis patients and healthy subjects. The results identified four immunodominant peptides corresponding to four different proteins, three of which showed significantly stronger antibody reactivity and a higher rate of positivity with sera from tuberculosis patients than with sera from healthy subjects. The fourth peptide was recognized equally well by sera from tuberculosis patients and healthy subjects. Bioinformatic prediction of antibody epitopes using the ABCpred server identified multiple linear epitopes in each peptide. Furthermore, sequence identity analysis using BLAST suggested M. tuberculosis specificity for the three peptides with preferential reactivity with sera from tuberculosis patients, whereas the peptide with equal reactivity with sera from TB patients and healthy subjects showed significant identity with sequences present in non-tuberculous mycobacteria. The three identified M. tuberculosis-specific immunodominant peptides may be useful in the serological diagnosis of tuberculosis.
Keywords: genomic regions of differences, Mycobacterium tuberculosis, peptides, serodiagnosis
Procedia PDF Downloads 183
27836 Genomics of Adaptation in the Sea
Authors: Agostinho Antunes
Abstract:
The completion of human genome sequencing in 2003 opened a new perspective on the importance of whole-genome sequencing projects, and the genomes of multiple species are currently being completely sequenced, from simple organisms such as bacteria to more complex taxa such as mammals. These voluminous sequencing data, generated across multiple organisms, also provide the framework to better understand the genetic makeup of such species and related ones, allowing exploration of the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, obtained from comparative evolutionary genomic analyses of selected marine animal species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.
Keywords: marine genomics, evolutionary bioinformatics, human genome sequencing, genomic analyses
Procedia PDF Downloads 611
27835 Protein Tertiary Structure Prediction by a Multiobjective Optimization and Neural Network Approach
Authors: Alexandre Barbosa de Almeida, Telma Woerle de Lima Soares
Abstract:
Protein structure prediction is a challenging task in the field of bioinformatics. The biological function of a protein depends largely on the shape of its three-dimensional conformation, yet less than 1% of all known proteins have had their structures solved. This work proposes a deep learning model to address this problem by predicting some aspects of protein conformation. Through a process of multiobjective dominance, a recurrent neural network was trained to abstract the particular bias of each individual multiobjective algorithm, generating a heuristic that could help predict some relevant aspects of the three-dimensional conformation formation process known as protein folding.
Keywords: Ab initio heuristic modeling, multiobjective optimization, protein structure prediction, recurrent neural network
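"Multiobjective dominance" refers to Pareto dominance: one candidate dominates another if it is no worse on every objective and strictly better on at least one. A minimal sketch follows, assuming all objectives are minimised (for example, separate energy terms); the candidate scores are hypothetical and the code does not reproduce the authors' training procedure.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, assuming every objective is minimised."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the first Pareto front: points that no other point dominates.

    A point never dominates itself, so it is safe to compare against all points.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective 1, objective 2) scores for five candidate conformations
candidates = [(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)]
print(non_dominated(candidates))  # → [(1, 5), (2, 2), (3, 1)]
```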
Procedia PDF Downloads 205
27834 Non-Invasive Pre-Implantation Genetic Assessment Using NGS in IVF Clinical Routine
Authors: Katalin Gombos, Bence Gálik, Krisztina Ildikó Kalács, Krisztina Gödöny, Ákos Várnagy, József Bódis, Attila Gyenesei, Gábor L. Kovács
Abstract:
Although non-invasive pre-implantation genetic testing for aneuploidy (NIPGT-A) is potentially appropriate for assessing the chromosomal ploidy of the embryo, its practical application in routine IVF centers has not yet begun in the absence of a recommendation. We developed a comprehensive workflow for a clinically applicable NIPGT-A strategy based on next-generation sequencing (NGS) technology. We performed MALBAC whole-genome amplification and NGS on spent blastocyst culture media of Day 3 embryos fertilized by intracytoplasmic sperm injection (ICSI). Spent culture media of embryos with good morphological quality scores were included in further analysis, with blank culture media as a background control. Chromosomal abnormalities were identified by an optimized bioinformatics pipeline applying a copy number variation (CNV) detection algorithm. We demonstrate a comprehensive workflow covering both wet-lab and dry-lab procedures that supports a clinically applicable strategy for NIPGT-A. It can be carried out within 48 h, which is critical for same-cycle blastocyst transfer, and is also suitable for "freeze all" and "elective frozen embryo" strategies. The described integrated approach for non-invasive evaluation of the embryonic DNA content of culture media can potentially supplement existing pre-implantation genetic screening methods.
Keywords: next generation sequencing, in vitro fertilization, embryo assessment, non-invasive pre-implantation genetic testing
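The authors' CNV-detection algorithm is not described in detail here. A common general approach, shown below purely as a minimal sketch, bins reads along the genome, normalises each library to its total read count, and flags bins whose log2 ratio against an assumed euploid reference profile exceeds a threshold. The ±0.4 cut-offs and the hypothetical function `call_copy_number` are illustrative assumptions, not validated clinical thresholds; real pipelines also correct for GC bias and segment adjacent bins.

```python
import math


def call_copy_number(sample_counts, reference_counts, gain=0.4, loss=-0.4):
    """Label each genomic bin 'gain', 'loss', 'neutral' or 'no-call'.

    Counts are normalised to each library's total before taking
    log2(sample / reference). Thresholds are illustrative assumptions.
    """
    s_total = sum(sample_counts)
    r_total = sum(reference_counts)
    calls = []
    for s, r in zip(sample_counts, reference_counts):
        if s == 0 or r == 0:
            calls.append("no-call")  # avoid taking the log of zero
            continue
        ratio = math.log2((s / s_total) / (r / r_total))
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Hypothetical read counts in four bins: log2 ratios 0, 0, +0.585, -1
print(call_copy_number([100, 100, 150, 50], [100, 100, 100, 100]))
# → ['neutral', 'neutral', 'gain', 'loss']
```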
Procedia PDF Downloads 156
27833 Effect of the Applied Bias on Mini-Band Structures in Dimer Fibonacci InAs/Ga1-XInXAs Superlattices
Authors: Z. Aziz, S. Terkhi, Y. Sefir, R. Djelti, S. Bentata
Abstract:
The effect of a uniform electric field across multi-barrier systems (InAs/InxGa1-xAs) is explored in detail with a computational model using the exact Airy function formalism and the transfer-matrix technique. In the biased dimer Fibonacci height-barrier superlattice (DFHBSL) structure, a strong reduction in transmission was observed, and the width of the mini-band structure decreases linearly with increasing applied bias. This is due to the confinement of states in the mini-band structure, which becomes increasingly significant as the bias grows (Wannier-Stark effect).
Keywords: dimer Fibonacci height barrier superlattices, singular extended state, exact Airy function and transfer matrix formalism, bioinformatics
Procedia PDF Downloads 289
27832 A Review of Spatial Analysis as a Geographic Information Management Tool
Authors: Chidiebere C. Agoha, Armstong C. Awuzie, Chukwuebuka N. Onwubuariri, Joy O. Njoku
Abstract:
Spatial analysis is a field of study that uses geographic or spatial information to understand and analyze patterns, relationships, and trends in data. It is characterized by the use of geographic or spatial information, which allows data to be analyzed in the context of their location and surroundings. It differs from non-spatial (aspatial) techniques, which do not consider the geographic context and may not provide as complete an understanding of the data. Spatial analysis is applied in a variety of fields, including urban planning, environmental science, geosciences, epidemiology, and marketing, to gain insights and make decisions about complex spatial problems. This review paper explores definitions of spatial analysis from various sources, including examples of its application and different analysis techniques such as buffer analysis, interpolation, and kernel density analysis (multi-distance spatial cluster analysis). It also contrasts spatial analysis with non-spatial analysis.
Keywords: aspatial technique, buffer analysis, epidemiology, interpolation
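Interpolation, one of the techniques named above, can be illustrated with inverse distance weighting (IDW), one common choice among several interpolation methods: an unknown value is estimated as the weighted mean of observed values, with weights 1/d^p that decay with distance d. This sketch assumes planar coordinates and a power parameter of 2; the station data are hypothetical.

```python
import math


def idw(known, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from [(xi, yi, value), ...].

    Assumes planar (projected) coordinates, so Euclidean distance applies.
    """
    num = den = 0.0
    for xi, yi, v in known:
        d = math.hypot(x - xi, y - yi)
        if d == 0:
            return v  # exact hit: return the observed value unchanged
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


# Two hypothetical stations; the midpoint receives equal weights
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(stations, 1.0, 0.0))  # → 15.0
```

A higher power makes the estimate more local; geographic (latitude/longitude) data should be projected first or use great-circle distances.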
Procedia PDF Downloads 318
27831 Complete Genome Sequence Analysis of Pasteurella multocida Subspecies multocida Serotype A Strain PMTB2.1
Authors: Shagufta Jabeen, Faez J. Firdaus Abdullah, Zunita Zakaria, Nurulfiza M. Isa, Yung C. Tan, Wai Y. Yee, Abdul R. Omar
Abstract:
Pasteurella multocida (PM) is an important opportunistic veterinary pathogen, particularly associated with septicemic pasteurellosis, pneumonic pasteurellosis and hemorrhagic septicemia in cattle and buffaloes. P. multocida serotype A has been reported to cause fatal pneumonia and septicemia. The Malaysian isolate PMTB2.1 of Pasteurella multocida subspecies multocida serotype A was first isolated from buffaloes that died of septicemia. In this study, the genome of P. multocida strain PMTB2.1 was sequenced using third-generation sequencing technology (the PacBio RS2 system) and analyzed bioinformatically by de novo analysis followed by in-depth comparative genomic analysis. De novo assembly of the PacBio raw reads generated 3 contigs; gap filling of the aligned contigs by PCR sequencing then produced a single contiguous circular chromosome with a genomic size of 2,315,138 bp and a GC content of approximately 40.32% (Accession number CP007205). The PMTB2.1 genome comprises 2,176 protein-coding sequences, 6 rRNA operons, 56 tRNA sequences and 4 ncRNA sequences. Comparative genome sequence analysis of nine complete genomes, namely Actinobacillus pleuropneumoniae, Haemophilus parasuis, Escherichia coli, five other P. multocida genomes (PM70, PM36950, PMHN06, PM3480 and PMHB01) and PMTB2.1 itself, was carried out based on OrthoMCL analysis and a Venn diagram. The analysis showed that 282 CDSs (13%) are unique to PMTB2.1 and that 1,125 CDSs have orthologs in all genomes. This reflects the overall close relationship of these bacteria and supports their classification in the Gamma subdivision of the Proteobacteria. In addition, genomic distance analysis among all nine genomes indicated that PMTB2.1 is closely related to the other five P. multocida genomes, with genomic distances of less than 0.13.
Synteny analysis shows subtle differences in genetic structure among P. multocida strains, indicating frequent gene transfer events among them. However, PM3480 and PM70 exhibited exceptionally large structural variation, as they are swine and chicken isolates, respectively. Furthermore, the genomic structure of PMTB2.1 most closely resembles that of PM36950, with a genomic size difference of approximately 34,380 kb (PMTB2.1 being smaller than PM36950), and the strain-specific integrative and conjugative element (ICE) found only in PM36950 is absent in PMTB2.1. Meanwhile, two intact prophage sequences of approximately 62 kb were found only in PMTB2.1, one of which is similar to the transposable phage SfMu. A phylogenomic tree was constructed based on OrthoMCL analysis and rooted with E. coli, A. pleuropneumoniae and H. parasuis. The genome of P. multocida strain PMTB2.1 clustered with the bovine isolates PM36950 and PMHB01, was separated from the avian isolate PM70 and the swine isolates PM3480 and PMHN06, and was distant from Actinobacillus and Haemophilus. Previous studies based on single nucleotide polymorphisms (SNPs) and multilocus sequence typing (MLST) were unable to show a clear phylogenetic relationship between Pasteurella multocida strains and their different hosts. In conclusion, this study provides insight into the genomic structure of PMTB2.1, highlighting potential genes that could function as virulence factors, for future studies elucidating the mechanisms by which the bacterium causes disease in susceptible animals.
Keywords: comparative genomics, DNA sequencing, phage, phylogenomics
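The GC content reported above (about 40.32%) is the percentage of G and C among the nucleotides of the assembled chromosome. A minimal sketch of the calculation follows; ignoring ambiguity codes such as N is a stated convention here, and the example string is a toy sequence, not part of CP007205.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases (other symbols ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


# Toy sequence: 4 of 6 bases are G or C
print(round(gc_content("ATGCGC"), 2))  # → 66.67
```

Applied to the full chromosome sequence (read from a FASTA file with header lines removed), the same function would give the genome-wide value.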
Procedia PDF Downloads 188
27830 New Bio-Strategies for Ochratoxin A Detoxification Using Lactic Acid Bacteria
Authors: José Maria, Vânia Laranjo, Luís Abrunhosa, António Inês
Abstract:
The occurrence of mycotoxigenic moulds such as Aspergillus, Penicillium and Fusarium in food and feed has an important impact on public health through acute and chronic mycotoxicoses in humans and animals, an impact that is more severe in developing countries due to lack of food security, poverty and malnutrition. This mould contamination also constitutes a major economic problem due to the loss of crop production. A great variety of filamentous fungi can produce highly toxic secondary metabolites known as mycotoxins. Most mycotoxins are carcinogenic, mutagenic, neurotoxic and immunosuppressive, and ochratoxin A (OTA) is one of the most important. OTA is toxic to animals and humans, mainly due to its nephrotoxic properties. Several approaches have been developed for the decontamination of mycotoxins in foods, such as prevention of contamination, biodegradation of mycotoxins in food and feed by microorganisms or enzymes, and inhibition or absorption of the mycotoxin content of consumed food in the digestive tract. A group of Gram-positive bacteria named lactic acid bacteria (LAB) can release molecules that influence mould growth, improving the shelf life of many fermented products and reducing health risks due to exposure to mycotoxins. Some LAB are capable of mycotoxin detoxification. Recently, our group was the first to describe the ability of LAB strains to biodegrade OTA, more specifically Pediococcus parvulus strains isolated from Douro wines. The pathway of this biodegradation was previously identified in other microorganisms. OTA can be degraded through hydrolysis of the amide bond that links the L-β-phenylalanine molecule to ochratoxin alpha (OTα), a non-toxic compound. It is known that several enzymes of different origins can mediate this hydrolysis, such as carboxypeptidase A (an enzyme from bovine pancreas), a commercial lipase and several commercial proteases.
We therefore sought to better understand this OTA degradation process when LAB are involved and to identify which molecules are present in this process. To achieve this aim, we used several bioinformatics tools (BLAST, CLUSTALX2, CLC Sequence Viewer 7, Finch TV). We also designed specific primers and performed gene-specific PCR. The template DNA came from LAB strain samples from our previous work and from other LAB strains isolated from elderberry fruit, silage, milk and sausages. Using bioinformatics tools, it was possible to identify several proteins belonging to the carboxypeptidase family that participate in the process of OTA degradation, such as serine-type D-Ala-D-Ala carboxypeptidase and membrane carboxypeptidase. In conclusion, this work identified carboxypeptidase proteins as among the molecules present in the OTA degradation process when LAB are involved.
Keywords: carboxypeptidase, lactic acid bacteria, mycotoxins, ochratoxin A
Procedia PDF Downloads 462
27829 Characterization of Transmembrane Proteins with Five Alpha-Helical Regions
Authors: Misty Attwood, Helgi Schioth
Abstract:
Transmembrane proteins are important components of many essential cell processes such as signal transduction, cell-cell signalling, solute transport, structural adhesion, and protein trafficking. Because of their involvement in these diverse, critical activities, transmembrane proteins are implicated in different disease pathways. They are therefore the focus of intense interest regarding their functional activities, their roles in disease pathogenesis, and their potential as pharmaceutical targets. Further, as the structure and function of proteins are correlated, investigating a group of proteins with the same topology, i.e., the same number of transmembrane regions, may shed light on their functional roles and potential as therapeutic targets. In this in silico bioinformatics analysis, we identify and comprehensively characterize the previously unstudied group of proteins with five transmembrane-spanning regions (5TM). We classify nearly 60 5TM proteins, of which 31 are members of ten families that contain two or more members, all predicted to have the 5TM architecture. Furthermore, nine singlet proteins with the 5TM architecture but without paralogues detected in humans were also identified, indicating the evolution of single unique proteins with the 5TM structure. Interestingly, more than half of these proteins function in localization activities through movement or tethering of cell components, and more than one-third are involved in transport activities, particularly in the mitochondria. Surprisingly, no receptor activity was identified within this group, in sharp contrast with other TM families.
Three major 5TM families were identified: the Tweety family, whose members are pore-forming subunits of the swelling-dependent volume-regulated anion channel in astrocytes; the sideroflexin family, which acts as mitochondrial amino acid transporters; and the Yip1 domain family, which is engaged in vesicle budding and intra-Golgi transport. About 30% of the proteins have enhanced expression in the brain, liver, or testis. Importantly, 60% of these proteins are identified as cancer prognostic markers associated with clinical outcomes of various tumour types, indicating that further investigation into the function and expression of these proteins is warranted. This study provides the first comprehensive analysis of proteins with 5TM regions and details their unique characteristics and potential applications in pharmaceutical development.
Keywords: 5TM, cancer prognostic marker, drug targets, transmembrane protein
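The abstract does not say how the number of transmembrane regions was predicted. Dedicated predictors are normally used for this. As a crude illustration only, the sketch below assigns a TM count from a Kyte-Doolittle hydropathy profile. It slides a 19-residue window along the sequence, averages the hydropathy values, and counts separate runs of windows above 1.6, the threshold Kyte and Doolittle suggested for that window size. The function names are hypothetical, and this is not the study's classification pipeline.

```python
# Kyte-Doolittle hydropathy values
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]  # unknown residues count as 0
    if len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def count_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> int:
    """Count separate runs of windows whose mean hydropathy exceeds the threshold."""
    count, inside = 0, False
    for score in hydropathy_profile(seq, window):
        if score > threshold and not inside:
            count += 1      # entering a new hydrophobic stretch
            inside = True
        elif score <= threshold:
            inside = False  # left the stretch
    return count


# Toy sequence: five 20-residue hydrophobic stretches separated by charged linkers
toy = "D" * 20 + ("L" * 20 + "D" * 20) * 5
print(count_tm_segments(toy))  # 5
```

Hydropathy plots alone misjudge amphipathic or re-entrant segments. That is one reason model-based predictors are preferred for real proteomes.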
Procedia PDF Downloads 111
27828 Synergistic Effect of Eugenol Acetate with Betalactam Antibiotic on Betalactamase and Its Bioinformatics Analysis
Authors: Vinod Nair, C. Sadasivan
Abstract:
Beta-lactam antibiotics are among the most frequently prescribed medications in modern medicine. Production of the enzyme beta-lactamase is an important mechanism of antibiotic resistance in microorganisms. Resistance to beta-lactams mediated by beta-lactamases can be overcome with beta-lactamase inhibitors. New generations of antibiotics consist mostly of synthetic compounds, and many side effects have been reported for them. Combinations of beta-lactams and beta-lactamase inhibitors have become one of the most successful antimicrobial strategies against current bacterial infections. Plant-based drugs are inexpensive and tend to have fewer adverse effects than synthetic compounds. The synergistic effect of eugenol acetate with beta-lactams restores the activity of beta-lactams, allowing their continued clinical use. Here we report the enhanced inhibitory effect on beta-lactamase of the phytochemical eugenol acetate, isolated from the plant Syzygium aromaticum, in combination with beta-lactams. The compound was found to act synergistically with the antibiotic amoxicillin against an antibiotic-resistant strain of S. aureus. The enzyme was purified from the organism and incubated with the compound, and the assay showed that the compound inhibited the enzymatic activity of beta-lactamase. Modeling and molecular docking studies indicated that the compound can fit into the active site of beta-lactamase and can mask a residue important for the hydrolysis of beta-lactams. The synergistic effects of eugenol acetate with beta-lactam antibiotics may justify the use of such plant compounds for the preparation of β-lactamase inhibitors against β-lactam-resistant S. aureus.
Keywords: betalactamase, eugenol acetate, synergistic effect, molecular modeling
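Synergy between two agents is commonly quantified with a checkerboard assay and the fractional inhibitory concentration index (FICI). The abstract does not state how synergy was measured. The sketch below therefore assumes that convention, reading FICI ≤ 0.5 as synergy and FICI > 4 as antagonism. The MIC values in the usage line are hypothetical.

```python
def fic_index(mic_a_alone: float, mic_a_combined: float,
              mic_b_alone: float, mic_b_combined: float) -> float:
    """FICI = MIC_A(combination)/MIC_A(alone) + MIC_B(combination)/MIC_B(alone)."""
    return mic_a_combined / mic_a_alone + mic_b_combined / mic_b_alone


def interpret_fici(fici: float) -> str:
    """Common interpretation: <= 0.5 synergy, > 4 antagonism, otherwise no interaction."""
    if fici <= 0.5:
        return "synergy"
    if fici > 4.0:
        return "antagonism"
    return "no interaction"


# Hypothetical MICs (ug/mL): amoxicillin 64 alone, 8 combined;
# eugenol acetate 512 alone, 64 combined
fici = fic_index(64, 8, 512, 64)
print(fici, interpret_fici(fici))  # 0.25 synergy
```

Each agent's contribution is its fold-reduction in MIC when combined, so two eightfold reductions sum to 0.25.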
Procedia PDF Downloads 249
27827 Application of Subversion Analysis in the Search for the Causes of Cracking in a Marine Engine Injector Nozzle
Authors: Leszek Chybowski, Artur Bejger, Katarzyna Gawdzińska
Abstract:
Subversion analysis is a tool used in the TRIZ (Theory of Inventive Problem Solving) methodology. This article introduces its history and describes the process of subversion analysis. It also covers function analysis and resource analysis, which are used at the design stage to generate possible undesirable situations. The article charts the course of subversion analysis when applied to a fuel injection nozzle of a marine engine. The work describes the fuel injector nozzle as a technological system and presents principles for analysing the causes of a cracked tip of the nozzle body. The system is modelled with function analysis. A search for potential causes of the damage is undertaken, and a cause-and-effect analysis is drawn up for various hypotheses concerning the damage. The importance of particular hypotheses is evaluated and the most likely causes of damage are identified.
Keywords: complex technical system, fuel injector, function analysis, importance analysis, resource analysis, sabotage analysis, subversion analysis, TRIZ (Theory of Inventive Problem Solving)
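Function analysis in TRIZ describes a system as subject-action-object interactions, each marked as useful or harmful. The following is a minimal sketch, assuming that representation. It shows how harmful functions acting on the cracked component can be listed as starting hypotheses for subversion analysis. The components and interactions below are hypothetical simplifications and are not the model from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """One interaction in a TRIZ-style function model: subject -> action -> object."""
    subject: str
    action: str
    obj: str
    useful: bool  # False marks a harmful function


def harmful_functions_on(model: list[Function], target: str) -> list[Function]:
    """Harmful functions acting on a component; each is a candidate damage hypothesis."""
    return [f for f in model if f.obj == target and not f.useful]


# Hypothetical, simplified model of an injector nozzle (not taken from the article)
nozzle_model = [
    Function("needle", "seals", "nozzle seat", True),
    Function("nozzle body", "directs", "fuel", True),
    Function("fuel", "cools", "nozzle tip", True),
    Function("combustion gas", "heats", "nozzle tip", False),
    Function("cylinder pressure", "loads", "nozzle tip", False),
]

for f in harmful_functions_on(nozzle_model, "nozzle tip"):
    print(f"Hypothesis: {f.subject} {f.action} the {f.obj}")
```

Subversion analysis then inverts the question ("how could we make the tip crack?"), using these harmful functions and the available resources as the starting points.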
Procedia PDF Downloads 617
27826 Identification of Hub Genes in the Development of Atherosclerosis
Authors: Jie Lin, Yiwen Pan, Li Zhang, Zhangyong Xia
Abstract:
Atherosclerosis is a chronic inflammatory disease characterized by the accumulation of lipids, immune cells, and extracellular matrix in the arterial walls. This pathological process can lead to the formation of plaques that can obstruct blood flow and trigger various cardiovascular diseases such as heart attack and stroke. Many studies have revealed endothelial cell dysfunction, the recruitment and activation of monocytes and macrophages, and the production of pro-inflammatory cytokines and chemokines in atherosclerosis, yet the underlying molecular mechanisms remain unclear. This study aimed to identify hub genes involved in the progression of atherosclerosis and to analyze their biological function in silico, thereby enhancing our understanding of the disease’s molecular mechanisms. Through the analysis of microarray data, we examined gene expression in media and neo-intima from plaques, as well as in distant macroscopically intact tissue, across a cohort of 32 hypertensive patients. Initially, 112 differentially expressed genes (DEGs) were identified. Subsequent immune infiltration analysis indicated a predominant presence of 27 immune cell types in the atherosclerosis group, with a notable increase in monocytes and macrophages. In the weighted gene co-expression network analysis (WGCNA), 10 modules with a minimum of 30 genes were defined as key modules, with the blue, dark olive green, and sky-blue modules being the most significant. These modules corresponded respectively to monocyte, activated B cell, and activated CD4 T cell gene patterns, revealing a strong morphological-genetic correlation. From these three modules, a total of 2509 key genes (gene significance > 0.2, module membership > 0.8) were extracted. Six hub genes (CD36, DPP4, HMOX1, PLA2G7, PLN2, and ACADL) were then identified by intersecting the 2509 key genes and 102 DEGs with lipid-related genes from the GeneCards database.
The discriminative power of the six hub genes was estimated with a robust classifier that achieved an area under the ROC curve (AUC) of 0.873, indicating excellent efficacy in differentiating between the disease and control groups. Moreover, PCA visualization demonstrated clear separation between the groups based on these six hub genes, suggesting their potential utility as classification features in predictive models. Protein-protein interaction (PPI) analysis highlighted DPP4 as the most interconnected gene. Within the constructed key gene-drug network, 462 drugs were predicted, with ursodeoxycholic acid (UDCA) identified as a potential therapeutic agent for modulating DPP4 expression. In summary, our study identified critical hub genes implicated in the progression of atherosclerosis through comprehensive bioinformatic analyses. These findings advance our understanding of the disease. They also pave the way for applying similar analytical frameworks and predictive models to other diseases, broadening the potential for clinical applications and therapeutic discoveries.
Keywords: atherosclerosis, hub genes, drug prediction, bioinformatics
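The hub-gene selection described above is a filter followed by set intersections. Below is a minimal sketch of that logic using the thresholds stated in the abstract (gene significance > 0.2, module membership > 0.8). The input layout, gene names and values are hypothetical placeholders. In a real WGCNA workflow, these statistics come from the network analysis itself.

```python
def select_key_genes(module_stats: dict[str, tuple[str, float, float]],
                     key_modules: set[str],
                     gs_min: float = 0.2, mm_min: float = 0.8) -> set[str]:
    """Genes in key modules with gene significance > gs_min and module membership > mm_min.

    module_stats maps gene -> (module colour, gene significance, module membership).
    """
    return {gene for gene, (module, gs, mm) in module_stats.items()
            if module in key_modules and gs > gs_min and mm > mm_min}


def hub_genes(key_genes: set[str], degs: set[str], lipid_genes: set[str]) -> list[str]:
    """Intersect key module genes, DEGs and a lipid-related gene list."""
    return sorted(key_genes & degs & lipid_genes)


# Hypothetical toy input; gene names are placeholders
stats = {
    "G1": ("blue", 0.50, 0.90),
    "G2": ("blue", 0.10, 0.95),            # fails the GS filter
    "G3": ("skyblue", 0.30, 0.85),
    "G4": ("grey", 0.90, 0.90),            # not in a key module
    "G5": ("darkolivegreen", 0.40, 0.70),  # fails the MM filter
}
key = select_key_genes(stats, {"blue", "darkolivegreen", "skyblue"})
print(hub_genes(key, degs={"G1", "G3", "G4"}, lipid_genes={"G1", "G4"}))  # ['G1']
```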
Procedia PDF Downloads 66
27825 Partial Least Square Regression for High-Dimensional and High-Correlated Data
Authors: Mohammed Abdullah Alshahrani
Abstract:
The research investigates the use of partial least squares (PLS) methodology to address challenges associated with high-dimensional correlated data. Recent technological advances have led to experiments producing data with a large number of variables relative to observations and with substantial inter-variable correlations. Such data are common in chemometrics, where near-infrared (NIR) spectrometer calibrations record chemical absorbance levels across hundreds of wavelengths. They are also common in genomics, where copy number alterations (CNA) in thousands of genomic regions are recorded from cancer patients. PLS is a widely used method for analyzing high-dimensional data, functioning as a regression tool in chemometrics and a classification method in genomics. It handles data complexity by creating latent variables (components) from the original variables. However, applying PLS can present challenges. The study investigates key areas to address these challenges, including unifying interpretations across three main PLS algorithms and exploring unusual negative shrinkage factors encountered during model fitting. The research also presents an alternative approach to the interpretation challenge of predictor weights in PLS. Predictor weights are estimated sparsely using a penalty function that combines a lasso penalty for sparsity with a Cauchy distribution-based penalty to account for variable dependencies. The results demonstrate sparse and grouped weight estimates, aiding interpretation and prediction tasks in genomic data analysis. High-dimensional data scenarios, where predictors outnumber observations, are common in regression analysis applications. Ordinary least squares (OLS) regression, the standard method, performs inadequately with high-dimensional and highly correlated data.
Copy number alterations (CNA) in key genes have been linked to disease phenotypes, highlighting the importance of accurate classification of gene expression data in bioinformatics and biology using regularized methods like PLS for regression and classification.
Keywords: partial least square regression, genetics data, negative filter factors, high dimensional data, high correlated data
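To make the "latent variables" idea concrete, below is a minimal sketch of single-response PLS (PLS1) fitted with the NIPALS algorithm, one of the standard PLS algorithms. It assumes NumPy is available. It does not implement the sparse lasso-plus-Cauchy penalty proposed in this work, and the function name is illustrative.

```python
import numpy as np


def pls1_nipals(X, y, n_components: int):
    """Fit PLS1 (single response) with the NIPALS algorithm.

    Returns coefficients B and intercept b0 so that y_hat = X @ B + b0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean           # centred, then deflated each step
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weights
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                # unit-length weight vector
        t = Xk @ w                            # scores of the a-th latent variable
        tt = t @ t
        P[:, a] = Xk.T @ t / tt
        q[a] = yk @ t / tt
        W[:, a] = w
        Xk = Xk - np.outer(t, P[:, a])        # deflate X
        yk = yk - q[a] * t                    # deflate y
    B = W @ np.linalg.solve(P.T @ W, q)       # coefficients in the original X space
    b0 = y_mean - x_mean @ B
    return B, b0
```

A useful sanity check: with as many components as predictors (and more observations than predictors), PLS1 reproduces the OLS fit. With fewer components it shrinks the solution, which is what makes it usable when predictors outnumber observations.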
Procedia PDF Downloads 49
27824 Predicting Susceptibility to Coronary Artery Disease using Single Nucleotide Polymorphisms with a Large-Scale Data Extraction from PubMed and Validation in an Asian Population Subset
Authors: K. H. Reeta, Bhavana Prasher, Mitali Mukerji, Dhwani Dholakia, Sangeeta Khanna, Archana Vats, Shivam Pandey, Sandeep Seth, Subir Kumar Maulik
Abstract:
Introduction: Research has demonstrated a connection between coronary artery disease (CAD) and genetics. We performed deep literature mining, combining bioinformatics and manual curation, to identify polymorphisms associated with susceptibility to CAD. We then sought to validate these findings in an Asian population.
Methodology: In the first phase, we used an automated pipeline that organizes and presents structured information on SNPs, populations and diseases. The information was obtained by applying natural language processing (NLP) techniques to approximately 28 million PubMed abstracts. To accomplish this, we used Python scripts to extract and curate disease-related data, filter out false positives, and categorize them into 24 hierarchical groups using named entity recognition (NER) algorithms. This search identified a total of 466 unique PubMed identifiers (PMIDs) and 694 single nucleotide polymorphisms (SNPs) related to CAD. To refine the selection, all the studies were then examined manually. SNPs that demonstrated susceptibility to CAD and exhibited a positive odds ratio (OR) were selected, yielding a final pool of 324 SNPs. The next phase involved validating the identified SNPs in DNA samples from 96 CAD patients and 37 healthy controls from the Indian population using the Global Screening Array.
Results: Of the 324 SNPs, only 108 were detected. Four of these showed a significant difference in minor allele frequency between cases and controls: rs187238 of the IL-18 gene, rs731236 of the VDR gene, rs11556218 of the IL16 gene and rs5882 of the CETP gene. Previous studies have associated these SNPs with endothelial damage, vitamin D receptor (VDR) polymorphism susceptibility, and reduced HDL-cholesterol levels, ultimately leading to the development of CAD.
Among these, only rs731236 had previously been studied in an Indian population, and then only in the context of diabetes and vitamin D deficiency. This is the first report of an association of these SNPs with CAD in the Indian population.
Conclusion: This pool of 324 SNPs is a unique resource that can help uncover risk associations in CAD. Here, we validated it in an Indian population. Further validation in different populations may offer valuable insights and contribute to the development of a screening tool. It may also help enable primary prevention strategies targeted at vulnerable populations.
Keywords: coronary artery disease, single nucleotide polymorphism, susceptible SNP, bioinformatics
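The abstract reports significant differences in minor allele frequency between cases and controls but does not name the test used. One common choice is a Pearson chi-square test on a 2x2 table of allele counts. The sketch below assumes that test, with 1 degree of freedom and no continuity correction. It uses only the standard library, since the upper tail of a 1-df chi-square equals erfc(sqrt(x/2)). The allele counts are hypothetical and only sized to match 96 cases (192 alleles) and 37 controls (74 alleles).

```python
import math


def allelic_chi2(case_minor: int, case_major: int,
                 ctrl_minor: int, ctrl_major: int):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table.

    Returns (minor allele frequency in cases, in controls, chi-square, p-value).
    """
    observed = ((case_minor, case_major), (ctrl_minor, ctrl_major))
    row_totals = [sum(row) for row in observed]
    col_totals = [case_minor + ctrl_minor, case_major + ctrl_major]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return case_minor / row_totals[0], ctrl_minor / row_totals[1], chi2, p_value


# Hypothetical counts: 96 cases -> 192 alleles, 37 controls -> 74 alleles
maf_cases, maf_controls, chi2, p = allelic_chi2(70, 122, 14, 60)
print(f"MAF cases={maf_cases:.3f} controls={maf_controls:.3f} chi2={chi2:.2f} p={p:.4f}")
```

When many SNPs are tested (108 here), a multiple-testing correction such as Bonferroni would normally be applied to these p-values.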
Procedia PDF Downloads 76
27823 Novel Coprocessor for DNA Sequence Alignment in Resequencing Applications
Authors: Atef Ibrahim, Hamed Elsimary, Abdullah Aljumah, Fayez Gebali
Abstract:
This paper presents a novel semi-systolic array architecture for an optimized parallel sequence alignment algorithm. The architecture can be modified for reuse in multiple-pass processing. This increases both the number of processing elements that can be packed into a single FPGA and the number of sequences that can be aligned in parallel on it. It thereby resolves a potential problem of the previously published conventional hardware design, which leaves many FPGA resources unused when the short-read length is large. FPGA implementation results show that, for large short-read lengths (M > 128), the proposed design achieves slightly higher speedup and FPGA utilization than the conventional one.
Keywords: bioinformatics, genome sequence alignment, re-sequencing applications, systolic array
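The abstract does not specify the alignment recurrence. Below is a minimal sketch assuming Smith-Waterman local alignment with a linear gap penalty, a common target for systolic designs. It illustrates why such arrays parallelize well. Every cell on an anti-diagonal (i + j = d) depends only on the two previous anti-diagonals. A systolic array can therefore compute a whole anti-diagonal in one step, for example with one processing element per read base. The scoring parameters and function names are hypothetical.

```python
def _score(x: str, y: str, match: int, mismatch: int) -> int:
    return match if x == y else mismatch


def sw_rowwise(a: str, b: str, match=2, mismatch=-1, gap=-1) -> int:
    """Reference Smith-Waterman local alignment score (linear gap), filled row by row."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + _score(a[i - 1], b[j - 1], match, mismatch),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def sw_antidiagonal(a: str, b: str, match=2, mismatch=-1, gap=-1) -> int:
    """Same recurrence, evaluated one anti-diagonal (i + j = d) at a time.

    Each cell on diagonal d needs only diagonals d - 1 and d - 2, so the inner
    loop could run on separate processing elements in the same clock cycle.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):  # conceptually parallel
            j = d - i
            H[i][j] = max(0,
                          H[i - 1][j - 1] + _score(a[i - 1], b[j - 1], match, mismatch),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


read, reference = "ACGTTGCA", "TTACGTAGCATT"
assert sw_rowwise(read, reference) == sw_antidiagonal(read, reference)
```

Both functions compute the same matrix. Only the evaluation order differs, and that order is what the hardware exploits.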
Procedia PDF Downloads 531
27822 An Overview of Bioinformatics Methods to Detect Novel Riboswitches Highlighting the Importance of Structure Consideration
Authors: Danny Barash
Abstract:
Riboswitches are RNA genetic control elements that were originally discovered in bacteria and provide a unique mechanism of gene regulation. They work without the participation of proteins and are believed to represent ancient regulatory systems on the evolutionary timescale. One of the biggest challenges in riboswitch research is that, while many riboswitches are found in prokaryotes, only a small percentage of known riboswitches have been found in certain eukaryotic organisms. The few examples of eukaryotic riboswitches were identified using sequence-based bioinformatics search methods that include some slight structural considerations. These pattern-matching methods were the first to be applied to riboswitch detection. They can be programmed very efficiently using a data structure called affix arrays, making them suitable for genome-wide searches of riboswitch patterns. However, they are limited in their ability to detect harder-to-find riboswitches that deviate from the known patterns. Several methods have since been developed to tackle this problem. The one most commonly used by practitioners is Infernal, which relies on hidden Markov models (HMMs) and covariance models (CMs). Profile hidden Markov models were also employed in the pHMM Riboswitch Scanner web application, independently of Infernal. Other computational approaches include RMDetect, which uses 3D structural modules, and RNAbor, which uses the Boltzmann probability of structural neighbors. We have tried several strategies to incorporate more sophisticated secondary structure considerations based on RNA folding prediction. The first idea was to use window-based methods in conjunction with folding predictions by energy minimization. The moving-window approach is heavily geared towards secondary structure, with the sequence treated only as a constraint.
However, the method cannot be used genome-wide because each folding prediction by energy minimization in the moving window is computationally expensive, so scanning is feasible only in the vicinity of genes of interest. The second idea was to remedy this inefficiency with a pipeline that first performs inverse RNA folding, which considers RNA secondary structure, and then runs a BLAST search, which is sequence-based and highly efficient. This approach relies on inverse RNA folding in general, and on our own in-house fragment-based inverse RNA folding program, RNAfbinv, in particular. It can find attractive candidates that are missed by Infernal and other standard methods for riboswitch detection. We demonstrate attractive candidates found both by the moving-window approach and by the inverse RNA folding approach combined with BLAST. We conclude that structure-based methods such as the two strategies outlined above hold considerable promise for detecting riboswitches and other conserved RNAs of functional importance in a variety of organisms.
Keywords: riboswitches, RNA folding prediction, RNA structure, structure-based methods
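The moving-window strategy folds every window of a sequence independently. The authors use energy minimization, and a real scan would call a thermodynamic folding package for that step. To keep this sketch self-contained, it substitutes the much simpler Nussinov base-pair maximization. This is a named stand-in, not energy minimization. The substitute still shows the cost structure: folding each window takes O(n³) time, which is why genome-wide scanning is impractical. The function names are hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with >= min_loop unpaired bases in each hairpin."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):       # span = j - i
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]                # case: j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + N[k + 1][j - 1] + 1)
            N[i][j] = best
    return N[0][n - 1]


def scan_windows(seq: str, window: int, step: int) -> list[tuple[int, int]]:
    """Fold every window independently; returns (start, number of pairs) per window."""
    return [(start, nussinov_pairs(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


print(scan_windows("GGGAAAUCC" + "A" * 9, window=9, step=9))  # [(0, 3), (9, 0)]
```

In a real scan, windows with unusually stable or riboswitch-like predicted structures would be kept as candidates for closer inspection.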
Procedia PDF Downloads 234
27821 Assessment of DNA Sequence Encoding Techniques for Machine Learning Algorithms Using a Universal Bacterial Marker
Authors: Diego Santibañez Oyarce, Fernanda Bravo Cornejo, Camilo Cerda Sarabia, Belén Díaz Díaz, Esteban Gómez Terán, Hugo Osses Prado, Raúl Caulier-Cisterna, Jorge Vergara-Quezada, Ana Moya-Beltrán
Abstract:
The advent of high-throughput sequencing technologies has revolutionized genomics, generating vast amounts of genetic data that challenge traditional bioinformatics methods. Machine learning addresses these challenges by leveraging computational power to identify patterns and extract information from large datasets. However, biological sequence data, being symbolic and non-numeric, must be converted into numerical formats for machine learning algorithms to process them effectively. Encoding methods such as one-hot encoding and k-mers have been explored so far. This work proposes additional approaches for encoding DNA sequences and compares them with existing techniques, to determine whether they provide improvements or whether current methods remain superior. Data from the 16S rRNA gene, a universal marker, were used to analyze eight bacterial groups that are significant in the pulmonary environment and have clinical implications. The bacterial genera included in this analysis are Prevotella, Abiotrophia, Acidovorax, Streptococcus, Neisseria, Veillonella, Mycobacterium, and Megasphaera. These data were downloaded from the NCBI database in GenBank file format and then parsed to selectively extract relevant information from each file. For data encoding, sequence normalization was carried out as the first step. From approximately 22,000 initial data points, a subset was generated for testing purposes: 55 sequences from each bacterial group met the length criteria, giving an initial sample of 440 sequences. The sequences were encoded using different methods, including one-hot encoding, k-mers, Fourier transform, and wavelet transform. Various machine learning algorithms, such as support vector machines, random forests, and neural networks, were trained to evaluate these encoding methods.
The performance of these models was assessed using multiple metrics, including the confusion matrix, ROC curve, and F1 Score, providing a comprehensive evaluation of their classification capabilities. The results show that accuracies between encoding methods vary by up to approximately 15%, with the Fourier transform obtaining the best results for the evaluated machine learning algorithms. These findings, supported by the detailed analysis using the confusion matrix, ROC curve, and F1 Score, provide valuable insights into the effectiveness of different encoding methods and machine learning algorithms for genomic data analysis, potentially improving the accuracy and efficiency of bacterial classification and related genomic studies.
Keywords: DNA encoding, machine learning, Fourier transform, Fourier transformation
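The abstract names one-hot, k-mer and Fourier encodings without giving their exact form. The sketch below shows one plausible version of each, assuming NumPy. For the Fourier encoding, it takes magnitudes of the discrete Fourier transform of the four base-indicator tracks, truncated or zero-padded to a fixed length. That fixed-length step is a simplification and may differ from the authors' normalization. The function names are hypothetical.

```python
from itertools import product

import numpy as np

BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot(seq: str) -> np.ndarray:
    """(len(seq), 4) matrix; ambiguous bases such as N stay all-zero."""
    out = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in BASE_INDEX:
            out[pos, BASE_INDEX[base]] = 1.0
    return out


def kmer_counts(seq: str, k: int = 3) -> np.ndarray:
    """Counts of all 4**k k-mers in lexicographic ACGT order; k-mers containing N are skipped."""
    index = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    counts = np.zeros(len(index))
    s = seq.upper()
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        if kmer in index:
            counts[index[kmer]] += 1
    return counts


def fourier_features(seq: str, n_coeffs: int = 64) -> np.ndarray:
    """Magnitudes of the first n_coeffs DFT coefficients of each base-indicator track.

    Shorter spectra are zero-padded so every sequence yields 4 * n_coeffs features.
    """
    spectrum = np.abs(np.fft.rfft(one_hot(seq), axis=0))[:n_coeffs]
    if spectrum.shape[0] < n_coeffs:
        spectrum = np.pad(spectrum, ((0, n_coeffs - spectrum.shape[0]), (0, 0)))
    return spectrum.ravel()
```

One-hot output grows with sequence length, while k-mer and truncated Fourier features have a fixed size. Fixed-size features are what most classifiers, such as SVMs and random forests, expect.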
Procedia PDF Downloads 23
27820 Single Cell and Spatial Transcriptomics: A Beginners Viewpoint from the Conceptual Pipeline
Authors: Leo Nnamdi Ozurumba-Dwight
Abstract:
Messenger ribonucleic acid (mRNA) molecules encode proteins. These protein-encoding mRNA molecules collectively constitute the transcriptome, and analyzing them by RNA sequencing (RNAseq) reveals patterns of gene expression. The resulting gene expression profiles provide clues to cellular traits and their dynamics, which can be studied in relation to function and responses. RNAseq is a practical technique in genomics, as it enables the detection and quantitative analysis of mRNA molecules. Single-cell and spatial transcriptomics offer different avenues for exploring the genomic characteristics of single cells and pooled cells from biological tissue samples. Relevant disease conditions include cancer, autoimmune diseases, and hematopoietic diseases, among others. Single-cell transcriptomics enables direct assessment of each building unit of tissues (the cell) in diagnosis and in molecular gene expression studies. A typical way to achieve this is single-cell RNA sequencing (scRNAseq), which supports high-throughput gene expression studies. However, this technique generates gene expression data for many cells without information on the cells’ positional coordinates within the tissue. To resolve this setback, complementary pre-established tissue reference maps, built with molecular and bioinformatics techniques, have emerged. They are now used to produce both levels of data from a single scRNAseq analysis. This is an emerging methodological approach for integrative and increasingly dependable transcriptomics analysis. It can support in situ analysis for a better understanding of tissue functional organization. It can also reveal new biomarkers for early-stage disease detection and for therapeutic targets in drug development, and expose the nature of cell-to-cell interactions.
These are also vital genomic signatures with clinical applications. Over the past decades, RNAseq has generated a wide array of information that is driving breakthroughs and innovations in biomedicine. Spatial transcriptomics, on the other hand, operates at the tissue level and is used to study biological specimens with heterogeneous features. It reveals the overall identity of the investigated mammalian tissues. This information can then be used to study cell differentiation, to track cell trajectory patterns and behavior, and to study regulatory homeostasis in disease states. It also requires reference-based positional analysis to build up the genomic signatures that will be assessed from the single cells in the tissue sample. These two approaches to transcriptomics are applied to different numbers of cells and offer their own appropriate resolutions. Both have advanced the study of gene expression from mRNA molecules and are helping to tackle health challenges head-on.
Keywords: transcriptomics, RNA sequencing, single cell, spatial, gene expression.
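The abstract describes combining scRNAseq profiles with pre-established tissue reference maps to recover cell positions, but it does not name a method. The following is a deliberately simple sketch of the idea, assuming NumPy and a reference matrix with one expression profile per spatial spot over the same genes. Each cell is assigned to the spot whose profile it correlates with best, using Pearson correlation. Published integration methods are considerably more sophisticated, and the function name is hypothetical.

```python
import numpy as np


def map_cells_to_spots(cell_expr, spot_ref):
    """Assign each cell to the reference spot with the highest Pearson correlation.

    cell_expr: (n_cells, n_genes); spot_ref: (n_spots, n_genes), same gene order.
    Returns (best spot index per cell, full cell-by-spot correlation matrix).
    """
    def center_and_normalize_rows(M):
        M = np.asarray(M, dtype=float)
        centered = M - M.mean(axis=1, keepdims=True)
        norms = np.linalg.norm(centered, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # constant profiles get zero correlation instead of NaN
        return centered / norms

    C = center_and_normalize_rows(cell_expr)
    S = center_and_normalize_rows(spot_ref)
    corr = C @ S.T  # dot products of centred unit vectors are Pearson correlations
    return corr.argmax(axis=1), corr
```

Because Pearson correlation ignores overall scale, a cell with more sequencing depth than the reference still maps to the spot with the same relative expression pattern.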
Procedia PDF Downloads 122
27819 Efficient Pre-Processing of Single-Cell Assay for Transposase Accessible Chromatin with High-Throughput Sequencing Data
Authors: Fan Gao, Lior Pachter
Abstract:
The primary tool currently used to pre-process 10X Chromium single-cell ATAC-seq data is Cell Ranger, which can take a very long time to run on standard datasets. To facilitate rapid pre-processing that enables reproducible workflows, we present scATAK, a suite of tools for pre-processing single-cell ATAC-seq data. It is 15 to 18 times faster than Cell Ranger on mouse and human samples. scATAK can also calculate chromatin interaction potential matrices and generate open chromatin signal and interaction traces for cell groups. We use scATAK to explore the chromatin regulatory landscape of a healthy adult human brain and unveil cell-type-specific features. We show that it provides a convenient and computationally efficient approach for pre-processing single-cell ATAC-seq data.
Keywords: single-cell, ATAC-seq, bioinformatics, open chromatin landscape, chromatin interactome
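A core step in pre-processing single-cell ATAC-seq data is turning per-barcode fragments into a cell-by-peak count matrix. The sketch below shows that step under stated assumptions. Fragments are (chrom, start, end, barcode) tuples with half-open coordinates, peaks do not overlap, and a fragment counts once for every peak it overlaps. This is one of several counting conventions, and it is not scATAK's implementation. The function name is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def count_fragments_in_peaks(fragments, peaks):
    """Return {barcode: {peak_index: count}} for fragments overlapping peaks.

    fragments: iterable of (chrom, start, end, barcode), half-open [start, end).
    peaks: list of (chrom, start, end), assumed non-overlapping.
    """
    by_chrom = defaultdict(list)
    for idx, (chrom, start, end) in enumerate(peaks):
        by_chrom[chrom].append((start, end, idx))
    for plist in by_chrom.values():
        plist.sort()  # non-overlapping peaks sorted by start are also sorted by end
    starts = {chrom: [p[0] for p in plist] for chrom, plist in by_chrom.items()}

    counts = defaultdict(lambda: defaultdict(int))
    for chrom, fstart, fend, barcode in fragments:
        plist = by_chrom.get(chrom)
        if not plist:
            continue
        # last peak starting before the fragment ends, then walk left while peaks still overlap
        k = bisect_right(starts[chrom], fend - 1) - 1
        while k >= 0 and plist[k][1] > fstart:
            counts[barcode][plist[k][2]] += 1
            k -= 1
    return {barcode: dict(c) for barcode, c in counts.items()}
```

Binary search keeps each fragment lookup logarithmic in the number of peaks, which matters when a sample has millions of fragments.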
Procedia PDF Downloads 155
27818 Efficient Reuse of Exome Sequencing Data for Copy Number Variation Callings
Authors: Chen Wang, Jared Evans, Yan Asmann
Abstract:
With the rapid evolution of next-generation sequencing techniques, whole-exome and exome-panel data have become a cost-effective way to detect small exonic mutations. There has also been a growing desire to accurately detect copy number variations (CNVs) from such data. To address these research and clinical needs, we developed a sequencing coverage pattern-based method for copy number detection, data integrity checks, CNV calling, and visualization reports. The developed methodologies include complete automation to increase usability, genome content-coverage bias correction, CNV segmentation, data quality reports, and publication-quality images. Poor-quality outlier samples were identified and removed automatically. Multiple experimental batches were routinely detected, and the samples were further reduced to a clean subset before analysis. Algorithmic improvements were also made to enhance somatic CNV detection as well as germline CNV detection in family trios. Additionally, a set of utilities was included to help users produce CNV plots for focused genes of interest. We demonstrate the somatic CNV enhancements by accurately detecting CNVs in whole-exome data from The Cancer Genome Atlas (TCGA) cancer samples and in a lymphoma case study with paired tumor and normal samples. We also demonstrate efficient reuse of existing exome sequencing data for improved germline CNV calling in a family trio from phase III of the 1000 Genomes Project, detecting CNVs with various modes of inheritance. The performance of the developed method is evaluated by comparing its CNV calls with results from other orthogonal copy number platforms.
Our case studies show that reusing exome sequencing data for CNV calling offers several notable benefits. These include better quality control of exome sequencing data, improved joint analysis with single nucleotide variant calls, and novel genomic discoveries from under-utilized existing whole-exome and custom exome panel data.
Keywords: bioinformatics, computational genetics, copy number variations, data reuse, exome sequencing, next generation sequencing
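Coverage-based CNV calling from exome data usually starts by comparing normalized per-exon read depth between a sample and a reference. The following is a minimal sketch of that first step for a tumor/normal pair. It scales each sample by its median exon coverage, takes per-exon log2 ratios, and labels exons with illustrative thresholds. The GC/content bias correction and segmentation described in the abstract are not shown, and the function names and cut-offs are hypothetical.

```python
import math
from statistics import median


def log2_ratios(tumor_cov, normal_cov, pseudocount: float = 0.5) -> list[float]:
    """Per-exon log2(tumor/normal) after scaling each sample by its median coverage.

    Assumes both medians are non-zero; the pseudocount avoids log of zero for empty exons.
    """
    t_med, n_med = median(tumor_cov), median(normal_cov)
    return [math.log2(((t + pseudocount) / t_med) / ((n + pseudocount) / n_med))
            for t, n in zip(tumor_cov, normal_cov)]


def call_exons(ratios, gain: float = 0.4, loss: float = -0.4) -> list[str]:
    """Label each exon; the thresholds are illustrative, not validated cut-offs."""
    return ["gain" if r >= gain else "loss" if r <= loss else "neutral" for r in ratios]


tumor = [100, 100, 200, 50]
normal = [100, 100, 100, 100]
print(call_exons(log2_ratios(tumor, normal)))  # ['neutral', 'neutral', 'gain', 'loss']
```

Median scaling is used here instead of total-depth scaling because a large amplification would otherwise inflate the total and make unaffected exons look like losses.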
Procedia PDF Downloads 257
27817 An Efficient Algorithm for Global Alignment of Protein-Protein Interaction Networks
Authors: Duc Dong Do, Ngoc Ha Tran, Thanh Hai Dang, Cao Cuong Dang, Xuan Huan Hoang
Abstract:
Globally aligning two protein-protein interaction networks is an important task in bioinformatics and computational biology. It has been a challenging and widely studied research topic in recent years. Accurately aligned networks allow us to identify functional modules of proteins and/or orthologous proteins, from which unknown functions of a protein can be inferred. Here we introduce a novel, efficient heuristic global network alignment algorithm called FASTAn, which has two phases. The first phase constructs an initial alignment, and the second improves this alignment by repeatedly applying a local optimization procedure. The experimental results demonstrated that FASTAn outperformed the state-of-the-art global network alignment algorithm SPINAL in terms of both commonly used objective scores and run time.
Keywords: FASTAn, Heuristic algorithm, biological network alignment, protein-protein interaction networks
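The abstract compares aligners by "commonly used objective scores" without naming them. Edge correctness (EC) is one widely used score: the fraction of edges in the smaller network whose endpoints are mapped onto adjacent nodes in the larger one. The sketch below computes EC for a given node mapping. It is an illustration of the score only, not FASTAn's objective or search procedure, and the function name is hypothetical.

```python
def edge_correctness(edges1, edges2, mapping) -> float:
    """Fraction of G1 edges whose mapped endpoints are also adjacent in G2.

    edges1/edges2: iterables of undirected (u, v) pairs; mapping: G1 node -> G2 node.
    """
    edges1 = list(edges1)
    e2 = {frozenset(e) for e in edges2}
    conserved = sum(
        1 for u, v in edges1
        if u in mapping and v in mapping
        and frozenset((mapping[u], mapping[v])) in e2
    )
    return conserved / len(edges1)


# Toy example: a triangle mapped onto a path conserves 2 of its 3 edges
g1 = [("a", "b"), ("b", "c"), ("c", "a")]
g2 = [("x", "y"), ("y", "z")]
print(edge_correctness(g1, g2, {"a": "x", "b": "y", "c": "z"}))
```

A local optimization phase like the one described could, for example, swap pairs of mapped nodes and keep a swap only when such a score improves. That search strategy is an assumption here, not a description of FASTAn.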
Procedia PDF Downloads 604
27816 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease
Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena
Abstract:
Advances in sequence data generation technologies are churning out voluminous omics data, posing a massive challenge for annotating biological functional features. With so much data available, using machine learning methods and tools to make novel inferences has become an obvious choice. Machine learning methods have been successfully applied to many disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested in developing novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, we plan to employ novel machine learning approaches to help understand how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome, mutations in which can affect a phenotype of interest.
Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics
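Studying "apparently innocuous" mutations at synonymous sites first requires labeling coding variants by their effect on the protein. The sketch below assumes an in-frame DNA coding sequence and the standard genetic code. It classifies a single-nucleotide substitution as synonymous, missense, nonsense or stop-loss, and such labels could serve as input annotations for a learning model. The function name is hypothetical, and this is not part of the proposed framework.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Effect of replacing cds[pos] (0-based) with alt; cds must start at codon 1."""
    cds, alt = cds.upper(), alt.upper()
    start, offset = pos - pos % 3, pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


# ATG CTT GAA (Met-Leu-Glu): CTT -> CTC still encodes leucine
print(classify_snv("ATGCTTGAA", 5, "C"))  # synonymous
```

Synonymous does not mean functionally neutral: such changes can still affect splicing, mRNA stability or translation, which is the kind of signal a learned model would need to capture.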
Procedia PDF Downloads 97