Search results for: non-coding genome
227 The First Complete Mitochondrial Genome of Melon Thrips, Thrips palmi (Thripinae: Thysanoptera): Vector for Tospoviruses
Authors: Kaomud Tyagi, Rajasree Chakraborty, Shantanu Kundu, Devkant Singha, Kailash Chandra, Vikas Kumar
Abstract:
The melon thrips, Thrips palmi is a serious pest of a wide range of agriculture crops and also act as vectors for plant viruses (genus Tospovirus, family Bunyaviridae). More molecular data on this species is required to understand the cryptic speciation and evolutionary affiliations. Mitochondrial genomes have been widely used in phylogenetic and evolutionary studies in insect. So far, mitogenomes of five thrips species (Anaphothrips obscurus, Frankliniella intonsa, Frankliniella occidentalis, Scirtothrips dorsalis and Thrips imaginis) is available in the GenBank database. In this study, we sequenced the first complete mitogenome T. palmi and compared it with available thrips mitogenomes. We assembled the mitogenome from the whole genome sequencing data generated using Illumina Hiseq2500. Annotation was performed using MITOS web-server to estimate the location of protein coding genes (PCGs), transfer RNA (tRNAs), ribosomal RNAs (rRNAs) and their secondary structures. The boundaries of PCGs and rRNAs was confirmed manually in NCBI. Phylogenetic analyses were performed using the 13 PCGs data using maximum likelihood (ML) in PAUP, and Bayesian inference (BI) in MrBayes 3.2. The complete mitogenome of T. palmi was 15,333 base pairs (bp), which was greater than the genomes of A. obscurus (14,890bp), F. intonsa (15,215 bp), F. occidentalis (14,889 bp) and S. dorsalis South Asia strain (SA1) (14,283 bp), but smaller than the genomes of T. imaginis (15,407 bp) and S. dorsalis East Asia strain (EA1) (15,343bp). Like in other thrips species, the mitochondrial genome of T. palmi was represented by 37 genes, including 13 PCGs, large and small ribosomal RNA (rrnL and rrnS) genes, 22 transfer RNA (tRNAs) genes (with one extra gene for trn-Serine) and two A+T-rich control regions (CR1 and CR2). Thirty one genes were observed on heavy (H) strand and six genes on the light (L) strand. The six tRNA genes (trnG,trnK, trnY, trnW, trnF, and trnH) were found to be conserved in all thrips species mitogenomes in their locations relative to a protein-coding or rRNA gene upstream or downstream. The gene arrangements of T. palmi is very close to T. imaginis except the rearrangements in tRNAs genes: trnR (arginine), and trnE (glutamic acid) were found to be located between cox3 and CR2 in T. imaginis which were translocated between atp6 and CR1 in T. palmi; trnL1 (Leucine) and trnS1(Serine) were located between atp6 and CR1 in T. imaginis which were translocated between cox3 and CR2 in T. palmi. The location of CR1 upstream of nad5 gene was suggested to be ancestral condition of the thrips species in subfamily Thripinae, was also observed in T. palmi. Both the Maximum likelihood (ML) and Bayesian Inference (BI) phylogenetic trees generated resulted in similar topologies. The T. palmi was clustered with T. imaginis. We concluded that more molecular data on the diverse thrips species from different hierarchical level is needed, to understand the phylogenetic and evolutionary relationships among them.Keywords: thrips, comparative mitogenomics, gene rearrangements, phylogenetic analysis
Procedia PDF Downloads 168226 Identification of Candidate Congenital Heart Defects Biomarkers by Applying a Random Forest Approach on DNA Methylation Data
Authors: Kan Yu, Khui Hung Lee, Eben Afrifa-Yamoah, Jing Guo, Katrina Harrison, Jack Goldblatt, Nicholas Pachter, Jitian Xiao, Guicheng Brad Zhang
Abstract:
Background and Significance of the Study: Congenital Heart Defects (CHDs) are the most common malformation at birth and one of the leading causes of infant death. Although the exact etiology remains a significant challenge, epigenetic modifications, such as DNA methylation, are thought to contribute to the pathogenesis of congenital heart defects. At present, no existing DNA methylation biomarkers are used for early detection of CHDs. The existing CHD diagnostic techniques are time-consuming and costly and can only be used to diagnose CHDs after an infant was born. The present study employed a machine learning technique to analyse genome-wide methylation data in children with and without CHDs with the aim to find methylation biomarkers for CHDs. Methods: The Illumina Human Methylation EPIC BeadChip was used to screen the genome‐wide DNA methylation profiles of 24 infants diagnosed with congenital heart defects and 24 healthy infants without congenital heart defects. Primary pre-processing was conducted by using RnBeads and limma packages. The methylation levels of top 600 genes with the lowest p-value were selected and further investigated by using a random forest approach. ROC curves were used to analyse the sensitivity and specificity of each biomarker in both training and test sample sets. The functionalities of selected genes with high sensitivity and specificity were then assessed in molecular processes. Major Findings of the Study: Three genes (MIR663, FGF3, and FAM64A) were identified from both training and validating data by random forests with an average sensitivity and specificity of 85% and 95%. GO analyses for the top 600 genes showed that these putative differentially methylated genes were primarily associated with regulation of lipid metabolic process, protein-containing complex localization, and Notch signalling pathway. The present findings highlight that aberrant DNA methylation may play a significant role in the pathogenesis of congenital heart defects.Keywords: biomarker, congenital heart defects, DNA methylation, random forest
Procedia PDF Downloads 158225 Phosphate Use Efficiency in Plants: A GWAS Approach to Identify the Pathways Involved
Authors: Azizah M. Nahari, Peter Doerner
Abstract:
Phosphate (Pi) is one of the essential macronutrients in plant growth and development, and it plays a central role in metabolic processes in plants, particularly photosynthesis and respiration. Limitation of crop productivity by Pi is widespread and is likely to increase in the future. Applications of Pi fertilizers have improved soil Pi fertility and crop production; however, they have also caused environmental damage. Therefore, in order to reduce dependence on unsustainable Pi fertilizers, a better understanding of phosphate use efficiency (PUE) is required for engineering nutrient-efficient crop plants. Enhanced Pi efficiency can be achieved by improved productivity per unit Pi taken up. We aim to identify, by using association mapping, general features of the most important loci that contribute to increased PUE to allow us to delineate the physiological pathways involved in defining this trait in the model plant Arabidopsis. As PUE is in part determined by the efficiency of uptake, we designed a hydroponic system to avoid confounding effects due to differences in root system architecture leading to differences in Pi uptake. In this system, 18 parental lines and 217 lines of the MAGIC population (a Multiparent Advanced Generation Inter-Cross) grown in high and low Pi availability conditions. The results showed revealed a large variation of PUE in the parental lines, indicating that the MAGIC population was well suited to identify PUE loci and pathways. 2 of 18 parental lines had the highest PUE in low Pi while some lines responded strongly and increased PUE with increased Pi. Having examined the 217 MAGIC population, considerable variance in PUE was found. A general feature was the trend of most lines to exhibit higher PUE when grown in low Pi conditions. Association mapping is currently in progress, but initial observations indicate that a wide variety of physiological processes are involved in influencing PUE in Arabidopsis. The combination of hydroponic growth methods and genome-wide association mapping is a powerful tool to identify the physiological pathways underpinning complex quantitative traits in plants.Keywords: hydroponic system growth, phosphate use efficiency (PUE), Genome-wide association mapping, MAGIC population
Procedia PDF Downloads 321224 Wide Dissemination of CTX-M-Type Extended-Spectrum β-Lactamases in Korean Swine Farms
Authors: Young Ah Kim, Hyunsoo Kim, Eun-Jeong Yoon, Young Hee Seo, Kyungwon Lee
Abstract:
Extended-spectrum β-lactamase (ESBL)-producing Escherichia coli from food animals are considered as a reservoir for transmission of ESBL genes to human. The aim of this study is to assess the prevalence and molecular epidemiology of ESBL-producing E. coli colonization in pigs, farm workers, and farm environments to elucidate the transmission of multidrug-resistant clones from animal to human. Nineteen pig farms were enrolled across the country in Korea from August to December 2017. ESBL-producing E. coli isolates were detected in 190 pigs, 38 farm workers, and 112 sites of farm environments using ChromID ESBL (bioMerieux, Marcy l'Etoile, France), directly (stool or perirectal swab) or after enrichment (sewage). Antimicrobial susceptibility tests were done with disk diffusion methods and blaTEM, blaSHV, and blaCTX-M were detected with PCR and sequencing. The genomes of the four CTX-M-55-producing E. coli isolates from various sources in one farm were entirely sequenced to assess the relatedness of the strains. Whole genome sequencing (WGS) was performed with PacBio RS II system (Pacific Biosciences, Menlo Park, CA, USA). ESBL genotypes were 85 CTX-M-1 group (one CTX-M-3, 23 CTX-M-15, one CTX-M-28, 59 CTX-M-55, one CTX-M-69) and 60 CTX-M-9 group (41 CTX-M-14, one CTX-M-17, one CTX-M-27, 13 CTX-M-65, 4 CTX-M-102) in total 145 isolates. The rectal colonization rates were 53.2% (101/190) in pigs and 39.5% (15/38) in farm workers. In WGS, sequence types (STs) were determined as ST69 (E. coli PJFH115 isolate from a human carrier), ST457 (two E. coli isolates PJFE101 recovered from a fence and PJFA1104 from a pig) and ST5899 (E. coli PJFA173 isolate from the other pig). The four plasmids encoding CTX-M-55 (88,456 to 149, 674 base pair), whether it belonged to IncFIB or IncFIC-IncFIB type, shared IncF backbone furnishing the conjugal elements, suggesting of genes originated from same ancestor. In conclusion, the prevalence of ESBL-producing E. coli in swine farms was surprisingly high, and many of them shared common ESBL genotypes of clinical isolates such as CTX-M-14, 15, and 55 in Korea. It could spread by horizontal transfer between isolates from different reservoirs (human-animal-environment).Keywords: Escherichia coli, extended-spectrum β-lactamase, prevalence, whole genome sequencing
Procedia PDF Downloads 203223 PARP1 Links Transcription of a Subset of RBL2-Dependent Genes with Cell Cycle Progression
Authors: Ewelina Wisnik, Zsolt Regdon, Kinga Chmielewska, Laszlo Virag, Agnieszka Robaszkiewicz
Abstract:
Apart from protecting genome, PARP1 has been documented to regulate many intracellular processes inter alia gene transcription by physically interacting with chromatin bound proteins and by their ADP-ribosylation. Our recent findings indicate that expression of PARP1 decreases during the differentiation of human CD34+ hematopoietic stem cells to monocytes as a consequence of differentiation-associated cell growth arrest and formation of E2F4-RBL2-HDAC1-SWI/SNF repressive complex at the promoter of this gene. Since the RBL2 complexes repress genes in a E2F-dependent manner and are widespread in the genome in G0 arrested cells, we asked (a) if RBL2 directly contributes to defining monocyte phenotype and function by targeting gene promoters and (b) if RBL2 controls gene transcription indirectly by repressing PARP1. For identification of genes controlled by RBL2 and/or PARP1,we used primer libraries for surface receptors and TLR signaling mediators, genes were silenced by siRNA or shRNA, analysis of gene promoter occupation by selected proteins was carried out by ChIP-qPCR, while statistical analysis in GraphPad Prism 5 and STATISTICA, ChIP-Seq data were analysed in Galaxy 2.5.0.0. On the list of 28 genes regulated by RBL2, we identified only four solely repressed by RBL2-E2F4-HDAC1-BRM complex. Surprisingly, 24 out of 28 emerged genes controlled by RBL2 were co-regulated by PARP1 in six different manners. In one mode of RBL2/PARP1 co-operation, represented by MAP2K6 and MAPK3, PARP1 was found to associate with gene promoters upon RBL2 silencing, which was previously shown to restore PARP1 expression in monocytes. PARP1 effect on gene transcription was observed only in the presence of active EP300, which acetylated gene promoters and activated transcription. Further analysis revealed that PARP1 binding to MA2K6 and MAPK3 promoters enabled recruitment of EP300 in monocytes, while in proliferating cancer cell lines, which actively transcribe PARP1, this protein maintained EP300 at the promoters of MA2K6 and MAPK3. Genome-wide analysis revealed a similar distribution of PARP1 and EP300 around transcription start sites and the co-occupancy of some gene promoters by PARP1 and EP300 in cancer cells. Here, we described a new RBL2/PARP1/EP300 axis which controls gene transcription regardless of the cell type. In this model cell, cycle-dependent transcription of PARP1 regulates expression of some genes repressed by RBL2 upon cell cycle limitation. Thus, RBL2 may indirectly regulate transcription of some genes by controlling the expression of EP300-recruiting PARP1. Acknowledgement: This work was financed by Polish National Science Centre grants nr DEC-2013/11/D/NZ2/00033 and DEC-2015/19/N/NZ2/01735. L.V. is funded by the National Research, Development and Innovation Office grants GINOP-2.3.2-15-2016-00020 TUMORDNS, GINOP-2.3.2-15-2016-00048-STAYALIVE and OTKA K112336. AR is supported by Polish Ministry of Science and Higher Education 776/STYP/11/2016.Keywords: retinoblastoma transcriptional co-repressor like 2 (RBL2), poly(ADP-ribose) polymerase 1 (PARP1), E1A binding protein p300 (EP300), monocytes
Procedia PDF Downloads 209222 Deleterious SNP’s Detection Using Machine Learning
Authors: Hamza Zidoum
Abstract:
This paper investigates the impact of human genetic variation on the function of human proteins using machine-learning algorithms. Single-Nucleotide Polymorphism represents the most common form of human genome variation. We focus on the single amino-acid polymorphism located in the coding region as they can affect the protein function leading to pathologic phenotypic change. We use several supervised Machine Learning methods to identify structural properties correlated with increased risk of the missense mutation being damaging. SVM associated with Principal Component Analysis give the best performance.Keywords: single-nucleotide polymorphism, machine learning, feature selection, SVM
Procedia PDF Downloads 377221 In Silico Screening, Identification and Validation of Cryptosporidium hominis Hypothetical Protein and Virtual Screening of Inhibitors as Therapeutics
Authors: Arpit Kumar Shrivastava, Subrat Kumar, Rajani Kanta Mohapatra, Priyadarshi Soumyaranjan Sahu
Abstract:
Computational approaches to predict structure, function and other biological characteristics of proteins are becoming more common in comparison to the traditional methods in drug discovery. Cryptosporidiosis is a major zoonotic diarrheal disease particularly in children, which is caused primarily by Cryptosporidium hominis and Cryptosporidium parvum. Currently, there are no vaccines for cryptosporidiosis and recommended drugs are not effective. With the availability of complete genome sequence of C. hominis, new targets have been recognized for the development of effective and better drugs and/or vaccines. We identified a unique hypothetical epitopic protein in C. hominis genome through BLASTP analysis. A 3D model of the hypothetical protein was generated using I-Tasser server through threading methodology. The quality of the model was validated through Ramachandran plot by PROCHECK server. The functional annotation of the hypothetical protein through DALI server revealed structural similarity with human Transportin 3. Phylogenetic analysis for this hypothetical protein also showed C. hominis hypothetical protein (CUV04613) was the closely related to human transportin 3 protein. The 3D protein model is further subjected to virtual screening study with inhibitors from the Zinc Database by using Dock Blaster software. Docking study reported N-(3-chlorobenzyl) ethane-1,2-diamine as the best inhibitor in terms of docking score. Docking analysis elucidated that Leu 525, Ile 526, Glu 528, Glu 529 are critical residues for ligand–receptor interactions. The molecular dynamic simulation was done to access the reliability of the binding pose of inhibitor and protein complex using GROMACS software at 10ns time point. Trajectories were analyzed at each 2.5 ns time interval, among which, H-bond with LEU-525 and GLY- 530 are significantly present in MD trajectories. Furthermore, antigenic determinants of the protein were determined with the help of DNA Star software. Our study findings showed a great potential in order to provide insights in the development of new drug(s) or vaccine(s) for control as well as prevention of cryptosporidiosis among humans and animals.Keywords: cryptosporidium hominis, hypothetical protein, molecular docking, molecular dynamics simulation
Procedia PDF Downloads 365220 Genomic Characterisation of Equine Sarcoid-derived Bovine Papillomavirus Type 1 and 2 Using Nanopore-Based Sequencing
Authors: Lien Gysens, Bert Vanmechelen, Maarten Haspeslagh, Piet Maes, Ann Martens
Abstract:
Bovine papillomavirus (BPV) types 1 and 2 play a central role in the etiology of the most common neoplasm in horses, the equine sarcoid. The unknown mechanism behind the unique variety in a clinical presentation on the one hand and the host-dependent clinical outcome of BPV-1 infection, on the other hand, indicate the involvement of additional factors. Earlier studies have reported the potential functional significance of intratypic sequence variants, along with the existence of sarcoid-sourced BPV variants. Therefore, intratypic sequence variation seems to be an important emerging viral factor. This study aimed to give a broad insight in sarcoid-sourced BPV variation and explore its potential association with disease presentation. In order to do this, a nanopore sequencing approach was successfully optimized for screening a wide spectrum of clinical samples. Specimens of each tumour were initially screened for BPV-1/-2 by quantitative real-time PCR. A custom-designed primer set was used on BPV-positive samples to amplify the complete viral genome in two multiplex PCR reactions, resulting in a set of overlapping amplicons. For phylogenetic analysis, separate alignments were made of all available complete genome sequences for BPV-1/-2. The resulting alignments were used to infer Bayesian phylogenetic trees. We found substantial genetic variation among sarcoid-derived BPV-1, although this variation could not be linked to disease severity. Several of the BPV-1 genomes had multiple major deletions. Remarkably, the majority of the cluster within the region coding for late viral genes. Together with the extensiveness (up to 603 nucleotides) of the described deletions, this suggests an altered function of L1/L2 in disease pathogenesis. By generating a significant amount of complete-length BPV genomes, we succeeded in introducing next-generation sequencing into veterinary research focusing on the equine sarcoid, thus facilitating the first report of both nanopore-based sequencing of complete sarcoid-sourced BPV-1/-2 and the simultaneous nanopore sequencing of multiple complete genomes originating from a single clinical sample.Keywords: Bovine papillomavirus, equine sarcoid, horse, nanopore sequencing, phylogenetic analysis
Procedia PDF Downloads 178219 Transcriptomic Analysis of Acanthamoeba castellanii Virulence Alteration by Epigenetic DNA Methylation
Authors: Yi-Hao Wong, Li-Li Chan, Chee-Onn Leong, Stephen Ambu, Joon-Wah Mak, Priyasashi Sahu
Abstract:
Background: Acanthamoeba is a genus of amoebae which lives as a free-living in nature or as a human pathogen that causes severe brain and eye infections. Virulence potential of Acanthamoeba is not constant and can change with growth conditions. DNA methylation, an epigenetic process which adds methyl groups to DNA, is used by eukaryotic cells, including several human parasites to control their gene expression. We used qPCR, siRNA gene silencing, and RNA sequencing (RNA-Seq) to study DNA-methyltransferase gene family (DNMT) in order to indicate the possibility of its involvement in programming Acanthamoeba virulence potential. Methods: A virulence-attenuated Acanthamoeba isolate (designation: ATCC; original isolate: ATCC 50492) was subjected to mouse passages to restore its pathogenicity; a virulence-reactivated isolate (designation: AC/5) was generated. Several established factors associated with Acanthamoeba virulence phenotype were examined to confirm the succession of reactivation process. Differential gene expression of DNMT between ATCC and AC/5 isolates was performed by qPCR. Silencing on DNMT gene expression in AC/5 isolate was achieved by siRNA duplex. Total RNAs extracted from ATCC, AC/5, and siRNA-treated (designation: si-146) were subjected to RNA-Seq for comparative transcriptomic analysis in order to identify the genome-wide effect of DNMT in regulating Acanthamoeba gene expression. qPCR was performed to validate the RNA-Seq results. Results: Physiological and cytophatic assays demonstrated an increased in virulence potential of AC/5 isolate after mouse passages. DNMT gene expression was significantly higher in AC/5 compared to ATCC isolate (p ≤ 0.01) by qPCR. si-146 duplex reduced DNMT gene expression in AC/5 isolate by 30%. Comparative transcriptome analysis identified the differentially expressed genes, with 3768 genes in AC/5 vs ATCC isolate; 2102 genes in si-146 vs AC/5 isolate and 3422 genes in si-146 vs ATCC isolate, respectively (fold-change of ≥ 2 or ≤ 0.5, p-value adjusted (padj) < 0.05). Of these, 840 and 1262 genes were upregulated and downregulated, respectively, in si-146 vs AC/5 isolate. Eukaryotic orthologous group (KOG) assignments revealed a higher percentage of downregulated gene expression in si-146 compared to AC/5 isolate, were related to posttranslational modification, signal transduction and energy production. Gene Ontology (GO) terms for those downregulated genes shown were associated with transport activity, oxidation-reduction process, and metabolic process. Among these downregulated genes were putative genes encoded for heat shock proteins, transporters, ubiquitin-related proteins, proteins for vesicular trafficking (small GTPases), and oxidoreductases. Functional analysis of similar predicted proteins had been described in other parasitic protozoa for their survival and pathogenicity. Decreased expression of these genes in si146-treated isolate may account in part for Acanthamoeba reduced pathogenicity. qPCR on 6 selected genes upregulated in AC/5 compared to ATCC isolate corroborated the RNA sequencing findings, indicating a good concordance between these two analyses. Conclusion: To the best of our knowledge, this study represents the first genome-wide analysis of DNA methylation and its effects on gene expression in Acanthamoeba spp. The present data indicate that DNA methylation has substantial effect on global gene expression, allowing further dissection of the genome-wide effects of DNA-methyltransferase gene in regulating Acanthamoeba pathogenicity.Keywords: Acanthamoeba, DNA methylation, RNA sequencing, virulence
Procedia PDF Downloads 196218 Prevalence and Mechanisms of Antibiotic Resistance in Escherichia coli Isolated from Mastitic Dairy Cattle in Canada
Authors: Satwik Majumder, Dongyun Jung, Jennifer Ronholm, Saji George
Abstract:
Bovine mastitis is the most common infectious disease in dairy cattle, with major economic implications for the dairy industry worldwide. Continuous monitoring for the emergence of antimicrobial resistance (AMR) among bacterial isolates from dairy farms is vital not only for animal husbandry but also for public health. In this study, the prevalence of AMR in 113 Escherichia coli isolates from cases of bovine clinical mastitis in Canada was investigated. Kirby-Bauer disk diffusion test with 18 antibiotics and microdilution method with three heavy metals (copper, zinc, and silver) was performed to determine the antibiotic and heavy-metal susceptibility. Resistant strains were assessed for efflux and ß-lactamase activities besides assessing biofilm formation and hemolysis. Whole-genome sequences for each of the isolates were examined to detect the presence of genes corresponding to the observed AMR and virulence factors. Phenotypic analysis revealed that 32 isolates were resistant to one or more antibiotics, and 107 showed resistance against at least one heavy metal. Quinolones and silver were the most efficient against the tested isolates. Among the AMR isolates, AcrAB-TolC efflux activity and ß-lactamase enzyme activities were detected in 13 and 14 isolates, respectively. All isolates produced biofilm but with different capacities, and 33 isolates showed α-hemolysin activity. A positive correlation (Pearson r = +0.89) between efflux pump activity and quantity of biofilm was observed. Genes associated with aggregation, adhesion, cyclic di-GMP, quorum sensing were detected in the AMR isolates, corroborating phenotype observations. This investigation showed the prevalence of AMR in E. coli isolates from bovine clinical mastitis. The results also suggest the inadequacy of antimicrobials with a single mode of action to curtail AMR bacteria with multiple mechanisms of resistance and virulence factors. Therefore, it calls for combinatorial therapy for the effective management of AMR infections in dairy farms and combats its potential transmission to the food supply chain through milk and dairy products.Keywords: antimicrobial resistance, E. coli, bovine mastitis, antibiotics, heavy-metals, efflux pump, ß-lactamase enzyme, biofilm, whole-genome sequencing
Procedia PDF Downloads 216217 Towards End-To-End Disease Prediction from Raw Metagenomic Data
Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker
Abstract:
Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine
Procedia PDF Downloads 125216 CRISPR/Cas9 Based Gene Stacking in Plants for Virus Resistance Using Site-Specific Recombinases
Authors: Sabin Aslam, Sultan Habibullah Khan, James G. Thomson, Abhaya M. Dandekar
Abstract:
Losses due to viral diseases are posing a serious threat to crop production. A quick breakdown of resistance to viruses like Cotton Leaf Curl Virus (CLCuV) demands the application of a proficient technology to engineer durable resistance. Gene stacking has recently emerged as a potential approach for integrating multiple genes in crop plants. In the present study, recombinase technology has been used for site-specific gene stacking. A target vector (pG-Rec) was designed for engineering a predetermined specific site in the plant genome whereby genes can be stacked repeatedly. Using Agrobacterium-mediated transformation, the pG-Rec was transformed into Coker-312 along with Nicotiana tabacum L. cv. Xanthi and Nicotiana benthamiana. The transgene analysis of target lines was conducted through junction PCR. The transgene positive target lines were used for further transformations to site-specifically stack two genes of interest using Bxb1 and PhiC31 recombinases. In the first instance, Cas9 driven by multiplex gRNAs (for Rep gene of CLCuV) was site-specifically integrated into the target lines and determined by the junction PCR and real-time PCR. The resulting plants were subsequently used to stack the second gene of interest (AVP3 gene from Arabidopsis for enhancing cotton plant growth). The addition of the genes is simultaneously achieved with the removal of marker genes for recycling with the next round of gene stacking. Consequently, transgenic marker-free plants were produced with two genes stacked at the specific site. These transgenic plants can be potential germplasm to introduce resistance against various strains of cotton leaf curl virus (CLCuV) and abiotic stresses. The results of the research demonstrate gene stacking in crop plants, a technology that can be used to introduce multiple genes sequentially at predefined genomic sites. The current climate change scenario highlights the use of such technologies so that gigantic environmental issues can be tackled by several traits in a single step. After evaluating virus resistance in the resulting plants, the lines can be a primer to initiate stacking of further genes in Cotton for other traits as well as molecular breeding with elite cotton lines.Keywords: cotton, CRISPR/Cas9, gene stacking, genome editing, recombinases
Procedia PDF Downloads 155215 Modeling Competition Between Subpopulations with Variable DNA Content in Resource-Limited Microenvironments
Authors: Parag Katira, Frederika Rentzeperis, Zuzanna Nowicka, Giada Fiandaca, Thomas Veith, Jack Farinhas, Noemi Andor
Abstract:
Resource limitations shape the outcome of competitions between genetically heterogeneous pre-malignant cells. One example of such heterogeneity is in the ploidy (DNA content) of pre-malignant cells. A whole-genome duplication (WGD) transforms a diploid cell into a tetraploid one and has been detected in 28-56% of human cancers. If a tetraploid subclone expands, it consistently does so early in tumor evolution, when cell density is still low, and competition for nutrients is comparatively weak – an observation confirmed for several tumor types. WGD+ cells need more resources to synthesize increasing amounts of DNA, RNA, and proteins. To quantify resource limitations and how they relate to ploidy, we performed a PAN cancer analysis of WGD, PET/CT, and MRI scans. Segmentation of >20 different organs from >900 PET/CT scans were performed with MOOSE. We observed a strong correlation between organ-wide population-average estimates of Oxygen and the average ploidy of cancers growing in the respective organ (Pearson R = 0.66; P= 0.001). In-vitro experiments using near-diploid and near-tetraploid lineages derived from a breast cancer cell line supported the hypothesis that DNA content influences Glucose- and Oxygen-dependent proliferation-, death- and migration rates. To model how subpopulations with variable DNA content compete in the resource-limited environment of the human brain, we developed a stochastic state-space model of the brain (S3MB). The model discretizes the brain into voxels, whereby the state of each voxel is defined by 8+ variables that are updated over time: stiffness, Oxygen, phosphate, glucose, vasculature, dead cells, migrating cells and proliferating cells of various DNA content, and treat conditions such as radiotherapy and chemotherapy. Well-established Fokker-Planck partial differential equations govern the distribution of resources and cells across voxels. We applied S3MB on sequencing and imaging data obtained from a primary GBM patient. We performed whole genome sequencing (WGS) of four surgical specimens collected during the 1ˢᵗ and 2ⁿᵈ surgeries of the GBM and used HATCHET to quantify its clonal composition and how it changes between the two surgeries. HATCHET identified two aneuploid subpopulations of ploidy 1.98 and 2.29, respectively. The low-ploidy clone was dominant at the time of the first surgery and became even more dominant upon recurrence. MRI images were available before and after each surgery and registered to MNI space. The S3MB domain was initiated from 4mm³ voxels of the MNI space. T1 post and T2 flair scan acquired after the 1ˢᵗ surgery informed tumor cell densities per voxel. Magnetic Resonance Elastography scans and PET/CT scans informed stiffness and Glucose access per voxel. We performed a parameter search to recapitulate the GBM’s tumor cell density and ploidy composition before the 2ⁿᵈ surgery. Results suggest that the high-ploidy subpopulation had a higher Glucose-dependent proliferation rate (0.70 vs. 0.49), but a lower Glucose-dependent death rate (0.47 vs. 1.42). These differences resulted in spatial differences in the distribution of the two subpopulations. Our results contribute to a better understanding of how genomics and microenvironments interact to shape cell fate decisions and could help pave the way to therapeutic strategies that mimic prognostically favorable environments.Keywords: tumor evolution, intra-tumor heterogeneity, whole-genome doubling, mathematical modeling
Procedia PDF Downloads 72214 Integration of Microarray Data into a Genome-Scale Metabolic Model to Study Flux Distribution after Gene Knockout
Authors: Mona Heydari, Ehsan Motamedian, Seyed Abbas Shojaosadati
Abstract:
Prediction of perturbations after genetic manipulation (especially gene knockout) is one of the important challenges in systems biology. In this paper, a new algorithm is introduced that integrates microarray data into the metabolic model. The algorithm was used to study the change in the cell phenotype after knockout of Gss gene in Escherichia coli BW25113. Algorithm implementation indicated that gene deletion resulted in more activation of the metabolic network. Growth yield was more and less regulating gene were identified for mutant in comparison with the wild-type strain.Keywords: metabolic network, gene knockout, flux balance analysis, microarray data, integration
Procedia PDF Downloads 579213 Inbreeding Study Using Runs of Homozygosity in Nelore Beef Cattle
Authors: Priscila A. Bernardes, Marcos E. Buzanskas, Luciana C. A. Regitano, Ricardo V. Ventura, Danisio P. Munari
Abstract:
The best linear unbiased predictor (BLUP) is a method commonly used in genetic evaluations of breeding programs. However, this approach can lead to higher inbreeding coefficients in the population due to the intensive use of few bulls with higher genetic potential, usually presenting some degree of relatedness. High levels of inbreeding are associated to low genetic viability, fertility, and performance for some economically important traits and therefore, should be constantly monitored. Unreliable pedigree data can also lead to misleading results. Genomic information (i.e., single nucleotide polymorphism – SNP) is a useful tool to estimate the inbreeding coefficient. Runs of homozygosity have been used to evaluate homozygous segments inherited due to direct or collateral inbreeding and allows inferring population selection history. This study aimed to evaluate runs of homozygosity (ROH) and inbreeding in a population of Nelore beef cattle. A total of 814 animals were genotyped with the Illumina BovineHD BeadChip and the quality control was carried out excluding SNPs located in non-autosomal regions, with unknown position, with a p-value in the Hardy-Weinberg equilibrium lower than 10⁻⁵, call rate lower than 0.98 and samples with the call rate lower than 0.90. After the quality control, 809 animals and 509,107 SNPs remained for analyses. For the ROH analysis, PLINK software was used considering segments with at least 50 SNPs with a minimum length of 1Mb in each animal. The inbreeding coefficient was calculated using the ratio between the sum of all ROH sizes and the size of the whole genome (2,548,724kb). A total of 25.711 ROH were observed, presenting mean, median, minimum, and maximum length of 3.34Mb, 2Mb, 1Mb, and 80.8Mb, respectively. The number of SNPs present in ROH segments varied from 50 to 14.954. The longest ROH length was observed in one animal, which presented a length of 634Mb (24.88% of the genome). Four bulls were among the 10 animals with the longest extension of ROH, presenting 11% of ROH with length higher than 10Mb. Segments longer than 10Mb indicate recent inbreeding. Therefore, the results indicate an intensive use of few sires in the studied data. The distribution of ROH along the chromosomes showed that chromosomes 5 and 6 presented a large number of segments when compared to other chromosomes. The mean, median, minimum, and maximum inbreeding coefficients were 5.84%, 5.40%, 0.00%, and 24.88%, respectively. Although the mean inbreeding was considered low, the ROH indicates a recent and intensive use of few sires, which should be avoided for the genetic progress of breed.Keywords: autozygosity, Bos taurus indicus, genomic information, single nucleotide polymorphism
Procedia PDF Downloads 150212 PCR Based DNA Analysis in Detecting P53 Mutation in Human Breast Cancer (MDA-468)
Authors: Debbarma Asis, Guha Chandan
Abstract:
Tumor Protein-53 (P53) is one of the tumor suppressor proteins. P53 regulates the cell cycle that conserves stability by preventing genome mutation. It is named so as it runs as 53-kilodalton (kDa) protein on Polyacrylamide gel electrophoresis although the actual mass is 43.7 kDa. Experimental evidence has indicated that P53 cancer mutants loses tumor suppression activity and subsequently gain oncogenic activities to promote tumourigenesis. Tumor-specific DNA has recently been detected in the plasma of breast cancer patients. Detection of tumor-specific genetic materials in cancer patients may provide a unique and valuable tumor marker for diagnosis and prognosis. Commercially available MDA-468 breast cancer cell line was used for the proposed study.Keywords: tumor protein (P53), cancer mutants, MDA-468, tumor suppressor gene
Procedia PDF Downloads 478211 Association of Nuclear – Mitochondrial Epistasis with BMI in Type 1 Diabetes Mellitus Patients
Authors: Agnieszka H. Ludwig-Slomczynska, Michal T. Seweryn, Przemyslaw Kapusta, Ewelina Pitera, Katarzyna Cyganek, Urszula Mantaj, Lucja Dobrucka, Ewa Wender-Ozegowska, Maciej T. Malecki, Pawel Wolkow
Abstract:
Obesity results from an imbalance between energy intake and its expenditure. Genome-Wide Association Study (GWAS) analyses have led to discovery of only about 100 variants influencing body mass index (BMI), which explain only a small portion of genetic variability. Analysis of gene epistasis gives a chance to discover another part. Since it was shown that interaction and communication between nuclear and mitochondrial genome are indispensable for normal cell function, we have looked for epistatic interactions between the two genomes to find their correlation with BMI. Methods: The analysis was performed on 366 T1DM patients using Illumina Infinium OmniExpressExome-8 chip and followed by imputation on Michigan Imputation Server. Only genes which influence mitochondrial functioning (listed in Human MitoCarta 2.0) were included in the analysis – variants of nuclear origin (MAF > 5%) in 1140 genes and 42 mitochondrial variants (MAF > 1%). Gene expression analysis was performed on GTex data. Association analysis between genetic variants and BMI was performed with the use of Linear Mixed Models as implemented in the package 'GENESIS' in R. Analysis of association between mRNA expression and BMI was performed with the use of linear models and standard significance tests in R. Results: Among variants involved in epistasis between mitochondria and nucleus we have identified one in mitochondrial transcription factor, TFB2M (rs6701836). It interacted with mitochondrial variants localized to MT-RNR1 (p=0.0004, MAF=15%), MT-ND2 (p=0.07, MAF=5%) and MT-ND4 (p=0.01, MAF=1.1%). Analysis of the interaction between nuclear variant rs6701836 (nuc) and rs3021088 localized to MT-ND2 mitochondrial gene (mito) has shown that the combination of the two led to BMI decrease (p=0.024). Each of the variants on its own does not correlate with higher BMI [p(nuc)=0.856, p(mito)=0.116)]. Although rs6701836 is intronic, it influences gene expression in the thyroid (p=0.000037). rs3021088 is a missense variant that leads to alanine to threonine substitution in the MT-ND2 gene which belongs to complex I of the electron transport chain. The analysis of the influence of genetic variants on gene expression has confirmed the trend explained above – the interaction of the two genes leads to BMI decrease (p=0.0308). Each of the mRNAs on its own is associated with higher BMI (p(mito)=0.0244 and p(nuc)=0.0269). Conclusıons: Our results show that nuclear-mitochondrial epistasis can influence BMI in T1DM patients. The correlation between transcription factor expression and mitochondrial genetic variants will be subject to further analysis.Keywords: body mass index, epistasis, mitochondria, type 1 diabetes
Procedia PDF Downloads 174210 A Comprehensive Analysis of LACK (Leishmania Homologue of Receptors for Activated C Kinase) in the Context of Visceral Leishmaniasis
Authors: Sukrat Sinha, Abhay Kumar, Shanthy Sundaram
Abstract:
The Leishmania homologue of activated C kinase (LACK) is known T cell epitope from soluble Leishmania antigens (SLA) that confers protection against Leishmania challenge. This antigen has been found to be highly conserved among Leishmania strains. LACK has been shown to be protective against L. donovani challenge. A comprehensive analysis of several LACK sequences was completed. The analysis shows a high level of conservation, lower variability and higher antigenicity in specific portions of the LACK protein. This information provides insights for the potential consideration of LACK as a putative candidate in the context of visceral Leishmaniasis vaccine target.Keywords: bioinformatics, genome assembly, leishmania activated protein kinase c (lack), next-generation sequencing
Procedia PDF Downloads 338209 In Silico Analysis of Deleterious nsSNPs (Missense) of Dihydrolipoamide Branched-Chain Transacylase E2 Gene Associated with Maple Syrup Urine Disease Type II
Authors: Zainab S. Ahmed, Mohammed S. Ali, Nadia A. Elshiekh, Sami Adam Ibrahim, Ghada M. El-Tayeb, Ahmed H. Elsadig, Rihab A. Omer, Sofia B. Mohamed
Abstract:
Maple syrup urine (MSUD) is an autosomal recessive disease that causes a deficiency in the enzyme branched-chain alpha-keto acid (BCKA) dehydrogenase. The development of disease has been associated with SNPs in the DBT gene. Despite that, the computational analysis of SNPs in coding and noncoding and their functional impacts on protein level still remains unknown. Hence, in this study, we carried out a comprehensive in silico analysis of missense that was predicted to have a harmful influence on DBT structure and function. In this study, eight different in silico prediction algorithms; SIFT, PROVEAN, MutPred, SNP&GO, PhD-SNP, PANTHER, I-Mutant 2.0 and MUpo were used for screening nsSNPs in DBT including. Additionally, to understand the effect of mutations in the strength of the interactions that bind protein together the ELASPIC servers were used. Finally, the 3D structure of DBT was formed using Mutation3D and Chimera servers respectively. Our result showed that a total of 15 nsSNPs confirmed by 4 software (R301C, R376H, W84R, S268F, W84C, F276C, H452R, R178H, I355T, V191G, M444T, T174A, I200T, R113H, and R178C) were found damaging and can lead to a shift in DBT gene structure. Moreover, we found 7 nsSNPs located on the 2-oxoacid_dh catalytic domain, 5 nsSNPs on the E_3 binding domain and 3 nsSNPs on the Biotin Domain. So these nsSNPs may alter the putative structure of DBT’s domain. Furthermore, we detected all these nsSNPs are on the core residues of the protein and have the ability to change the stability of the protein. Additionally, we found W84R, S268F, and M444T have high significance, and they affected Leucine, Isoleucine, and Valine, which reduces or disrupt the function of BCKD complex, E2-subunit which the DBT gene encodes. In conclusion, based on our extensive in-silico analysis, we report 15 nsSNPs that have possible association with protein deteriorating and disease-causing abilities. These candidate SNPs can aid in future studies on Maple Syrup Urine Disease type II base in the genetic level.Keywords: DBT gene, ELASPIC, in silico analysis, UCSF chimer
Procedia PDF Downloads 201208 Physicians’ Knowledge and Perception of Gene Profiling in Malaysia: A Pilot Study
Authors: Farahnaz Amini, Woo Yun Kin, Lazwani Kolandaiveloo
Abstract:
Availability of different genetic tests after completion of Human Genome Project increases the physicians’ responsibility to keep themselves update on the potential implementation of these genetic tests in their daily practice. However, due to numbers of barriers, still many of physicians are not either aware of these tests or are not willing to offer or refer their patients for genetic tests. This study was conducted an anonymous, cross-sectional, mailed-based survey to develop a primary data of Malaysian physicians’ level of knowledge and perception of gene profiling. Questionnaire had 29 questions. Total scores on selected questions were used to assess the level of knowledge. The highest possible score was 11. Descriptive statistics, one way ANOVA and chi-squared test was used for statistical analysis. Sixty three completed questionnaires was returned by 27 general practitioners (GPs) and 36 medical specialists. Responders’ age range from 24 to 55 years old (mean 30.2 ± 6.4). About 40% of the participants rated themselves as having poor level of knowledge in genetics in general whilst 60% believed that they have fair level of knowledge. However, almost half (46%) of the respondents felt that they were not knowledgeable about available genetic tests. A majority (94%) of the responders were not aware of any lab or company which is offering gene profiling services in Malaysia. Only 4% of participants were aware of using gene profiling for detection of dosage of some drugs. Respondents perceived greater utility of gene profiling for breast cancer (38%) compared to the colorectal familial cancer (3%). The score of knowledge ranged from 2 to 8 (mean 4.38 ± 1.67). Non-significant differences between score of knowledge of GPs and specialists were observed, with score of 4.19 and 4.58 respectively. There was no significant association between any demographic factors and level of knowledge. However, those who graduated between years 2001 to 2005 had higher level of knowledge. Overall, 83% of participants showed relatively high level of perception on value of gene profiling to detect patient’s risk of disease. However, low perception was observed for both statements of using gene profiling for general population in order to alter their lifestyle (25%) as well as having the full sequence of a patient genome for the purpose of determining a patient’s best match for treatment (18%). The lack of clinical guidelines, limited provider knowledge and awareness, lack of time and resources to educate patients, lack of evidence-based clinical information and cost of tests were the most barriers of ordering gene profiling mentioned by physicians. In conclusion Malaysian physicians who participate in this study had mediocre level of knowledge and awareness in gene profiling. The low exposure to the genetic questions and problems might be a key predictor of lack of awareness and knowledge on available genetic tests. Educational and training workshop might be useful in helping Malaysian physicians incorporate genetic profiling into practice for eligible patients.Keywords: gene profiling, knowledge, Malaysia, physician
Procedia PDF Downloads 326207 Identifying Protein-Coding and Non-Coding Regions in Transcriptomes
Authors: Angela U. Makolo
Abstract:
Protein-coding and Non-coding regions determine the biology of a sequenced transcriptome. Research advances have shown that Non-coding regions are important in disease progression and clinical diagnosis. Existing bioinformatics tools have been targeted towards Protein-coding regions alone. Therefore, there are challenges associated with gaining biological insights from transcriptome sequence data. These tools are also limited to computationally intensive sequence alignment, which is inadequate and less accurate to identify both Protein-coding and Non-coding regions. Alignment-free techniques can overcome the limitation of identifying both regions. Therefore, this study was designed to develop an efficient sequence alignment-free model for identifying both Protein-coding and Non-coding regions in sequenced transcriptomes. Feature grouping and randomization procedures were applied to the input transcriptomes (37,503 data points). Successive iterations were carried out to compute the gradient vector that converged the developed Protein-coding and Non-coding Region Identifier (PNRI) model to the approximate coefficient vector. The logistic regression algorithm was used with a sigmoid activation function. A parameter vector was estimated for every sample in 37,503 data points in a bid to reduce the generalization error and cost. Maximum Likelihood Estimation (MLE) was used for parameter estimation by taking the log-likelihood of six features and combining them into a summation function. Dynamic thresholding was used to classify the Protein-coding and Non-coding regions, and the Receiver Operating Characteristic (ROC) curve was determined. The generalization performance of PNRI was determined in terms of F1 score, accuracy, sensitivity, and specificity. The average generalization performance of PNRI was determined using a benchmark of multi-species organisms. The generalization error for identifying Protein-coding and Non-coding regions decreased from 0.514 to 0.508 and to 0.378, respectively, after three iterations. The cost (difference between the predicted and the actual outcome) also decreased from 1.446 to 0.842 and to 0.718, respectively, for the first, second and third iterations. The iterations terminated at the 390th epoch, having an error of 0.036 and a cost of 0.316. The computed elements of the parameter vector that maximized the objective function were 0.043, 0.519, 0.715, 0.878, 1.157, and 2.575. The PNRI gave an ROC of 0.97, indicating an improved predictive ability. The PNRI identified both Protein-coding and Non-coding regions with an F1 score of 0.970, accuracy (0.969), sensitivity (0.966), and specificity of 0.973. Using 13 non-human multi-species model organisms, the average generalization performance of the traditional method was 74.4%, while that of the developed model was 85.2%, thereby making the developed model better in the identification of Protein-coding and Non-coding regions in transcriptomes. The developed Protein-coding and Non-coding region identifier model efficiently identified the Protein-coding and Non-coding transcriptomic regions. It could be used in genome annotation and in the analysis of transcriptomes.Keywords: sequence alignment-free model, dynamic thresholding classification, input randomization, genome annotation
Procedia PDF Downloads 68206 Cloning and Characterization of UDP-Glucose Pyrophosphorylases from Lactobacillus kefiranofaciens and Rhodococcus wratislaviensis
Authors: Mesfin Angaw Tesfay
Abstract:
Uridine-5’-diphosphate (UDP)-glucose is one of the most versatile building blocks within the metabolism of prokaryotes and eukaryotes, serving as an activated sugar donor during the glycosylation of natural products. It is formed by the enzyme UDP-glucose pyrophosphorylase (UGPase) using uridine-5′-triphosphate (UTP) and α-d-glucose 1-phosphate as a substrate. Herein, two UGPase genes from Lactobacillus kefiranofaciens ZW3 (LkUGPase) and Rhodococcus wratislaviensis IFP 2016 (RwUGPase) were identified through genome mining approaches. The LkUGPase and RwUGPase have 299 and 306 amino acids, respectively. Both UGPase has the conserved UTP binding site (G-X-G-T-R-X-L-P) and the glucose -1-phosphate binding site (V-E-K-P). The LkUGPase and RwUGPase were cloned in E. coli, and SDS-PAGE analysis showed the expression of both enzymes forming about 36 KDa of protein band after induction. LkUGPase and RwUGPase have an activity of 1549.95 and 671.53 U/mg, respectively. Currently, their kinetic properties are under investigation.Keywords: UGPase, LkUGPase, RwUGPase, UDP-glucose, glycosylation
Procedia PDF Downloads 23205 Computational Investigation on Structural and Functional Impact of Oncogenes and Tumor Suppressor Genes on Cancer
Authors: Abdoulie K. Ceesay
Abstract:
Within the sequence of the whole genome, it is known that 99.9% of the human genome is similar, whilst our difference lies in just 0.1%. Among these minor dissimilarities, the most common type of genetic variations that occurs in a population is SNP, which arises due to nucleotide substitution in a protein sequence that leads to protein destabilization, alteration in dynamics, and other physio-chemical properties’ distortions. While causing variations, they are equally responsible for our difference in the way we respond to a treatment or a disease, including various cancer types. There are two types of SNPs; synonymous single nucleotide polymorphism (sSNP) and non-synonymous single nucleotide polymorphism (nsSNP). sSNP occur in the gene coding region without causing a change in the encoded amino acid, while nsSNP is deleterious due to its replacement of a nucleotide residue in the gene sequence that results in a change in the encoded amino acid. Predicting the effects of cancer related nsSNPs on protein stability, function, and dynamics is important due to the significance of phenotype-genotype association of cancer. In this thesis, Data of 5 oncogenes (ONGs) (AKT1, ALK, ERBB2, KRAS, BRAF) and 5 tumor suppressor genes (TSGs) (ESR1, CASP8, TET2, PALB2, PTEN) were retrieved from ClinVar. Five common in silico tools; Polyphen, Provean, Mutation Assessor, Suspect, and FATHMM, were used to predict and categorize nsSNPs as deleterious, benign, or neutral. To understand the impact of each variation on the phenotype, Maestro, PremPS, Cupsat, and mCSM-NA in silico structural prediction tools were used. This study comprises of in-depth analysis of 10 cancer gene variants downloaded from Clinvar. Various analysis of the genes was conducted to derive a meaningful conclusion from the data. Research done indicated that pathogenic variants are more common among ONGs. Our research also shows that pathogenic and destabilizing variants are more common among ONGs than TSGs. Moreover, our data indicated that ALK(409) and BRAF(86) has higher benign count among ONGs; whilst among TSGs, PALB2(1308) and PTEN(318) genes have higher benign counts. Looking at the individual cancer genes predisposition or frequencies of causing cancer according to our research data, KRAS(76%), BRAF(55%), and ERBB2(36%) among ONGs; and PTEN(29%) and ESR1(17%) among TSGs have higher tendencies of causing cancer. Obtained results can shed light to the future research in order to pave new frontiers in cancer therapies.Keywords: tumor suppressor genes (TSGs), oncogenes (ONGs), non synonymous single nucleotide polymorphism (nsSNP), single nucleotide polymorphism (SNP)
Procedia PDF Downloads 86204 Cloning and Characterization of Uridine-5’-Diphosphate -Glucose Pyrophosphorylases from Lactobacillus Kefiranofaciens and Rhodococcus Wratislaviensis
Authors: Mesfin Angaw Tesfay
Abstract:
Uridine-5’-diphosphate (UDP)-glucose is one of the most versatile building blocks within the metabolism of prokaryotes and eukaryotes serving as an activated sugar donor during the glycosylation of natural products. It is formed by the enzyme UDP-glucose pyrophosphorylase (UGPase) using uridine-5′-triphosphate (UTP) and α-d-glucose 1-phosphate as a substrate. Herein two UGPase genes from Lactobacillus kefiranofaciens ZW3 (LkUGPase) and Rhodococcus wratislaviensis IFP 2016 (RwUGPase) were identified through genome mining approaches. The LkUGPase and RwUGPase have 299 and 306 amino acids, respectively. Both UGPase has the conserved UTP binding site (G-X-G-T-R-X-L-P) and the glucose -1-phosphate binding site (V-E-K-P). The LkUGPase and RwUGPase were cloned in E. coli and SDS-PAGE analysis showed the expression of both enzymes forming about 36 KDa of protein band after induction. LkUGPase and RwUGPase have an activity of 1549.95 and 671.53 U/mg respectively. Currently, their kinetic properties are under investigation.Keywords: UGPase, LkUGPase, RwUGPase, UDP-glucose, Glycosylation
Procedia PDF Downloads 19203 From Primer Generation to Chromosome Identification: A Primer Generation Genotyping Method for Bacterial Identification and Typing
Authors: Wisam H. Benamer, Ehab A. Elfallah, Mohamed A. Elshaari, Farag A. Elshaari
Abstract:
A challenge for laboratories is to provide bacterial identification and antibiotic sensitivity results within a short time. Hence, advancement in the required technology is desirable to improve timing, accuracy and quality. Even with the current advances in methods used for both phenotypic and genotypic identification of bacteria the need is there to develop method(s) that enhance the outcome of bacteriology laboratories in accuracy and time. The hypothesis introduced here is based on the assumption that the chromosome of any bacteria contains unique sequences that can be used for its identification and typing. The outcome of a pilot study designed to test this hypothesis is reported in this manuscript. Methods: The complete chromosome sequences of several bacterial species were downloaded to use as search targets for unique sequences. Visual basic and SQL server (2014) were used to generate a complete set of 18-base long primers, a process started with reverse translation of randomly chosen 6 amino acids to limit the number of the generated primers. In addition, the software used to scan the downloaded chromosomes using the generated primers for similarities was designed, and the resulting hits were classified according to the number of similar chromosomal sequences, i.e., unique or otherwise. Results: All primers that had identical/similar sequences in the selected genome sequence(s) were classified according to the number of hits in the chromosomes search. Those that were identical to a single site on a single bacterial chromosome were referred to as unique. On the other hand, most generated primers sequences were identical to multiple sites on a single or multiple chromosomes. Following scanning, the generated primers were classified based on ability to differentiate between medically important bacterial and the initial results looks promising. Conclusion: A simple strategy that started by generating primers was introduced; the primers were used to screen bacterial genomes for match. Primer(s) that were uniquely identical to specific DNA sequence on a specific bacterial chromosome were selected. The identified unique sequence can be used in different molecular diagnostic techniques, possibly to identify bacteria. In addition, a single primer that can identify multiple sites in a single chromosome can be exploited for region or genome identification. Although genomes sequences draft of isolates of organism DNA enable high throughput primer design using alignment strategy, and this enhances diagnostic performance in comparison to traditional molecular assays. In this method the generated primers can be used to identify an organism before the draft sequence is completed. In addition, the generated primers can be used to build a bank for easy access of the primers that can be used to identify bacteria.Keywords: bacteria chromosome, bacterial identification, sequence, primer generation
Procedia PDF Downloads 192202 Fat-Tail Test of Regulatory DNA Sequences
Authors: Jian-Jun Shu
Abstract:
The statistical properties of CRMs are explored by estimating similar-word set occurrence distribution. It is observed that CRMs tend to have a fat-tail distribution for similar-word set occurrence. Thus, the fat-tail test with two fatness coefficients is proposed to distinguish CRMs from non-CRMs, especially from exons. For the first fatness coefficient, the separation accuracy between CRMs and exons is increased as compared with the existing content-based CRM prediction method – fluffy-tail test. For the second fatness coefficient, the computing time is reduced as compared with fluffy-tail test, making it very suitable for long sequences and large data-base analysis in the post-genome time. Moreover, these indexes may be used to predict the CRMs which have not yet been observed experimentally. This can serve as a valuable filtering process for experiment.Keywords: statistical approach, transcription factor binding sites, cis-regulatory modules, DNA sequences
Procedia PDF Downloads 289201 Blackcurrant-Associated Rhabdovirus: New Pathogen for Blackcurrants in the Baltic Sea Region
Authors: Gunta Resevica, Nikita Zrelovs, Ivars Silamikelis, Ieva Kalnciema, Helvijs Niedra, Gunārs Lācis, Toms Bartulsons, Inga Moročko-Bičevska, Arturs Stalažs, Kristīne Drevinska, Andris Zeltins, Ina Balke
Abstract:
Newly discovered viruses provide novel knowledge for basic phytovirus research, serve as tools for biotechnology and can be helpful in identification of epidemic outbreaks. Blackcurrant-associated rhabdovirus (BCaRV) have been discovered in USA germplasm collection samples from Russia and France. As it was reported in one accession originating from France it is unclear whether the material was already infected when it entered in the USA or it became infected while in collection in the USA. Due to that BCaRV was definite as non-EU viruses. According to ICTV classification BCaRV is representative of Blackcurrant betanucleorhabdovirus specie in genus Betanucleorhabdovirus (family Rhabdoviridae). Nevertheless, BCaRV impact on the host, transmission mechanisms and vectors are still unknown. In RNA-seq data pool from Ribes plants resistance gene study by high throughput sequencing (HTS) we observed differences between sample group gene transcript heat maps. Additional analysis of the whole data pool (total 393660492 of 150 bp long read pairs) by rnaSPAdes v 3.13.1 resulted into 14424 bases long contig with an average coverage of 684x with shared 99.5% identity to the previously reported first complete genome of BCaRV (MF543022.1) using EMBOSS Needle. This finding proved BCaRV presence in EU and indicated that it might be relevant pathogen. In this study leaf tissue from twelve asymptomatic blackcurrant cv. Mara Eglite plants (negatively tested for blackcurrant reversion virus (BRV)) from Dobele, Latvia (56°36'31.9"N, 23°18'13.6"E) was collected and used for total RNA isolation with RNeasy Plant Mini Kit with minor modifications, followed by plant rRNA removal by a RiboMinus Plant Kit for RNA-Seq. HTS libraries were prepared using MGI Easy RNA Directional Library Prep Set for 16 reactions to obtain 150 bp pair-end reads. Libraries were pooled, circularized and cleaned and sequenced on DNBSEQ-G400 using PE150 flow cell. Additionally, all samples were tested by RT-PCR, and amplicons were directly sequenced by Sanger-based method. The contig representing the genome of BCaRV isolate Mara Eglite was deposited at European Nucleotide Archive under accession number OU015520. Those findings indicate a second evidence on the presence of this particular virus in the EU and further research on BCaRV prevalence in Ribes from other geographical areas should be performed. As there are no information on BCaRV impact on the host this should be investigated, regarding the fact that mixed infections with BRV and nucleorhabdoviruses are reported.Keywords: BCaRV, Betanucleorhabdovirus, Ribes, RNA-seq
Procedia PDF Downloads 184200 Molecular Epidemiologic Distribution of HDV Genotypes among Different Ethnic Groups in Iran: A Systematic Review
Authors: Khabat Barkhordari
Abstract:
Hepatitis delta virus (HDV) is a RNA virus that needs the function of hepatitis B virus (HBV) for its propagation and assembly. Infection by HDV can occur spontaneously with HBV infection and cause acute hepatitis or develop as secondary infection in HBV suffering patients. Based on genome sequence analysis, HDV has several genotypes which show broad geographic and diverse clinical features. The aim of current study is determine the molecular epidemiology of hepatitis delta virus genotype in patients with positive HBsAg among different ethnic groups of Iran. This systematic review study reviews the results of different studies which examined 2000 Iranian patients with HBV infection from 2010 to 2015. Among 2000 patients in this study, 16.75 % were containing anti-HDV antibody and HDV RNA was found in just 1.75% cases. All of positive cases also have genotype I.Keywords: HDV, genotype, epidemiology, distribution
Procedia PDF Downloads 275199 Novel Coprocessor for DNA Sequence Alignment in Resequencing Applications
Authors: Atef Ibrahim, Hamed Elsimary, Abdullah Aljumah, Fayez Gebali
Abstract:
This paper presents a novel semi-systolic array architecture for an optimized parallel sequence alignment algorithm. This architecture has the advantage that it can be modified to be reused for multiple pass processing in order to increase the number of processing elements that can be packed into a single FPGA and to increase the number of sequences that can be aligned in parallel in a single FPGA. This resolves the potential problem of many FPGA resources left unused for designs that have large values of short read length. When using the previously published conventional hardware design. FPGA implementation results show that, for large values of short read lengths (M>128), the proposed design has a slightly higher speed up and FPGA utilization over the the conventional one.Keywords: bioinformatics, genome sequence alignment, re-sequencing applications, systolic array
Procedia PDF Downloads 531198 Improving the Biocontrol of the Argentine Stem Weevil; Using the Parasitic Wasp Microctonus hyperodae
Authors: John G. Skelly, Peter K. Dearden, Thomas W. R. Harrop, Sarah N. Inwood, Joseph Guhlin
Abstract:
The Argentine stem weevil (ASW; L. bonariensis) is an economically important pasture pest in New Zealand, which causes about $200 million of damage per annum. Microctonus hyperodae (Mh), a parasite of the ASW in its natural range in South America, was introduced into New Zealand to curb the pasture damage caused by the ASW. Mh is an endoparasitic wasp that lays its eggs in the ASW halting its reproduction. Mh was initially successful at preventing ASW proliferation and reducing pasture damage. The effectiveness of Mh has since declined due to decreased parasitism rates and has resulted in increased pasture damage. Although the mechanism through which ASW has developed resistance to Mh has not been discovered, it has been proposed to be due to the different reproductive modes used by Mh and the ASW in New Zealand. The ASW reproduces sexually, whereas Mh reproduces asexually, which has been hypothesised to have allowed the ASW to ‘out evolve’ Mh. Other species within the Microctonus genus reproduce both sexually and asexually. Strains of Microctonus aethiopoides (Ma), a species closely related to Mh, reproduce either by sexual or asexual reproduction. Comparing the genomes of sexual and asexual Microctonus may allow for the identification of the mechanism of asexual reproduction and other characteristics that may improve Mh as a biocontrol agent. The genomes of Mh and three strains of Ma, two of which reproduce sexually and one reproduces asexually, have been sequenced and annotated. The French (MaFR) and Moroccan (MaMO) reproduce sexually, whereas the Irish strain (MaIR) reproduces asexually. Like Mh, The Ma strains are also used as biocontrol agents, but for different weevil species. The genomes of Mh and MaIR were subsequently upgraded using Hi-C, resulting in a set of high quality, highly contiguous genomes. A subset of the genes involved in mitosis and meiosis, which have been identified though the use of Hidden Markov Models generated from genes involved in these processes in other Hymenoptera, have been catalogued in Mh and the strains of Ma. Meiosis and mitosis genes were broadly conserved in both sexual and asexual Microctonus species. This implies that either the asexual species have retained a subset of the molecular components required for sexual reproduction or that the molecular mechanisms of mitosis and meiosis are different or differently regulated in Microctonus to other insect species in which these mechanisms are more broadly characterised. Bioinformatic analysis of the chemoreceptor compliment in Microctonus has revealed some variation in the number of olfactory receptors, which may be related to host preference. Phylogenetic analysis of olfactory receptors highlights variation, which may be able to explain different host range preferences in the Microctonus. Hi-C clustering implies that Mh has 12 chromosomes, and MaIR has 8. Hence there may be variation in gene regulation between species. Genome alignment of Mh and MaIR implies that there may be large scale genome structural variation. Greater insight into the genetics of these agriculturally important group of parasitic wasps may be beneficial in restoring or maintaining their biocontrol efficacy.Keywords: argentine stem weevil, asexual, genomics, Microctonus hyperodae
Procedia PDF Downloads 156