Search results for: genomic data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24334

Search results for: genomic data

24304 Single Cell and Spatial Transcriptomics: A Beginners Viewpoint from the Conceptual Pipeline

Authors: Leo Nnamdi Ozurumba-Dwight

Abstract:

Messenger ribooxynucleic acid (mRNA) molecules are compositional, protein-based. These proteins, encoding mRNA molecules (which collectively connote the transcriptome), when analyzed by RNA sequencing (RNAseq), unveils the nature of gene expression in the RNA. The obtained gene expression provides clues of cellular traits and their dynamics in presentations. These can be studied in relation to function and responses. RNAseq is a practical concept in Genomics as it enables detection and quantitative analysis of mRNA molecules. Single cell and spatial transcriptomics both present varying avenues for expositions in genomic characteristics of single cells and pooled cells in disease conditions such as cancer, auto-immune diseases, hematopoietic based diseases, among others, from investigated biological tissue samples. Single cell transcriptomics helps conduct a direct assessment of each building unit of tissues (the cell) during diagnosis and molecular gene expressional studies. A typical technique to achieve this is through the use of a single-cell RNA sequencer (scRNAseq), which helps in conducting high throughput genomic expressional studies. However, this technique generates expressional gene data for several cells which lack presentations on the cells’ positional coordinates within the tissue. As science is developmental, the use of complimentary pre-established tissue reference maps using molecular and bioinformatics techniques has innovatively sprung-forth and is now used to resolve this set back to produce both levels of data in one shot of scRNAseq analysis. This is an emerging conceptual approach in methodology for integrative and progressively dependable transcriptomics analysis. This can support in-situ fashioned analysis for better understanding of tissue functional organization, unveil new biomarkers for early-stage detection of diseases, biomarkers for therapeutic targets in drug development, and exposit nature of cell-to-cell interactions. Also, these are vital genomic signatures and characterizations of clinical applications. Over the past decades, RNAseq has generated a wide array of information that is igniting bespoke breakthroughs and innovations in Biomedicine. On the other side, spatial transcriptomics is tissue level based and utilized to study biological specimens having heterogeneous features. It exposits the gross identity of investigated mammalian tissues, which can then be used to study cell differentiation, track cell line trajectory patterns and behavior, and regulatory homeostasis in disease states. Also, it requires referenced positional analysis to make up of genomic signatures that will be sassed from the single cells in the tissue sample. Given these two presented approaches to RNA transcriptomics study in varying quantities of cell lines, with avenues for appropriate resolutions, both approaches have made the study of gene expression from mRNA molecules interesting, progressive, developmental, and helping to tackle health challenges head-on.

Keywords: transcriptomics, RNA sequencing, single cell, spatial, gene expression.

Procedia PDF Downloads 97
24303 Copy Number Variants in Children with Non-Syndromic Congenital Heart Diseases from Mexico

Authors: Maria Lopez-Ibarra, Ana Velazquez-Wong, Lucelli Yañez-Gutierrez, Maria Araujo-Solis, Fabio Salamanca-Gomez, Alfonso Mendez-Tenorio, Haydeé Rosas-Vargas

Abstract:

Congenital heart diseases (CHD) are the most common congenital abnormalities. These conditions can occur as both an element of distinct chromosomal malformation syndromes or as non-syndromic forms. Their etiology is not fully understood. Genetic variants such copy number variants have been associated with CHD. The aim of our study was to analyze these genomic variants in peripheral blood from Mexican children diagnosed with non-syndromic CHD. We included 16 children with atrial and ventricular septal defects and 5 healthy subjects without heart malformations as controls. To exclude the most common heart disease-associated syndrome alteration, we performed a fluorescence in situ hybridization test to identify the 22q11.2, responsible for congenital heart abnormalities associated with Di-George Syndrome. Then, a microarray based comparative genomic hybridization was used to identify global copy number variants. The identification of copy number variants resulted from the comparison and analysis between our results and data from main genetic variation databases. We identified copy number variants gain in three chromosomes regions from pediatric patients, 4q13.2 (31.25%), 9q34.3 (25%) and 20q13.33 (50%), where several genes associated with cellular, biosynthetic, and metabolic processes are located, UGT2B15, UGT2B17, SNAPC4, SDCCAG3, PMPCA, INPP6E, C9orf163, NOTCH1, C20orf166, and SLCO4A1. In addition, after a hierarchical cluster analysis based on the fluorescence intensity ratios from the comparative genomic hybridization, two congenital heart disease groups were generated corresponding to children with atrial or ventricular septal defects. Further analysis with a larger sample size is needed to corroborate these copy number variants as possible biomarkers to differentiate between heart abnormalities. Interestingly, the 20q13.33 gain was present in 50% of children with these CHD which could suggest that alterations in both coding and non-coding elements within this chromosomal region may play an important role in distinct heart conditions.

Keywords: aCGH, bioinformatics, congenital heart diseases, copy number variants, fluorescence in situ hybridization

Procedia PDF Downloads 261
24302 Breeding Cotton for Annual Growth Habit: Remobilizing End-of-season Perennial Reserves for Increased Yield

Authors: Salman Naveed, Nitant Gandhi, Grant Billings, Zachary Jones, B. Todd Campbell, Michael Jones, Sachin Rustgi

Abstract:

Cotton (Gossypium spp.) is the primary source of natural fiber in the U.S. and a major crop in the Southeastern U.S. Despite constant efforts to increase the cotton fiber yield, the yield gain has stagnated. Therefore, we undertook a novel approach to improve the cotton fiber yield by altering its growth habit from perennial to annual. In this effort, we identified genotypes with high-expression alleles of five floral induction and meristem identity genes (FT, SOC1, FUL, LFY, and AP1) from an upland cotton mini-core collection and crossed them in various combinations to develop cotton lines with annual growth habit, optimal flowering time and enhanced productivity. To facilitate the characterization of genotypes with the desired combinations of stacked alleles, we identified markers associated with the gene expression traits via genome-wide association analysis using a 63K SNP Array (Hulse-Kemp et al. 2015 G3 5:1187). Over 14,500 SNPs showed polymorphism and were used for association analysis. A total of 396 markers showed association with expression traits. Out of these 396 markers, 159 mapped to genes, 50 to untranslated regions, and 187 to random genomic regions. Biased genomic distribution of associated markers was observed where more trait-associated markers mapped to the cotton D sub-genome. Many quantitative trait loci coincided at specific genomic regions. This observation has implications as these traits could be bred together. The analysis also allowed the identification of candidate regulators of the expression patterns of these floral induction and meristem identity genes whose functions will be validated via virus-induced gene silencing.

Keywords: cotton, GWAS, QTL, expression traits

Procedia PDF Downloads 124
24301 Genome Sequencing, Assembly and Annotation of Gelidium Pristoides from Kenton-on-Sea, South Africa

Authors: Sandisiwe Mangali, Graeme Bradley

Abstract:

Genome is complete set of the organism's hereditary information encoded as either deoxyribonucleic acid or ribonucleic acid in most viruses. The three different types of genomes are nuclear, mitochondrial and the plastid genome and their sequences which are uncovered by genome sequencing are known as an archive for all genetic information and enable researchers to understand the composition of a genome, regulation of gene expression and also provide information on how the whole genome works. These sequences enable researchers to explore the population structure, genetic variations, and recent demographic events in threatened species. Particularly, genome sequencing refers to a process of figuring out the exact arrangement of the basic nucleotide bases of a genome and the process through which all the afore-mentioned genomes are sequenced is referred to as whole or complete genome sequencing. Gelidium pristoides is South African endemic Rhodophyta species which has been harvested in the Eastern Cape since the 1950s for its high economic value which is one motivation for its sequencing. Its endemism further motivates its sequencing for conservation biology as endemic species are more vulnerable to anthropogenic activities endangering a species. As sequencing, mapping and annotating the Gelidium pristoides genome is the aim of this study. To accomplish this aim, the genomic DNA was extracted and quantified using the Nucleospin Plank Kit, Qubit 2.0 and Nanodrop. Thereafter, the Ion Plus Fragment Library was used for preparation of a 600bp library which was then sequenced through the Ion S5 sequencing platform for two runs. The produced reads were then quality-controlled and assembled through the SPAdes assembler with default parameters and the genome assembly was quality assessed through the QUAST software. From this assembly, the plastid and the mitochondrial genomes were then sampled out using Gelidiales organellar genomes as search queries and ordered according to them using the Geneious software. The Qubit and the Nanodrop instruments revealed an A260/A280 and A230/A260 values of 1.81 and 1.52 respectively. A total of 30792074 reads were obtained and produced a total of 94140 contigs with resulted into a sequence length of 217.06 Mbp with N50 value of 3072 bp and GC content of 41.72%. A total length of 179281bp and 25734 bp was obtained for plastid and mitochondrial respectively. Genomic data allows a clear understanding of the genomic constituent of an organism and is valuable as foundation information for studies of individual genes and resolving the evolutionary relationships between organisms including Rhodophytes and other seaweeds.

Keywords: Gelidium pristoides, genome, genome sequencing and assembly, Ion S5 sequencing platform

Procedia PDF Downloads 123
24300 An Intelligent Search and Retrieval System for Mining Clinical Data Repositories Based on Computational Imaging Markers and Genomic Expression Signatures for Investigative Research and Decision Support

Authors: David J. Foran, Nhan Do, Samuel Ajjarapu, Wenjin Chen, Tahsin Kurc, Joel H. Saltz

Abstract:

The large-scale data and computational requirements of investigators throughout the clinical and research communities demand an informatics infrastructure that supports both existing and new investigative and translational projects in a robust, secure environment. In some subspecialties of medicine and research, the capacity to generate data has outpaced the methods and technology used to aggregate, organize, access, and reliably retrieve this information. Leading health care centers now recognize the utility of establishing an enterprise-wide, clinical data warehouse. The primary benefits that can be realized through such efforts include cost savings, efficient tracking of outcomes, advanced clinical decision support, improved prognostic accuracy, and more reliable clinical trials matching. The overarching objective of the work presented here is the development and implementation of a flexible Intelligent Retrieval and Interrogation System (IRIS) that exploits the combined use of computational imaging, genomics, and data-mining capabilities to facilitate clinical assessments and translational research in oncology. The proposed System includes a multi-modal, Clinical & Research Data Warehouse (CRDW) that is tightly integrated with a suite of computational and machine-learning tools to provide insight into the underlying tumor characteristics that are not be apparent by human inspection alone. A key distinguishing feature of the System is a configurable Extract, Transform and Load (ETL) interface that enables it to adapt to different clinical and research data environments. This project is motivated by the growing emphasis on establishing Learning Health Systems in which cyclical hypothesis generation and evidence evaluation become integral to improving the quality of patient care. To facilitate iterative prototyping and optimization of the algorithms and workflows for the System, the team has already implemented a fully functional Warehouse that can reliably aggregate information originating from multiple data sources including EHR’s, Clinical Trial Management Systems, Tumor Registries, Biospecimen Repositories, Radiology PAC systems, Digital Pathology archives, Unstructured Clinical Documents, and Next Generation Sequencing services. The System enables physicians to systematically mine and review the molecular, genomic, image-based, and correlated clinical information about patient tumors individually or as part of large cohorts to identify patterns that may influence treatment decisions and outcomes. The CRDW core system has facilitated peer-reviewed publications and funded projects, including an NIH-sponsored collaboration to enhance the cancer registries in Georgia, Kentucky, New Jersey, and New York, with machine-learning based classifications and quantitative pathomics, feature sets. The CRDW has also resulted in a collaboration with the Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC) at the U.S. Department of Veterans Affairs to develop algorithms and workflows to automate the analysis of lung adenocarcinoma. Those studies showed that combining computational nuclear signatures with traditional WHO criteria through the use of deep convolutional neural networks (CNNs) led to improved discrimination among tumor growth patterns. The team has also leveraged the Warehouse to support studies to investigate the potential of utilizing a combination of genomic and computational imaging signatures to characterize prostate cancer. The results of those studies show that integrating image biomarkers with genomic pathway scores is more strongly correlated with disease recurrence than using standard clinical markers.

Keywords: clinical data warehouse, decision support, data-mining, intelligent databases, machine-learning.

Procedia PDF Downloads 89
24299 Non-Mammalian Pattern Recognition Receptor from Rock Bream (Oplegnathus fasciatus): Genomic Characterization and Transcriptional Profile upon Bacterial and Viral Inductions

Authors: Thanthrige Thiunuwan Priyathilaka, Don Anushka Sandaruwan Elvitigala, Bong-Soo Lim, Hyung-Bok Jeong, Jehee Lee

Abstract:

Toll like receptors (TLRs) are a phylogeneticaly conserved family of pattern recognition receptors, which participates in the host immune responses against various pathogens and pathogen derived mitogen. TLR21, a non-mammalian type, is almost restricted to the fish species even though those can be identified rarely in avians and amphibians. Herein, this study was carried out to identify and characterize TLR21 from rock bream (Oplegnathus fasciatus) designated as RbTLR21, at transcriptional and genomic level. In this study, the full length cDNA and genomic sequence of RbTLR21 was identified using previously constructed cDNA sequence database and BAC library, respectively. Identified RbTLR21 sequence was characterized using several bioinformatics tools. The quantitative real time PCR (qPCR) experiment was conducted to determine tissue specific expressional distribution of RbTLR21. Further, transcriptional modulation of RbTLR21 upon the stimulation with Streptococcus iniae (S. iniae), rock bream iridovirus (RBIV) and Edwardsiella tarda (E. tarda) was analyzed in spleen tissues. The complete coding sequence of RbTLR21 was 2919 bp in length which can encode a protein consisting of 973 amino acid residues with molecular mass of 112 kDa and theoretical isoelectric point of 8.6. The anticipated protein sequence resembled a typical TLR domain architecture including C-terminal ectodomain with 16 leucine rich repeats, a transmembrane domain, cytoplasmic TIR domain and signal peptide with 23 amino acid residues. Moreover, protein folding pattern prediction of RbTLR21 exhibited well-structured and folded ectodomain, transmembrane domain and cytoplasmc TIR domain. According to the pair wise sequence analysis data, RbTLR21 showed closest homology with orange-spotted grouper (Epinephelus coioides) TLR21with 76.9% amino acid identity. Furthermore, our phylogenetic analysis revealed that RbTLR21 shows a close evolutionary relationship with its ortholog from Danio rerio. Genomic structure of RbTLR21 consisted of single exon similar to its ortholog of zebra fish. Sevaral putative transcription factor binding sites were also identified in 5ʹ flanking region of RbTLR21. The RBTLR 21 was ubiquitously expressed in all the tissues we tested. Relatively, high expression levels were found in spleen, liver and blood tissues. Upon induction with rock bream iridovirus, RbTLR21 expression was upregulated at the early phase of post induction period even though RbTLR21 expression level was fluctuated at the latter phase of post induction period. Post Edwardsiella tarda injection, RbTLR transcripts were upregulated throughout the experiment. Similarly, Streptococcus iniae induction exhibited significant upregulations of RbTLR21 mRNA expression in the spleen tissues. Collectively, our findings suggest that RbTLR21 is indeed a homolog of TLR21 family members and RbTLR21 may be involved in host immune responses against bacterial and DNA viral infections.

Keywords: rock bream, toll like receptor 21 (TLR21), pattern recognition receptor, genomic characterization

Procedia PDF Downloads 519
24298 Mobile Genetic Elements in Trematode Himasthla Elongata Clonal Polymorphism

Authors: Anna Solovyeva, Ivan Levakin, Nickolai Galaktionov, Olga Podgornaya

Abstract:

Animals that reproduce asexually were thought to have the same genotypes within generations for a long time. However, some refuting examples were found, and mobile genetic elements (MGEs) or transposons are considered to be the most probable source of genetic instability. Dispersed nature and the ability to change their genomic localization enables MGEs to be efficient mutators. Hence the study of MGEs genomic impact requires an appropriate object which comprehends both representative amounts of various MGEs and options to evaluate the genomic influence of MGEs. Animals that reproduce asexually seem to be a decent model to study MGEs impact in genomic variability. We found a small marine trematode Himasthla elongata (Himasthlidae) to be a good model for such investigation as it has a small genome size, diverse MGEs and parthenogenetic stages in the lifecycle. In the current work, clonal diversity of cercaria was traced with an AFLP (Amplified fragment length polymorphism) method, diverse zones from electrophoretic patterns were cloned, and the nature of the fragments explored. Polymorphic patterns of individual cercariae AFLP-based fingerprints are enriched with retrotransposons of different families. The bulk of those sequences are represented by open reading frames of non-Long Terminal Repeats containing elements(non-LTR) yet Long-Terminal Repeats containing elements (LTR), to a lesser extent in variable figments of AFLP array. The CR1 elements expose both in polymorphic and conservative patterns are remarkably more frequent than the other non-LTR retrotransposons. This data was confirmed with shotgun sequencing-based on Illumina HiSeq 2500 platform. Individual cercaria of the same clone (i.e., originated from a single miracidium and inhabiting one host) has a various distribution of MGE families detected in sequenced AFLP patterns. The most numerous are CR1 and RTE-Bov retrotransposons, typical for trematode genomes. Also, we identified LTR-retrotransposons of Pao and Gypsy families among DNA transposons of CMC-EnSpm, Tc1/Mariner, MuLE-MuDR and Merlin families. We detected many of them in H. elongata transcriptome. Such uneven MGEs distribution in AFLP sequences’ sets reflects the different patterns of transposons spreading in cercarial genomes as transposons affect the genome in many ways (ectopic recombination, gene structure interruption, epigenetic silencing). It is considered that they play a key role in the origins of trematode clonal polymorphism. The authors greatly appreciate the help received at the Kartesh White Sea Biological Station of the Russian Academy of Sciences Zoological Institute. This work is funded with RSF 19-74-20102 and RFBR 17-04-02161 grants and the research program of the Zoological Institute of the Russian Academy of Sciences (project number AAAA-A19-119020690109-2).

Keywords: AFLP, clonal polymorphism, Himasthla elongata, mobile genetic elements, NGS

Procedia PDF Downloads 96
24297 Applying EzRAD Method for SNPs Discovery in Population Genetics of Freshwater and Marine Fish in the South of Vietnam

Authors: Quyen Vu Dang Ha, Oanh Truong Thi, Thuoc Tran Linh, Kent Carpenter, Thinh Doan Vu, Binh Dang Thuy

Abstract:

Enzyme restriction site associated DNA (EzRAD) has recently emerged as a promising genomic approach for exploring fish genetic diversity on a genome-wide scale. This is a simplified method for genomic genotyping in non-model organisms and applied for SNPs discovery in the population genetics of freshwater and marine fish in the South of Vietnam. The observations of regional-scale differentiation of commercial freshwater fish (smallscale croakers Boesemania microlepis) and marine fish (emperor Lethrinus lentjan) are clarified. Samples were collected along Hau River and coastal area in the south and center Vietnam. 52 DNA samples from Tra Vinh, An Giang Province for Boesemania microlepis and 34 DNA samples of Lethrinus lentjan from Phu Quoc, Nha Trang, Da Nang Province were used to prepare EzRAD libraries from genomic DNA digested with MboI and Sau3AI. A pooled sample of regional EzRAD libraries was sequenced using the HiSeq 2500 Illumina platform. For Boesemania microlepis, the small scale population different from upstream to downstream of Hau river were detected, An Giang population exhibited less genetic diversity (SNPs per individual from 14 to 926), in comparison to Tra Vinh population (from 11 to 2172). For Lethrinus lentjan, the result showed the minor difference between populations in the Northern and the Southern Mekong River. The numbers of contigs and SNPs vary from 1315 to 2455 and from 7122 to 8594, respectively (P ≤ 0.01). The current preliminary study reveals regional scale population disconnection probably reflecting environmental changing. Additional sampling and EzRad libraries need to be implemented for resource management in the Mekong Delta.

Keywords: Boesemania microlepis, EzRAD, Lethrinus lentjan, SNPs

Procedia PDF Downloads 468
24296 Integrative Omics-Portrayal Disentangles Molecular Heterogeneity and Progression Mechanisms of Cancer

Authors: Binder Hans

Abstract:

Cancer is no longer seen as solely a genetic disease where genetic defects such as mutations and copy number variations affect gene regulation and eventually lead to aberrant cell functioning which can be monitored by transcriptome analysis. It has become obvious that epigenetic alterations represent a further important layer of (de-)regulation of gene activity. For example, aberrant DNA methylation is a hallmark of many cancer types, and methylation patterns were successfully used to subtype cancer heterogeneity. Hence, unraveling the interplay between different omics levels such as genome, transcriptome and epigenome is inevitable for a mechanistic understanding of molecular deregulation causing complex diseases such as cancer. This objective requires powerful downstream integrative bioinformatics methods as an essential prerequisite to discover the whole genome mutational, transcriptome and epigenome landscapes of cancer specimen and to discover cancer genesis, progression and heterogeneity. Basic challenges and tasks arise ‘beyond sequencing’ because of the big size of the data, their complexity, the need to search for hidden structures in the data, for knowledge mining to discover biological function and also systems biology conceptual models to deduce developmental interrelations between different cancer states. These tasks are tightly related to cancer biology as an (epi-)genetic disease giving rise to aberrant genomic regulation under micro-environmental control and clonal evolution which leads to heterogeneous cellular states. Machine learning algorithms such as self organizing maps (SOM) represent one interesting option to tackle these bioinformatics tasks. The SOMmethod enables recognizing complex patterns in large-scale data generated by highthroughput omics technologies. It portrays molecular phenotypes by generating individualized, easy to interpret images of the data landscape in combination with comprehensive analysis options. Our image-based, reductionist machine learning methods provide one interesting perspective how to deal with massive data in the discovery of complex diseases, gliomas, melanomas and colon cancer on molecular level. As an important new challenge, we address the combined portrayal of different omics data such as genome-wide genomic, transcriptomic and methylomic ones. The integrative-omics portrayal approach is based on the joint training of the data and it provides separate personalized data portraits for each patient and data type which can be analyzed by visual inspection as one option. The new method enables an integrative genome-wide view on the omics data types and the underlying regulatory modes. It is applied to high and low-grade gliomas and to melanomas where it disentangles transversal and longitudinal molecular heterogeneity in terms of distinct molecular subtypes and progression paths with prognostic impact.

Keywords: integrative bioinformatics, machine learning, molecular mechanisms of cancer, gliomas and melanomas

Procedia PDF Downloads 121
24295 Complete Genome Sequence Analysis of Pasteurella multocida Subspecies multocida Serotype A Strain PMTB2.1

Authors: Shagufta Jabeen, Faez J. Firdaus Abdullah, Zunita Zakaria, Nurulfiza M. Isa, Yung C. Tan, Wai Y. Yee, Abdul R. Omar

Abstract:

Pasteurella multocida (PM) is an important veterinary opportunistic pathogen particularly associated with septicemic pasteurellosis, pneumonic pasteurellosis and hemorrhagic septicemia in cattle and buffaloes. P. multocida serotype A has been reported to cause fatal pneumonia and septicemia. Pasteurella multocida subspecies multocida of serotype A Malaysian isolate PMTB2.1 was first isolated from buffaloes died of septicemia. In this study, the genome of P. multocida strain PMTB2.1 was sequenced using third-generation sequencing technology, PacBio RS2 system and analyzed bioinformatically via de novo analysis followed by in-depth analysis based on comparative genomics. Bioinformatics analysis based on de novo assembly of PacBio raw reads generated 3 contigs followed by gap filling of aligned contigs with PCR sequencing, generated a single contiguous circular chromosome with a genomic size of 2,315,138 bp and a GC content of approximately 40.32% (Accession number CP007205). The PMTB2.1 genome comprised of 2,176 protein-coding sequences, 6 rRNA operons and 56 tRNA and 4 ncRNAs sequences. The comparative genome sequence analysis of PMTB2.1 with nine complete genomes which include Actinobacillus pleuropneumoniae, Haemophilus parasuis, Escherichia coli and five P. multocida complete genome sequences including, PM70, PM36950, PMHN06, PM3480, PMHB01 and PMTB2.1 was carried out based on OrthoMCL analysis and Venn diagram. The analysis showed that 282 CDs (13%) are unique to PMTB2.1and 1,125 CDs with orthologs in all. This reflects overall close relationship of these bacteria and supports the classification in the Gamma subdivision of the Proteobacteria. In addition, genomic distance analysis among all nine genomes indicated that PMTB2.1 is closely related with other five Pasteurella species with genomic distance less than 0.13. Synteny analysis shows subtle differences in genetic structures among different P.multocida indicating the dynamics of frequent gene transfer events among different P. multocida strains. However, PM3480 and PM70 exhibited exceptionally large structural variation since they were swine and chicken isolates. Furthermore, genomic structure of PMTB2.1 is more resembling that of PM36950 with a genomic size difference of approximately 34,380 kb (smaller than PM36950) and strain-specific Integrative and Conjugative Elements (ICE) which was found only in PM36950 is absent in PMTB2.1. Meanwhile, two intact prophages sequences of approximately 62 kb were found to be present only in PMTB2.1. One of phage is similar to transposable phage SfMu. The phylogenomic tree was constructed and rooted with E. coli, A. pleuropneumoniae and H. parasuis based on OrthoMCL analysis. The genomes of P. multocida strain PMTB2.1 were clustered with bovine isolates of P. multocida strain PM36950 and PMHB01 and were separated from avian isolate PM70 and swine isolates PM3480 and PMHN06 and are distant from Actinobacillus and Haemophilus. Previous studies based on Single Nucleotide Polymorphism (SNPs) and Multilocus Sequence Typing (MLST) unable to show a clear phylogenetic relatedness between Pasteurella multocida and the different host. In conclusion, this study has provided insight on the genomic structure of PMTB2.1 in terms of potential genes that can function as virulence factors for future study in elucidating the mechanisms behind the ability of the bacteria in causing diseases in susceptible animals.

Keywords: comparative genomics, DNA sequencing, phage, phylogenomics

Procedia PDF Downloads 154
24294 Transcriptional Evidence for the Involvement of MyD88 in Flagellin Recognition: Genomic Identification of Rock Bream MyD88 and Comparative Analysis

Authors: N. Umasuthan, S. D. N. K. Bathige, W. S. Thulasitha, I. Whang, J. Lee

Abstract:

The MyD88 is an evolutionarily conserved host-expressed adaptor protein that is essential for proper TLR/ IL1R immune-response signaling. A previously identified complete cDNA (1626 bp) of OfMyD88 comprised an ORF of 867 bp encoding a protein of 288 amino acids (32.9 kDa). The gDNA (3761 bp) of OfMyD88 revealed a quinquepartite genome organization composed of 5 exons (with the sizes of 310, 132, 178, 92 and 155 bp) separated by 4 introns. All the introns displayed splice signals consistent with the consensus GT/AG rule. A bipartite domain structure with two domains namely death domain (24-103) coded by 1st exon, and TIR domain (151-288) coded by last 3 exons were identified through in silico analysis. Moreover, homology modeling of these two domains revealed a similar quaternary folding nature between human and rock bream homologs. A comprehensive comparison of vertebrate MyD88 genes showed that they possess a 5-exonic structure. In this structure, the last three exons were strongly conserved, and this suggests that a rigid structure has been maintained during vertebrate evolution. A cluster of TATA box-like sequences were found 0.25 kb upstream of cDNA starting position. In addition, putative 5'-flanking region of OfMyD88 was predicted to have TFBS implicated with TLR signaling, including copies of NFB1, APRF/ STAT3, Sp1, IRF1 and 2 and Stat1/2. Using qPCR technique, a ubiquitous mRNA expression was detected in liver and blood. Furthermore, a significantly up-regulated transcriptional expression of OfMyD88 was detected in head kidney (12-24 h; >2-fold), spleen (6 h; 1.5-fold), liver (3 h; 1.9-fold) and intestine (24 h; ~2-fold) post-Fla challenge. These data suggest a crucial role for MyD88 in antibacterial immunity of teleosts.

Keywords: MyD88, innate immunity, flagellin, genomic analysis

Procedia PDF Downloads 388
24293 Genomic Imprinting as a Possible Epigenetic Cause of Esophageal Atresia

Authors: M. Błoch, P. Karpiński, P. Gasperowicz, R. Płoski, A. Lebioda, P. Skiba, A. Rozensztrauch, D. Patkowski, R. Śmigiel

Abstract:

Introduction: The cause of the isolated form of esophageal atresia has been yet unknown. Objectives: The primary objective of this study was to indicate epigenetic factors which may play an important role in the etiopathogenesis of esophageal atresia. Methods: We recruited a group of 6 pairs of twins, among whom one of the twins developed EA. The selection of such a group for testing allows for excluding external factors (e.g., infections, drugs, toxins) as the cause of the birth defect. The analyzes were performed with the use of genetic material isolated from the whole blood and esophagus tissue of a patient with EA. The reduced representation bisulphite sequencing (RRBS) technique was used to study the change in the genomic imprinting -a change in the expression of genes, which may be the epigenetic cause of EA. Results: In the course of the analyzes, significant hypomethylation and hypermethylation regions were identified. 65 genes with probably increased expression and 65 with decreased expression were selected. These genes have not been marked in literature as possibly pathogenic in esophageal atresia. However, their participation in the pathogenesis of esophageal atresia cannot be clearly excluded. Conclusion: We suggest a role of hypomethylation or hypermethylation of selected genes as one of the possible epigenetic factors in EA pathogenesis. The use of the RRBS technique in the search for the cause of EA is pioneer research; therefore, it seems necessary to extend the research group to new patients with EA. Acknowledgment: The work was supported by the National Science Centre, Poland, under research project 2016/21/N/NZ5/01927.

Keywords: esophageal atresia, epigenetics, embryonic development, surgery, genes expression, twins

Procedia PDF Downloads 48
24292 Hyper Tuned RBF SVM: Approach for the Prediction of the Breast Cancer

Authors: Surita Maini, Sanjay Dhanka

Abstract:

Machine learning (ML) involves developing algorithms and statistical models that enable computers to learn and make predictions or decisions based on data without being explicitly programmed. Because of its unlimited abilities ML is gaining popularity in medical sectors; Medical Imaging, Electronic Health Records, Genomic Data Analysis, Wearable Devices, Disease Outbreak Prediction, Disease Diagnosis, etc. In the last few decades, many researchers have tried to diagnose Breast Cancer (BC) using ML, because early detection of any disease can save millions of lives. Working in this direction, the authors have proposed a hybrid ML technique RBF SVM, to predict the BC in earlier the stage. The proposed method is implemented on the Breast Cancer UCI ML dataset with 569 instances and 32 attributes. The authors recorded performance metrics of the proposed model i.e., Accuracy 98.24%, Sensitivity 98.67%, Specificity 97.43%, F1 Score 98.67%, Precision 98.67%, and run time 0.044769 seconds. The proposed method is validated by K-Fold cross-validation.

Keywords: breast cancer, support vector classifier, machine learning, hyper parameter tunning

Procedia PDF Downloads 43
24291 The Identification of Combined Genomic Expressions as a Diagnostic Factor for Oral Squamous Cell Carcinoma

Authors: Ki-Yeo Kim

Abstract:

Trends in genetics are transforming in order to identify differential coexpressions of correlated gene expression rather than the significant individual gene. Moreover, it is known that a combined biomarker pattern improves the discrimination of a specific cancer. The identification of the combined biomarker is also necessary for the early detection of invasive oral squamous cell carcinoma (OSCC). To identify the combined biomarker that could improve the discrimination of OSCC, we explored an appropriate number of genes in a combined gene set in order to attain the highest level of accuracy. After detecting a significant gene set, including the pre-defined number of genes, a combined expression was identified using the weights of genes in a gene set. We used the Principal Component Analysis (PCA) for the weight calculation. In this process, we used three public microarray datasets. One dataset was used for identifying the combined biomarker, and the other two datasets were used for validation. The discrimination accuracy was measured by the out-of-bag (OOB) error. There was no relation between the significance and the discrimination accuracy in each individual gene. The identified gene set included both significant and insignificant genes. One of the most significant gene sets in the classification of normal and OSCC included MMP1, SOCS3 and ACOX1. Furthermore, in the case of oral dysplasia and OSCC discrimination, two combined biomarkers were identified. The combined genomic expression achieved better performance in the discrimination of different conditions than in a single significant gene. Therefore, it could be expected that accurate diagnosis for cancer could be possible with a combined biomarker.

Keywords: oral squamous cell carcinoma, combined biomarker, microarray dataset, correlated genes

Procedia PDF Downloads 393
24290 Inbreeding Study Using Runs of Homozygosity in Nelore Beef Cattle

Authors: Priscila A. Bernardes, Marcos E. Buzanskas, Luciana C. A. Regitano, Ricardo V. Ventura, Danisio P. Munari

Abstract:

The best linear unbiased predictor (BLUP) is a method commonly used in genetic evaluations of breeding programs. However, this approach can lead to higher inbreeding coefficients in the population due to the intensive use of few bulls with higher genetic potential, usually presenting some degree of relatedness. High levels of inbreeding are associated to low genetic viability, fertility, and performance for some economically important traits and therefore, should be constantly monitored. Unreliable pedigree data can also lead to misleading results. Genomic information (i.e., single nucleotide polymorphism – SNP) is a useful tool to estimate the inbreeding coefficient. Runs of homozygosity have been used to evaluate homozygous segments inherited due to direct or collateral inbreeding and allows inferring population selection history. This study aimed to evaluate runs of homozygosity (ROH) and inbreeding in a population of Nelore beef cattle. A total of 814 animals were genotyped with the Illumina BovineHD BeadChip and the quality control was carried out excluding SNPs located in non-autosomal regions, with unknown position, with a p-value in the Hardy-Weinberg equilibrium lower than 10⁻⁵, call rate lower than 0.98 and samples with the call rate lower than 0.90. After the quality control, 809 animals and 509,107 SNPs remained for analyses. For the ROH analysis, PLINK software was used considering segments with at least 50 SNPs with a minimum length of 1Mb in each animal. The inbreeding coefficient was calculated using the ratio between the sum of all ROH sizes and the size of the whole genome (2,548,724kb). A total of 25.711 ROH were observed, presenting mean, median, minimum, and maximum length of 3.34Mb, 2Mb, 1Mb, and 80.8Mb, respectively. The number of SNPs present in ROH segments varied from 50 to 14.954. The longest ROH length was observed in one animal, which presented a length of 634Mb (24.88% of the genome). Four bulls were among the 10 animals with the longest extension of ROH, presenting 11% of ROH with length higher than 10Mb. Segments longer than 10Mb indicate recent inbreeding. Therefore, the results indicate an intensive use of few sires in the studied data. The distribution of ROH along the chromosomes showed that chromosomes 5 and 6 presented a large number of segments when compared to other chromosomes. The mean, median, minimum, and maximum inbreeding coefficients were 5.84%, 5.40%, 0.00%, and 24.88%, respectively. Although the mean inbreeding was considered low, the ROH indicates a recent and intensive use of few sires, which should be avoided for the genetic progress of breed.

Keywords: autozygosity, Bos taurus indicus, genomic information, single nucleotide polymorphism

Procedia PDF Downloads 126
24289 Neuroblastoma in Children and the Potential Involvement of Viruses in Its Pathogenesis

Authors: Ugo Rovigatti

Abstract:

Neuroblastoma (NBL) has epitomized for at least 40 years our understanding of cancer cellular and molecular biology and its potential applications to novel therapeutic strategies. This includes the discovery of the very first oncogene aberrations and tumorigenesis suppression by differentiation in the 80s; the potential role of suppressor genes in the 90s; the relevance of immunotherapy in the millennium first, and the discovery of additional mutations by NGS technology in the millennium second decade. Similar discoveries were achieved in the majority of human cancers, and similar therapeutic interventions were obtained subsequently to NBL discoveries. Unfortunately, targeted therapies suggested by specific mutations (such as MYCN amplification –MNA- present in ¼ or 1/5 of cases) have not elicited therapeutic successes in aggressive NBL, where the prognosis is still dismal. The reasons appear to be linked to Tumor Heterogeneity, which is particularly evident in NBL but also a clear hallmark of aggressive human cancers generally. The new avenue of cancer immunotherapy (CIT) provided new hopes for cancer patients, but we still ignore the cellular or molecular targets. CIT is emblematic of high-risk disease (HR-NBL) since the mentioned GD2 passive immunotherapy is still providing better survival. We recently critically reviewed and evaluated the literature depicting the genomic landscapes of HR-NBL, coming to the qualified conclusion that among hundreds of affected genes, potential targets, or chromosomal sites, none correlated with anti-GD2 sensitivity. A better explanation is provided by the Micro-Foci inducing Virus (MFV) model, which predicts that neuroblasts infection with the MFV, an RNA virus isolated from a cancer-cluster (space-time association) of HR-NBL cases, elicits the appearance of MNA and additional genomic aberrations with mechanisms resembling chromothripsis. Neuroblasts infected with low titers of MFV amplified MYCN up to 100 folds and became highly transformed and malignant, thus causing neuroblastoma in young rat pups of strains SD and Fisher-344 and larger tumor masses in nu/nu mice. An association was discovered with GD2 since this glycosphingolipid is also the receptor for the family of MFV virus (dsRNA viruses). It is concluded that a dsRNA virus, MFV, appears to provide better explicatory mechanisms for the genesis of i) specific genomic aberrations such as MNA; ii) extensive tumor heterogeneity and chromothripsis; iii) the effects of passive immunotherapy with anti-GD2 monoclonals and that this and similar models should be further investigated in both pediatric and adult cancers.

Keywords: neuroblastoma, MYCN, amplification, viruses, GD2

Procedia PDF Downloads 79
24288 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: cancer classification, feature selection, deep learning, genetic algorithm

Procedia PDF Downloads 88
24287 Genomic and Proteomic Variation in Glycine Max Genotypes towards Salinity

Authors: Faheema Khan

Abstract:

In order to investigate the influence of genetic background on salt tolerance in Soybean (Glycine max) ten soybean genotypes released/notified in India were selected. (Pusa-20, Pusa-40, Pusa-37, Pusa-16, Pusa-24, Pusa-22, BRAGG, PK-416, PK-1042, and DS-9712). The 10-day-old seedlings were subjected to 0, 25, 50, 75, 100, 125, and 150 mM NaCl for 15 days. Plant growth, leaf osmotic adjustment, and RAPD analysis were studied. In comparison to control plants, the plant growth in all genotypes was decreased by salt stress, respectively. Salt stress decreased leaf osmotic potential in all genotypes however the maximum reduction was observed in genotype Pusa-24 followed by PK-416 and Pusa-20. The difference in osmotic adjustment between all the genotypes was correlated with the concentrations of ion examined such as Na+ and the leaf proline concentration. These results suggest that the genotypic variation for salt tolerance can be partially accounted for by plant physiological measures. The genetic polymorphisms between soybean genotypes differing in response to salt stress were characterized using 25 RAPD primers. These primers generated a total of 1640 amplification products, among which 1615 were found to be polymorphic. A very high degree of polymorphism (98.30%) was observed. UPGMA cluster analysis of genetic similarity indices grouped all the genotypes into two major clusters. Intra-clustering within the two clusters precisely grouped the 10 genotypes in sub-cluster as expected from their physiological findings. Our results show that RAPD technique is a sensitive, precise and efficient tool for genomic analysis in soybean genotypes.

Keywords: glycine max, NaCl, RAPD, proteomics

Procedia PDF Downloads 555
24286 Genomics of Aquatic Adaptation

Authors: Agostinho Antunes

Abstract:

The completion of the human genome sequencing in 2003 opened a new perspective into the importance of whole genome sequencing projects, and currently multiple species are having their genomes completed sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. This voluminous sequencing data generated across multiple organisms provides also the framework to better understand the genetic makeup of such species and related ones, allowing to explore the genetic changes underlining the evolution of diverse phenotypic traits. Here, recent results from our group retrieved from comparative evolutionary genomic analyses of selected marine animal species will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been determinant in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: comparative genomics, adaptive evolution, bioinformatics, phylogenetics, genome mining

Procedia PDF Downloads 504
24285 Genomic Identification of Anisakis Simplex Larvae by PCR-RAPD

Authors: Fumiko Kojima, Shuji Fujimoto

Abstract:

Anisakiasis is a disease caused by infection with an anisakid larvae, mostly Anisakis simplex. The larvae commonly infect in marine fish and the disease is frequently reported in areas of the world where fish is consumed raw, lightly pickled or salted. In Japan, people have the habit of eating raw fish such as ‘sushi’ or ‘sashimi’, so they have more chance of infection with larvae of anisakid nematodes. There are three sibling species in A. simplex larvae, namely, A. simplex sensu stricto (Asss), A. pegreffii (Ap) and A. simplex C. It was revealed that Ap is dominant among the larvae from fish (Scomber japonics) in the Japan Sea side and Asss is dominant among those of the Pacific Ocean side conversely. Although anisakiasis has happened in Japan among both the Japan Sea side area and the Pacific Ocean side area. The aim of this study was to investigate genetic variations between the siblings (Asss and Ap) and within the same sibling species by random amplified polymorphic DNA (RAPD) technique. In order to investigate the genetic difference among the each A. simplex larvae, we used RAPD technique to differentiate individuals of A. simplex obtained from Scomber japonics fish those were caught in the Japan sea (Goto Islands in Nagasaki Prefecture) and the cost of Pacific Ocean (Kanagawa Prefecture). The RAPD patterns of the control DNA (Genus Raphidascaris) were markedly different from those of the A. simplex. There were differences in amplification patterns between Asss and Ap. The RAPD patterns for larvae obtained from fish of the same sea were somewhat different and variations were detected even among larvae from the same fish. These results suggest the considerable high genetic variability between Asss and Ap and the possible existence of genetic variation within the sibling species.

Keywords: Anisakiasis in Japan, Anisakis simplex, genomic identification, PCR-RAPD

Procedia PDF Downloads 157
24284 Deepnic, A Method to Transform Each Variable into Image for Deep Learning

Authors: Nguyen J. M., Lucas G., Brunner M., Ruan S., Antonioli D.

Abstract:

Deep learning based on convolutional neural networks (CNN) is a very powerful technique for classifying information from an image. We propose a new method, DeepNic, to transform each variable of a tabular dataset into an image where each pixel represents a set of conditions that allow the variable to make an error-free prediction. The contrast of each pixel is proportional to its prediction performance and the color of each pixel corresponds to a sub-family of NICs. NICs are probabilities that depend on the number of inputs to each neuron and the range of coefficients of the inputs. Each variable can therefore be expressed as a function of a matrix of 2 vectors corresponding to an image whose pixels express predictive capabilities. Our objective is to transform each variable of tabular data into images into an image that can be analysed by CNNs, unlike other methods which use all the variables to construct an image. We analyse the NIC information of each variable and express it as a function of the number of neurons and the range of coefficients used. The predictive value and the category of the NIC are expressed by the contrast and the color of the pixel. We have developed a pipeline to implement this technology and have successfully applied it to genomic expressions on an Affymetrix chip.

Keywords: tabular data, deep learning, perfect trees, NICS

Procedia PDF Downloads 56
24283 Genomic Surveillance of Bacillus Anthracis in South Africa Revealed a Unique Genetic Cluster of B- Clade Strains

Authors: Kgaugelo Lekota, Ayesha Hassim, Henriette Van Heerden

Abstract:

Bacillus anthracis is the causative agent of anthrax that is composed of three genetic groups, namely A, B, and C. Clade-A is distributed world-wide, while sub-clades B has been identified in Kruger National Park (KNP), South Africa. KNP is one of the endemic anthrax regions in South Africa with distinctive genetic diversity. Genomic surveillance of KNP B. anthracis strains was employed on the historical culture collection isolates (n=67) dated from the 1990’s to 2015 using a whole genome sequencing approach. Whole genome single nucleotide polymorphism (SNPs) and pan-genomics analysis were used to define the B. anthracis genetic population structure. This study showed that KNP has heterologous B. anthracis strains grouping in the A-clade with more prominent ABr.005/006 (Ancient A) SNP lineage. The 2012 and 2015 anthrax isolates are dispersed amongst minor sub-clades that prevail in non-stabilized genetic evolution strains. This was augmented with non-parsimony informative SNPs of the B. anthracis strains across minor sub-clades of the Ancient A clade. Pan-genomics of B. anthracis showed a clear distinction between A and B-clade genomes with 11 374 predicted clusters of protein coding genes. Unique accessory genes of B-clade genomes that included biosynthetic cell wall genes and multidrug resistant of Fosfomycin. South Africa consists of diverse B. anthracis strains with unique defined SNPs. The sequenced B. anthracis strains in this study will serve as a means to further trace the dissemination of B. anthracis outbreaks globally and especially in South Africa.

Keywords: bacillus anthracis, whole genome single nucleotide polymorphisms, pangenomics, kruger national park

Procedia PDF Downloads 108
24282 Predicting Open Chromatin Regions in Cell-Free DNA Whole Genome Sequencing Data by Correlation Clustering  

Authors: Fahimeh Palizban, Farshad Noravesh, Amir Hossein Saeidian, Mahya Mehrmohamadi

Abstract:

In the recent decade, the emergence of liquid biopsy has significantly improved cancer monitoring and detection. Dying cells, including those originating from tumors, shed their DNA into the blood and contribute to a pool of circulating fragments called cell-free DNA. Accordingly, identifying the tissue origin of these DNA fragments from the plasma can result in more accurate and fast disease diagnosis and precise treatment protocols. Open chromatin regions are important epigenetic features of DNA that reflect cell types of origin. Profiling these features by DNase-seq, ATAC-seq, and histone ChIP-seq provides insights into tissue-specific and disease-specific regulatory mechanisms. There have been several studies in the area of cancer liquid biopsy that integrate distinct genomic and epigenomic features for early cancer detection along with tissue of origin detection. However, multimodal analysis requires several types of experiments to cover the genomic and epigenomic aspects of a single sample, which will lead to a huge amount of cost and time. To overcome these limitations, the idea of predicting OCRs from WGS is of particular importance. In this regard, we proposed a computational approach to target the prediction of open chromatin regions as an important epigenetic feature from cell-free DNA whole genome sequence data. To fulfill this objective, local sequencing depth will be fed to our proposed algorithm and the prediction of the most probable open chromatin regions from whole genome sequencing data can be carried out. Our method integrates the signal processing method with sequencing depth data and includes count normalization, Discrete Fourie Transform conversion, graph construction, graph cut optimization by linear programming, and clustering. To validate the proposed method, we compared the output of the clustering (open chromatin region+, open chromatin region-) with previously validated open chromatin regions related to human blood samples of the ATAC-DB database. The percentage of overlap between predicted open chromatin regions and the experimentally validated regions obtained by ATAC-seq in ATAC-DB is greater than 67%, which indicates meaningful prediction. As it is evident, OCRs are mostly located in the transcription start sites (TSS) of the genes. In this regard, we compared the concordance between the predicted OCRs and the human genes TSS regions obtained from refTSS and it showed proper accordance around 52.04% and ~78% with all and the housekeeping genes, respectively. Accurately detecting open chromatin regions from plasma cell-free DNA-seq data is a very challenging computational problem due to the existence of several confounding factors, such as technical and biological variations. Although this approach is in its infancy, there has already been an attempt to apply it, which leads to a tool named OCRDetector with some restrictions like the need for highly depth cfDNA WGS data, prior information about OCRs distribution, and considering multiple features. However, we implemented a graph signal clustering based on a single depth feature in an unsupervised learning manner that resulted in faster performance and decent accuracy. Overall, we tried to investigate the epigenomic pattern of a cell-free DNA sample from a new computational perspective that can be used along with other tools to investigate genetic and epigenetic aspects of a single whole genome sequencing data for efficient liquid biopsy-related analysis.

Keywords: open chromatin regions, cancer, cell-free DNA, epigenomics, graph signal processing, correlation clustering

Procedia PDF Downloads 112
24281 Genome-Wide Mining of Potential Guide RNAs for Streptococcus pyogenes and Neisseria meningitides CRISPR-Cas Systems for Genome Engineering

Authors: Farahnaz Sadat Golestan Hashemi, Mohd Razi Ismail, Mohd Y. Rafii

Abstract:

Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system can facilitate targeted genome editing in organisms. Dual or single guide RNA (gRNA) can program the Cas9 nuclease to cut target DNA in particular areas; thus, introducing concise mutations either via error-prone non-homologous end-joining repairing or via incorporating foreign DNAs by homologous recombination between donor DNA and target area. In spite of high demand of such promising technology, developing a well-organized procedure in order for reliable mining of potential target sites for gRNAs in large genomic data is still challenging. Hence, we aimed to perform high-throughput detection of target sites by specific PAMs for not only common Streptococcus pyogenes (SpCas9) but also for Neisseria meningitides (NmCas9) CRISPR-Cas systems. Previous research confirmed the successful application of such RNA-guided Cas9 orthologs for effective gene targeting and subsequently genome manipulation. However, Cas9 orthologs need their particular PAM sequence for DNA cleavage activity. Activity levels are based on the sequence of the protospacer and specific combinations of favorable PAM bases. Therefore, based on the specific length and sequence of PAM followed by a constant length of the target site for the two orthogonals of Cas9 protein, we created a reliable procedure to explore possible gRNA sequences. To mine CRISPR target sites, four different searching modes of sgRNA binding to target DNA strand were applied. These searching modes are as follows i) coding strand searching, ii) anti-coding strand searching, iii) both strand searching, and iv) paired-gRNA searching. Finally, a complete list of all potential gRNAs along with their locations, strands, and PAMs sequence orientation can be provided for both SpCas9 as well as another potential Cas9 ortholog (NmCas9). The artificial design of potential gRNAs in a genome of interest can accelerate functional genomic studies. Consequently, the application of such novel genome editing tool (CRISPR/Cas technology) will enhance by presenting increased versatility and efficiency.

Keywords: CRISPR/Cas9 genome editing, gRNA mining, SpCas9, NmCas9

Procedia PDF Downloads 228
24280 Efficient Reuse of Exome Sequencing Data for Copy Number Variation Callings

Authors: Chen Wang, Jared Evans, Yan Asmann

Abstract:

With the quick evolvement of next-generation sequencing techniques, whole-exome or exome-panel data have become a cost-effective way for detection of small exonic mutations, but there has been a growing desire to accurately detect copy number variations (CNVs) as well. In order to address this research and clinical needs, we developed a sequencing coverage pattern-based method not only for copy number detections, data integrity checks, CNV calling, and visualization reports. The developed methodologies include complete automation to increase usability, genome content-coverage bias correction, CNV segmentation, data quality reports, and publication quality images. Automatic identification and removal of poor quality outlier samples were made automatically. Multiple experimental batches were routinely detected and further reduced for a clean subset of samples before analysis. Algorithm improvements were also made to improve somatic CNV detection as well as germline CNV detection in trio family. Additionally, a set of utilities was included to facilitate users for producing CNV plots in focused genes of interest. We demonstrate the somatic CNV enhancements by accurately detecting CNVs in whole exome-wide data from the cancer genome atlas cancer samples and a lymphoma case study with paired tumor and normal samples. We also showed our efficient reuses of existing exome sequencing data, for improved germline CNV calling in a family of the trio from the phase-III study of 1000 Genome to detect CNVs with various modes of inheritance. The performance of the developed method is evaluated by comparing CNV calling results with results from other orthogonal copy number platforms. Through our case studies, reuses of exome sequencing data for calling CNVs have several noticeable functionalities, including a better quality control for exome sequencing data, improved joint analysis with single nucleotide variant calls, and novel genomic discovery of under-utilized existing whole exome and custom exome panel data.

Keywords: bioinformatics, computational genetics, copy number variations, data reuse, exome sequencing, next generation sequencing

Procedia PDF Downloads 235
24279 Prediction of Ionizing Radiation Doses in Irradiated red Pepper (Capsicum annuum) and Mint (Mentha piperita) by Gel Electrophoresis

Authors: Şeyma Özçirak Ergün, Ergün Şakalar, Emrah Yalazi̇, Nebahat Şahi̇n

Abstract:

Food irradiation is a usage of exposing food to ionising radiation (IR) such as gamma rays. IR has been used to decrease the number of harmful microorganisms in the food such as spices. Excessive usage of IR can cause damage to both food and people who consuming food. And also it causes to damages on food DNA. Generally, IR detection techniques were utilized in literature for spices are Electron Spin Resonance (ESR), Thermos Luminescence (TL). Storage creates negative effect on IR detection method then analyses of samples have been performed without storage in general. In the experimental part, red pepper (Capsicum annuum) and mint (Mentha piperita) as spices were exposed to 0, 0.272, 0.497, 1.06, 3.64, 8.82, and 17.42 kGy ionize radiation. ESR was applied to samples irradiated. DNA isolation from irradiated samples was performed using GIDAGEN Multi Fast DNA isolation kit. The DNA concentration was measured using a microplate reader spectrophotometer (Infinite® 200 PRO-Life Science–Tecan). The concentration of each DNA was adjusted to 50 ng/µL. Genomic DNA was imaged by UV transilluminator (Gel Doc XR System, Bio-Rad) for the estimation of genomic DNA bp-fragment size after IR. Thus, agarose gel profiles of irradiated spices were obtained to determine the change of band profiles. Besides, samples were examined at three different time periods (0, 3, 6 months storage) to show the feasibility of developed method. Results of gel electrophoresis showed especially degradation of DNA of irradiated samples. In conclusion, this study with gel electrophoresis can be used as a basis for the identification of the dose of irradiation by looking at degradation profiles at specific amounts of irradiation. Agarose gel results of irradiated samples were confirmed with ESR analysis. This method can be applied widely to not only food products but also all biological materials containing DNA to predict radiation-induced damage of DNA.

Keywords: DNA, electrophoresis, gel electrophoresis, ionizeradiation

Procedia PDF Downloads 232
24278 Isolation and Characterisation of Novel Environmental Bacteriophages Which Target the Escherichia coli Lamb Outer Membrane Protein

Authors: Ziyue Zeng

Abstract:

Bacteriophages are viruses which infect bacteria specifically. Over the past decades, phage λ has been extensively studied, especially its interaction with the Escherichia coli LamB (EcLamB) protein receptor. Nonetheless, despite the enormous numbers and near-ubiquity of environmental phages, aside from phage λ, there is a paucity of information on other phages which target EcLamB as a receptor. In this study, to answer the question of whether there are other EcLamB-targeting phages in the natural environment, a simple and convenient method was developed and used for isolating environmental phages which target a particular surface structure of a particular bacterium; in this case, the EcLamB outer membrane protein. From the enrichments with the engineered bacterial hosts, a collection of EcLamB-targeting phages (ΦZZ phages) were easily isolated. Intriguingly, unlike phage λ, an obligate EcLamB-dependent phage in the Siphoviridae family, the newly isolated ΦZZ phages alternatively recognised EcLamB or E. coli OmpC (EcOmpC) as a receptor when infecting E. coli. Furthermore, ΦZZ phages were suggested to represent new species in the Tequatrovirus genus in the Myoviridae family, based on phage morphology and genomic sequences. Most phages are thought to have a narrow host range due to their exquisite specificity in receptor recognition. With the ability to optionally recognise two receptors, ΦZZ phages were considered relatively promiscuous. Via the heterologous expression of EcLamB on the bacterial cell surface, the host range of ΦZZ phages was further extended to three different enterobacterial genera. Besides, an interesting selection of evolved phage mutants with a broader host range was isolated, and the key mutations involved in their evolution to adapt to new hosts were investigated by genomic analysis. Finally, and importantly, two ΦZZ phages were found to be putative generalised transducers, which could be exploited as tools for DNA manipulations.

Keywords: environmental microbiology, phage, microbe-host interactions, microbial ecology

Procedia PDF Downloads 66
24277 Assessing Significance of Correlation with Binomial Distribution

Authors: Vijay Kumar Singh, Pooja Kushwaha, Prabhat Ranjan, Krishna Kumar Ojha, Jitendra Kumar

Abstract:

Present day high-throughput genomic technologies, NGS/microarrays, are producing large volume of data that require improved analysis methods to make sense of the data. The correlation between genes and samples has been regularly used to gain insight into many biological phenomena including, but not limited to, co-expression/co-regulation, gene regulatory networks, clustering and pattern identification. However, presence of outliers and violation of assumptions underlying Pearson correlation is frequent and may distort the actual correlation between the genes and lead to spurious conclusions. Here, we report a method to measure the strength of association between genes. The method assumes that the expression values of a gene are Bernoulli random variables whose outcome depends on the sample being probed. The method considers the two genes as uncorrelated if the number of sample with same outcome for both the genes (Ns) is equal to certainly expected number (Es). The extent of correlation depends on how far Ns can deviate from the Es. The method does not assume normality for the parent population, fairly unaffected by the presence of outliers, can be applied to qualitative data and it uses the binomial distribution to assess the significance of association. At this stage, we would not claim about the superiority of the method over other existing correlation methods, but our method could be another way of calculating correlation in addition to existing methods. The method uses binomial distribution, which has not been used until yet, to assess the significance of association between two variables. We are evaluating the performance of our method on NGS/microarray data, which is noisy and pierce by the outliers, to see if our method can differentiate between spurious and actual correlation. While working with the method, it has not escaped our notice that the method could also be generalized to measure the association of more than two variables which has been proven difficult with the existing methods.

Keywords: binomial distribution, correlation, microarray, outliers, transcriptome

Procedia PDF Downloads 382
24276 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine

Procedia PDF Downloads 97
24275 Potyviruses Genomic Analysis and Complete Evaluation

Authors: Narin Salehiyan, Ramin Ghasemi Shayan

Abstract:

The largest genus of plant viruses, the potyvirus, is responsible for significant crop losses. Potyviruses are aphid sent in a nonpersistent way, and some of them are likewise seed communicated. As significant microorganisms, potyviruses are substantially more examined than other plant infections having a place with different genera, and their review covers numerous parts of plant virology, like utilitarian portrayal of viral proteins, sub-atomic communication with hosts and vectors, structure, scientific classification, development, the study of disease transmission, and determination. Biotechnological utilizations of potyviruses are likewise being investigated. During this last ten years, significant advances have been made in the comprehension of the sub-atomic science of these infections and the elements of their different proteins. Potyvirus multiplication, movement, and transmission, as well as potyvirus/plant compatible interactions, including pathogenicity and symptom determinants, are updated following a general overview of the family Potyviridae and the potyviral proteins. it end the survey giving data on biotechnological uses of potyviruses.

Keywords: virology, poty, virus, genome, genetic

Procedia PDF Downloads 42