Search results for: human genome sequencing
9019 Genomics of Adaptation in the Sea
Authors: Agostinho Antunes
Abstract:
The completion of the human genome sequencing in 2003 opened a new perspective into the importance of whole genome sequencing projects, and currently multiple species are having their genomes completed sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. This voluminous sequencing data generated across multiple organisms provides also the framework to better understand the genetic makeup of such species and related ones, allowing to explore the genetic changes underlining the evolution of diverse phenotypic traits. Here, recent results from our group retrieved from comparative evolutionary genomic analyses of selected marine animal species will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been determinant in the success of adaptive radiations into diverse habitats and lifestyles.Keywords: marine genomics, evolutionary bioinformatics, human genome sequencing, genomic analyses
Procedia PDF Downloads 6119018 The Role and Importance of Genome Sequencing in Prediction of Cancer Risk
Authors: M. Sadeghi, H. Pezeshk, R. Tusserkani, A. Sharifi Zarchi, A. Malekpour, M. Foroughmand, S. Goliaei, M. Totonchi, N. Ansari–Pour
Abstract:
The role and relative importance of intrinsic and extrinsic factors in the development of complex diseases such as cancer still remains a controversial issue. Determining the amount of variation explained by these factors needs experimental data and statistical models. These models are nevertheless based on the occurrence and accumulation of random mutational events during stem cell division, thus rendering cancer development a stochastic outcome. We demonstrate that not only individual genome sequencing is uninformative in determining cancer risk, but also assigning a unique genome sequence to any given individual (healthy or affected) is not meaningful. Current whole-genome sequencing approaches are therefore unlikely to realize the promise of personalized medicine. In conclusion, since genome sequence differs from cell to cell and changes over time, it seems that determining the risk factor of complex diseases based on genome sequence is somewhat unrealistic, and therefore, the resulting data are likely to be inherently uninformative.Keywords: cancer risk, extrinsic factors, genome sequencing, intrinsic factors
Procedia PDF Downloads 2709017 Genomics of Aquatic Adaptation
Authors: Agostinho Antunes
Abstract:
The completion of the human genome sequencing in 2003 opened a new perspective into the importance of whole genome sequencing projects, and currently multiple species are having their genomes completed sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. This voluminous sequencing data generated across multiple organisms provides also the framework to better understand the genetic makeup of such species and related ones, allowing to explore the genetic changes underlining the evolution of diverse phenotypic traits. Here, recent results from our group retrieved from comparative evolutionary genomic analyses of selected marine animal species will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been determinant in the success of adaptive radiations into diverse habitats and lifestyles.Keywords: comparative genomics, adaptive evolution, bioinformatics, phylogenetics, genome mining
Procedia PDF Downloads 5339016 Evolutionary Genomic Analysis of Adaptation Genomics
Authors: Agostinho Antunes
Abstract:
The completion of the human genome sequencing in 2003 opened a new perspective into the importance of whole genome sequencing projects, and currently multiple species are having their genomes completed sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. This voluminous sequencing data generated across multiple organisms provides also the framework to better understand the genetic makeup of such species and related ones, allowing to explore the genetic changes underlining the evolution of diverse phenotypic traits. Here, recent results from our group retrieved from comparative evolutionary genomic analyses of varied species will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been determinant in the success of adaptive radiations into diverse habitats and lifestyles.Keywords: adaptation, animals, evolution, genomics
Procedia PDF Downloads 4299015 Genome Sequencing, Assembly and Annotation of Gelidium Pristoides from Kenton-on-Sea, South Africa
Authors: Sandisiwe Mangali, Graeme Bradley
Abstract:
Genome is complete set of the organism's hereditary information encoded as either deoxyribonucleic acid or ribonucleic acid in most viruses. The three different types of genomes are nuclear, mitochondrial and the plastid genome and their sequences which are uncovered by genome sequencing are known as an archive for all genetic information and enable researchers to understand the composition of a genome, regulation of gene expression and also provide information on how the whole genome works. These sequences enable researchers to explore the population structure, genetic variations, and recent demographic events in threatened species. Particularly, genome sequencing refers to a process of figuring out the exact arrangement of the basic nucleotide bases of a genome and the process through which all the afore-mentioned genomes are sequenced is referred to as whole or complete genome sequencing. Gelidium pristoides is South African endemic Rhodophyta species which has been harvested in the Eastern Cape since the 1950s for its high economic value which is one motivation for its sequencing. Its endemism further motivates its sequencing for conservation biology as endemic species are more vulnerable to anthropogenic activities endangering a species. As sequencing, mapping and annotating the Gelidium pristoides genome is the aim of this study. To accomplish this aim, the genomic DNA was extracted and quantified using the Nucleospin Plank Kit, Qubit 2.0 and Nanodrop. Thereafter, the Ion Plus Fragment Library was used for preparation of a 600bp library which was then sequenced through the Ion S5 sequencing platform for two runs. The produced reads were then quality-controlled and assembled through the SPAdes assembler with default parameters and the genome assembly was quality assessed through the QUAST software. From this assembly, the plastid and the mitochondrial genomes were then sampled out using Gelidiales organellar genomes as search queries and ordered according to them using the Geneious software. The Qubit and the Nanodrop instruments revealed an A260/A280 and A230/A260 values of 1.81 and 1.52 respectively. A total of 30792074 reads were obtained and produced a total of 94140 contigs with resulted into a sequence length of 217.06 Mbp with N50 value of 3072 bp and GC content of 41.72%. A total length of 179281bp and 25734 bp was obtained for plastid and mitochondrial respectively. Genomic data allows a clear understanding of the genomic constituent of an organism and is valuable as foundation information for studies of individual genes and resolving the evolutionary relationships between organisms including Rhodophytes and other seaweeds.Keywords: Gelidium pristoides, genome, genome sequencing and assembly, Ion S5 sequencing platform
Procedia PDF Downloads 1509014 Isolate-Specific Variations among Clinical Isolates of Brucella Identified by Whole-Genome Sequencing, Bioinformatics and Comparative Genomics
Authors: Abu S. Mustafa, Mohammad W. Khan, Faraz Shaheed Khan, Nazima Habibi
Abstract:
Brucellosis is a zoonotic disease of worldwide prevalence. There are at least four species and several strains of Brucella that cause human disease. Brucella genomes have very limited variation across strains, which hinder strain identification using classical molecular techniques, including PCR and 16 S rDNA sequencing. The aim of this study was to perform whole genome sequencing of clinical isolates of Brucella and perform bioinformatics and comparative genomics analyses to determine the existence of genetic differences across the isolates of a single Brucella species and strain. The draft sequence data were generated from 15 clinical isolates of Brucella melitensis (biovar 2 strain 63/9) using MiSeq next generation sequencing platform. The generated reads were used for further assembly and analysis. All the analysis was performed using Bioinformatics work station (8 core i7 processor, 8GB RAM with Bio-Linux operating system). FastQC was used to determine the quality of reads and low quality reads were trimmed or eliminated using Fastx_trimmer. Assembly was done by using Velvet and ABySS softwares. The ordering of assembled contigs was performed by Mauve. An online server RAST was employed to annotate the contigs assembly. Annotated genomes were compared using Mauve and ACT tools. The QC score for DNA sequence data, generated by MiSeq, was higher than 30 for 80% of reads with more than 100x coverage, which suggested that data could be utilized for further analysis. However when analyzed by FastQC, quality of four reads was not good enough for creating a complete genome draft so remaining 11 samples were used for further analysis. The comparative genome analyses showed that despite sharing same gene sets, single nucleotide polymorphisms and insertions/deletions existed across different genomes, which provided a variable extent of diversity to these bacteria. In conclusion, the next generation sequencing, bioinformatics, and comparative genome analysis can be utilized to find variations (point mutations, insertions and deletions) across different genomes of Brucella within a single strain. This information could be useful in surveillance and epidemiological studies supported by Kuwait University Research Sector grants MI04/15 and SRUL02/13.Keywords: brucella, bioinformatics, comparative genomics, whole genome sequencing
Procedia PDF Downloads 3839013 The Cleavage of DNA by the Anti-Tumor Drug Bleomycin at the Transcription Start Sites of Human Genes Using Genome-Wide Techniques
Authors: Vincent Murray
Abstract:
The glycopeptide bleomycin is used in the treatment of testicular cancer, Hodgkin's lymphoma, and squamous cell carcinoma. Bleomycin damages and cleaves DNA in human cells, and this is considered to be the main mode of action for bleomycin's anti-tumor activity. In particular, double-strand breaks are thought to be the main mechanism for the cellular toxicity of bleomycin. Using Illumina next-generation DNA sequencing techniques, the genome-wide sequence specificity of bleomycin-induced double-strand breaks was determined in human cells. The degree of bleomycin cleavage was also assessed at the transcription start sites (TSSs) of actively transcribed genes and compared with non-transcribed genes. It was observed that bleomycin preferentially cleaved at the TSSs of actively transcribed human genes. There was a correlation between the degree of this enhanced cleavage at TSSs and the level of transcriptional activity. Bleomycin cleavage is also affected by chromatin structure and at TSSs, the peaks of bleomycin cleavage were approximately 200 bp apart. This indicated that bleomycin was able to detect phased nucleosomes at the TSSs of actively transcribed human genes. The genome-wide cleavage pattern of the bleomycin analogues 6′-deoxy-BLM Z and zorbamycin was also investigated in human cells. As found for bleomycin, these bleomycin analogues also preferentially cleaved at the TSSs of actively transcribed human genes. The cytotoxicity (IC₅₀ values) of these bleomycin analogues was determined. It was found that the degree of enhanced cleavage at TSSs was inversely correlated with the IC₅₀ values of the bleomycin analogues. This suggested that the level of cleavage at the TSSs of actively transcribed human genes was important for the cytotoxicity of bleomycin and analogues. Hence this study provided a deeper understanding of the cellular processes involved in the cancer chemotherapeutic activity of bleomycin.Keywords: anti-tumour activity, bleomycin analogues, chromatin structure, genome-wide study, Illumina DNA sequencing
Procedia PDF Downloads 1209012 Genome Sequencing of the Yeast Saccharomyces cerevisiae Strain 202-3
Authors: Yina A. Cifuentes Triana, Andrés M. Pinzón Velásco, Marío E. Velásquez Lozano
Abstract:
In this work the sequencing and genome characterization of a natural isolate of Saccharomyces cerevisiae yeast (strain 202-3), identified with potential for the production of second generation ethanol from sugarcane bagasse hydrolysates is presented. This strain was selected because its capability to consume xylose during the fermentation of sugarcane bagasse hydrolysates, taking into account that many strains of S. cerevisiae are incapable of processing this sugar. This advantage and other prominent positive aspects during fermentation profiles evaluated in bagasse hydrolysates made the strain 202-3 a candidate strain to improve the production of second-generation ethanol, which was proposed as a first step to study the strain at the genomic level. The molecular characterization was carried out by genome sequencing with the Illumina HiSeq 2000 platform paired end; the assembly was performed with different programs, finally choosing the assembler ABYSS with kmer 89. Gene prediction was developed with the approach of hidden Markov models with Augustus. The genes identified were scored based on similarity with public databases of nucleotide and protein. Records were organized from ontological functions at different hierarchical levels, which identified central metabolic functions and roles of the S. cerevisiae strain 202-3, highlighting the presence of four possible new proteins, two of them probably associated with the positive consumption of xylose.Keywords: cellulosic ethanol, Saccharomyces cerevisiae, genome sequencing, xylose consumption
Procedia PDF Downloads 3209011 Molecular-Genetics Studies of New Unknown APMV Isolated from Wild Bird in Ukraine
Authors: Borys Stegniy, Anton Gerilovych, Oleksii Solodiankin, Vitaliy Bolotin, Anton Stegniy, Denys Muzyka, Claudio Afonso
Abstract:
New APMV was isolated from white fronted goose in Ukraine. This isolate was tested serologically using monoclonal antibodies in haemagglutination-inhibition tests against APMV1-9. As the results obtained isolate showed cross reactions with APMV7. Following investigations were provided for the full genome sequencing using random primers and cloning into pCRII-TOPO. Analysis of 100 transformed colonies of E.coli using traditional sequencing gave us possibilities to find only 3 regions, which could identify by BLAST. The first region with the length of 367 bp had 70 % nucleotide sequence identity to the APMV 12 isolate Wigeon/Italy/3920_1/2005 at genome position 2419-2784. Next region (344 bp) had 66 % identity to the same APMV 12 isolate at position 4760-5103. The last region (365 bp) showed 71 % identity to Newcastle disease virus strain M4 at position 12569-12928.Keywords: APMV, Newcastle disease virus, Ukraine, full genome sequencing
Procedia PDF Downloads 4429010 Implementation of CNV-CH Algorithm Using Map-Reduce Approach
Authors: Aishik Deb, Rituparna Sinha
Abstract:
We have developed an algorithm to detect the abnormal segment/"structural variation in the genome across a number of samples. We have worked on simulated as well as real data from the BAM Files and have designed a segmentation algorithm where abnormal segments are detected. This algorithm aims to improve the accuracy and performance of the existing CNV-CH algorithm. The next-generation sequencing (NGS) approach is very fast and can generate large sequences in a reasonable time. So the huge volume of sequence information gives rise to the need for Big Data and parallel approaches of segmentation. Therefore, we have designed a map-reduce approach for the existing CNV-CH algorithm where a large amount of sequence data can be segmented and structural variations in the human genome can be detected. We have compared the efficiency of the traditional and map-reduce algorithms with respect to precision, sensitivity, and F-Score. The advantages of using our algorithm are that it is fast and has better accuracy. This algorithm can be applied to detect structural variations within a genome, which in turn can be used to detect various genetic disorders such as cancer, etc. The defects may be caused by new mutations or changes to the DNA and generally result in abnormally high or low base coverage and quantification values.Keywords: cancer detection, convex hull segmentation, map reduce, next generation sequencing
Procedia PDF Downloads 1369009 Genodata: The Human Genome Variation Using BigData
Authors: Surabhi Maiti, Prajakta Tamhankar, Prachi Uttam Mehta
Abstract:
Since the accomplishment of the Human Genome Project, there has been an unparalled escalation in the sequencing of genomic data. This project has been the first major vault in the field of medical research, especially in genomics. This project won accolades by using a concept called Bigdata which was earlier, extensively used to gain value for business. Bigdata makes use of data sets which are generally in the form of files of size terabytes, petabytes, or exabytes and these data sets were traditionally used and managed using excel sheets and RDBMS. The voluminous data made the process tedious and time consuming and hence a stronger framework called Hadoop was introduced in the field of genetic sciences to make data processing faster and efficient. This paper focuses on using SPARK which is gaining momentum with the advancement of BigData technologies. Cloud Storage is an effective medium for storage of large data sets which is generated from the genetic research and the resultant sets produced from SPARK analysis.Keywords: human genome project, Bigdata, genomic data, SPARK, cloud storage, Hadoop
Procedia PDF Downloads 2599008 Genetic Diversity and Discovery of Unique SNPs in Five Country Cultivars of Sesamum indicum by Next-Generation Sequencing
Authors: Nam-Kuk Kim, Jin Kim, Soomin Park, Changhee Lee, Mijin Chu, Seong-Hun Lee
Abstract:
In this study, we conducted whole genome re-sequencing of 10 cultivars originated from five countries including Korea, China, India, Pakistan and Ethiopia with Sesamum indicum (Zhongzho No. 13) genome as a reference. Almost 80% of the whole genome sequences of the reference genome could be covered by sequenced reads. Numerous SNP and InDel were detected by bioinformatic analysis. Among these variants, 266,051 SNPs were identified as unique to countries. Pakistan and Ethiopia had high densities of SNPs compared to other countries. Three main clusters (cluster 1: Korea, cluster 2: Pakistan and India, cluster 3: Ethiopia and China) were recovered by neighbor-joining analysis using all variants. Interestingly, some variants were detected in DGAT1 (diacylglycerol O-acyltransferase 1) and FADS (fatty acid desaturase) genes, which are known to be related with fatty acid synthesis and metabolism. These results can provide useful information to understand the regional characteristics and develop DNA markers for origin discrimination of sesame.Keywords: Sesamum indicum, NGS, SNP, DNA marker
Procedia PDF Downloads 3279007 Genomic Diversity and Relationship among Arabian Peninsula Dromedary Camels Using Full Genome Sequencing Approach
Authors: H. Bahbahani, H. Musa, F. Al Mathen
Abstract:
The dromedary camels (Camelus dromedarius) are single-humped even-toed ungulates populating the African Sahara, Arabian Peninsula, and Southwest Asia. The genome of this desert-adapted species has been minimally investigated using autosomal microsatellite and mitochondrial DNA markers. In this study, the genomes of 33 dromedary camel samples from different parts of the Arabian Peninsula were sequenced using Illumina Next Generation Sequencing (NGS) platform. These data were combined with Genotyping-by-Sequencing (GBS) data from African (Sudanese) dromedaries to investigate the genomic relationship between African and Arabian Peninsula dromedary camels. Principle Component Analysis (PCA) and average genome-wide admixture analysis were be conducted on these data to tackle the objectives of these studies. Both of the two analyses conducted revealed phylogeographic distinction between these two camel populations. However, no breed-wise genetic classification has been revealed among the African (Sudanese) camel breeds. The Arabian Peninsula camel populations also show higher heterozygosity than the Sudanese camels. The results of this study explain the evolutionary history and migration of African dromedary camels from their center of domestication in the southern Arabian Peninsula. These outputs help scientists to further understand the evolutionary history of dromedary camels, which might impact in conserving the favorable genetic of this species.Keywords: dromedary, genotyping-by-sequencing, Arabian Peninsula, Sudan
Procedia PDF Downloads 2069006 Whole Exome Sequencing Data Analysis of Rare Diseases: Non-Coding Variants and Copy Number Variations
Authors: S. Fahiminiya, J. Nadaf, F. Rauch, L. Jerome-Majewska, J. Majewski
Abstract:
Background: Sequencing of protein coding regions of human genome (Whole Exome Sequencing; WES), has demonstrated a great success in the identification of causal mutations for several rare genetic disorders in human. Generally, most of WES studies have focused on rare variants in coding exons and splicing-sites where missense substitutions lead to the alternation of protein product. Although focusing on this category of variants has revealed the mystery behind many inherited genetic diseases in recent years, a subset of them remained still inconclusive. Here, we present the result of our WES studies where analyzing only rare variants in coding regions was not conclusive but further investigation revealed the involvement of non-coding variants and copy number variations (CNV) in etiology of the diseases. Methods: Whole exome sequencing was performed using our standard protocols at Genome Quebec Innovation Center, Montreal, Canada. All bioinformatics analyses were done using in-house WES pipeline. Results: To date, we successfully identified several disease causing mutations within gene coding regions (e.g. SCARF2: Van den Ende-Gupta syndrome and SNAP29: 22q11.2 deletion syndrome) by using WES. In addition, we showed that variants in non-coding regions and CNV have also important value and should not be ignored and/or filtered out along the way of bioinformatics analysis on WES data. For instance, in patients with osteogenesis imperfecta type V and in patients with glucocorticoid deficiency, we identified variants in 5'UTR, resulting in the production of longer or truncating non-functional proteins. Furthermore, CNVs were identified as the main cause of the diseases in patients with metaphyseal dysplasia with maxillary hypoplasia and brachydactyly and in patients with osteogenesis imperfecta type VII. Conclusions: Our study highlights the importance of considering non-coding variants and CNVs during interpretation of WES data, as they can be the only cause of disease under investigation.Keywords: whole exome sequencing data, non-coding variants, copy number variations, rare diseases
Procedia PDF Downloads 4199005 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach
Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini
Abstract:
Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing
Procedia PDF Downloads 1679004 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach
Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini
Abstract:
Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0 %, 80.5 %, 80.5 %, 63.6 %, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanismsKeywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing
Procedia PDF Downloads 1599003 Efficient Reuse of Exome Sequencing Data for Copy Number Variation Callings
Authors: Chen Wang, Jared Evans, Yan Asmann
Abstract:
With the quick evolvement of next-generation sequencing techniques, whole-exome or exome-panel data have become a cost-effective way for detection of small exonic mutations, but there has been a growing desire to accurately detect copy number variations (CNVs) as well. In order to address this research and clinical needs, we developed a sequencing coverage pattern-based method not only for copy number detections, data integrity checks, CNV calling, and visualization reports. The developed methodologies include complete automation to increase usability, genome content-coverage bias correction, CNV segmentation, data quality reports, and publication quality images. Automatic identification and removal of poor quality outlier samples were made automatically. Multiple experimental batches were routinely detected and further reduced for a clean subset of samples before analysis. Algorithm improvements were also made to improve somatic CNV detection as well as germline CNV detection in trio family. Additionally, a set of utilities was included to facilitate users for producing CNV plots in focused genes of interest. We demonstrate the somatic CNV enhancements by accurately detecting CNVs in whole exome-wide data from the cancer genome atlas cancer samples and a lymphoma case study with paired tumor and normal samples. We also showed our efficient reuses of existing exome sequencing data, for improved germline CNV calling in a family of the trio from the phase-III study of 1000 Genome to detect CNVs with various modes of inheritance. The performance of the developed method is evaluated by comparing CNV calling results with results from other orthogonal copy number platforms. Through our case studies, reuses of exome sequencing data for calling CNVs have several noticeable functionalities, including a better quality control for exome sequencing data, improved joint analysis with single nucleotide variant calls, and novel genomic discovery of under-utilized existing whole exome and custom exome panel data.Keywords: bioinformatics, computational genetics, copy number variations, data reuse, exome sequencing, next generation sequencing
Procedia PDF Downloads 2579002 Genome Sequencing of Infectious Bronchitis Virus QX-Like Strain Isolated in Malaysia
Authors: M. Suwaibah, S. W. Tan, I. Aiini, K. Yusoff, A. R. Omar
Abstract:
Respiratory diseases are the most important infectious diseases affecting poultry worldwide. One of the avian respiratory virus of global importance causing significant economic losses is Infectious Bronchitis Virus (IBV). The virus causes a wide spectrum disease known as Infectious Bronchitis (IB), affecting not only the respiratory system but also the kidney and the reproductive system, depending on its strain. IB and Newcastle disease are two of the most prevalent diseases affecting poultry in Malaysia. However, a study on the molecular characterization of Malaysian IBV is lacking. In this study, an IBV strain IBS130 which was isolated in 2015 was fully sequenced using next-gene sequencing approach. Sequence analysis of IBS130 based on the complete genome, polyprotein 1ab and S1 genes were compared with other IBV sequences available in Genbank, National Center for Biotechnology Information (NCBI). IBV strain IBS130 is characterised as QX-like strain based on whole genome and S1 gene sequence analysis. Comparisons of the virus with other IBV strains showed that the nucleotide identity ranged from 67% to 99.2%, depending on the region analysed. The similarity in whole genome nucleotide ranging from 84.9% to 90.7% with the least similar was from Singapore strains (84.9%) and highly similar with China QX-like strains. Meanwhile, the similarity in polyprotein 1ab ranging from 85.3% to 89.9% with the least similar to Singapore strains (85.3%) and highly similar with Mass strains from USA.Keywords: infectious bronchitis virus, phylogenetic analysis, chicken, Malaysia
Procedia PDF Downloads 1879001 Wide Dissemination of CTX-M-Type Extended-Spectrum β-Lactamases in Korean Swine Farms
Authors: Young Ah Kim, Hyunsoo Kim, Eun-Jeong Yoon, Young Hee Seo, Kyungwon Lee
Abstract:
Extended-spectrum β-lactamase (ESBL)-producing Escherichia coli from food animals are considered as a reservoir for transmission of ESBL genes to human. The aim of this study is to assess the prevalence and molecular epidemiology of ESBL-producing E. coli colonization in pigs, farm workers, and farm environments to elucidate the transmission of multidrug-resistant clones from animal to human. Nineteen pig farms were enrolled across the country in Korea from August to December 2017. ESBL-producing E. coli isolates were detected in 190 pigs, 38 farm workers, and 112 sites of farm environments using ChromID ESBL (bioMerieux, Marcy l'Etoile, France), directly (stool or perirectal swab) or after enrichment (sewage). Antimicrobial susceptibility tests were done with disk diffusion methods and blaTEM, blaSHV, and blaCTX-M were detected with PCR and sequencing. The genomes of the four CTX-M-55-producing E. coli isolates from various sources in one farm were entirely sequenced to assess the relatedness of the strains. Whole genome sequencing (WGS) was performed with PacBio RS II system (Pacific Biosciences, Menlo Park, CA, USA). ESBL genotypes were 85 CTX-M-1 group (one CTX-M-3, 23 CTX-M-15, one CTX-M-28, 59 CTX-M-55, one CTX-M-69) and 60 CTX-M-9 group (41 CTX-M-14, one CTX-M-17, one CTX-M-27, 13 CTX-M-65, 4 CTX-M-102) in total 145 isolates. The rectal colonization rates were 53.2% (101/190) in pigs and 39.5% (15/38) in farm workers. In WGS, sequence types (STs) were determined as ST69 (E. coli PJFH115 isolate from a human carrier), ST457 (two E. coli isolates PJFE101 recovered from a fence and PJFA1104 from a pig) and ST5899 (E. coli PJFA173 isolate from the other pig). The four plasmids encoding CTX-M-55 (88,456 to 149, 674 base pair), whether it belonged to IncFIB or IncFIC-IncFIB type, shared IncF backbone furnishing the conjugal elements, suggesting of genes originated from same ancestor. In conclusion, the prevalence of ESBL-producing E. coli in swine farms was surprisingly high, and many of them shared common ESBL genotypes of clinical isolates such as CTX-M-14, 15, and 55 in Korea. It could spread by horizontal transfer between isolates from different reservoirs (human-animal-environment).Keywords: Escherichia coli, extended-spectrum β-lactamase, prevalence, whole genome sequencing
Procedia PDF Downloads 2039000 Genomic Adaptation to Local Climate Conditions in Native Cattle Using Whole Genome Sequencing Data
Authors: Rugang Tian
Abstract:
In this study, we generated whole-genome sequence (WGS) data from110 native cattle. Together with whole-genome sequences from world-wide cattle populations, we estimated the genetic diversity and population genetic structure of different cattle populations. Our findings revealed clustering of cattle groups in line with their geographic locations. We identified noticeable genetic diversity between indigenous cattle breeds and commercial populations. Among all studied cattle groups, lower genetic diversity measures were found in commercial populations, however, high genetic diversity were detected in some local cattle, particularly in Rashoki and Mongolian breeds. Our search for potential genomic regions under selection in native cattle revealed several candidate genes related with immune response and cold shock protein on multiple chromosomes such as TRPM8, NMUR1, PRKAA2, SMTNL2 and OXR1 that are involved in energy metabolism and metabolic homeostasis.Keywords: cattle, whole-genome, population structure, adaptation
Procedia PDF Downloads 758999 Genome Analyses of Pseudomonas Fluorescens b29b from Coastal Kerala
Authors: Wael Ali Mohammed Hadi
Abstract:
Pseudomonas fluorescens B29B, which has asparaginase enzymatic activity, was isolated from the surface coastal seawater of Trivandrum, India. We report the complete Pseudomonas fluorescens B29B genome sequenced, identified, and annotated from a marine source. We find the genome at most minuscule a 7,331,508 bp single circular chromosome with a GC content of 62.19% and 6883 protein-coding genes. Three hundred forty subsystems were identified, including two predicted asparaginases from the genome analysis of P. fluorescens B29B for further investigation. This genome data will help further industrial biotechnology applications of proteins in general and asparaginase as a target.Keywords: pseudomonas, marine, asparaginases, Kerala, whole-genome
Procedia PDF Downloads 2158998 Computing the Similarity and the Diversity in the Species Based on Cronobacter Genome
Authors: E. Al Daoud
Abstract:
The purpose of computing the similarity and the diversity in the species is to trace the process of evolution and to find the relationship between the species and discover the unique, the special, the common and the universal proteins. The proteins of the whole genome of 40 species are compared with the cronobacter genome which is used as reference genome. More than 3 billion pairwise alignments are performed using blastp. Several findings are introduced in this study, for example, we found 172 proteins in cronobacter genome which have insignificant hits in other species, 116 significant proteins in the all tested species with very high score value and 129 common proteins in the plants but have insignificant hits in mammals, birds, fishes, and insects.Keywords: genome, species, blastp, conserved genes, Cronobacter
Procedia PDF Downloads 4978997 Massively Parallel Sequencing Improved Resolution for Paternity Testing
Authors: Xueying Zhao, Ke Ma, Hui Li, Yu Cao, Fan Yang, Qingwen Xu, Wenbin Liu
Abstract:
Massively parallel sequencing (MPS) technologies allow high-throughput sequencing analyses with a relatively affordable price and have gradually been applied to forensic casework. MPS technology identifies short tandem repeat (STR) loci based on sequence so that repeat motif variation within STRs can be detected, which may help one to infer the origin of the mutation in some cases. Here, we report on one case with one three-step mismatch (D18S51) in family trios based on both capillary electrophoresis (CE) and MPS typing. The alleles of the alleged father (AF) are [AGAA]₁₇AGAG[AGAA]₃ and [AGAA]₁₅. The mother’s alleles are [AGAA]₁₉ and [AGAA]₉AGGA[AGAA]₃. The questioned child’s (QC) alleles are [AGAA]₁₉ and [AGAA]₁₂. Given that the sequence variants in repeat regions of AF and mother are not observed in QC’s alleles, the QC’s allele [AGAA]₁₂ was likely inherited from the AF’s allele [AGAA]₁₅ by loss of three repeat [AGAA]. Besides, two new alleles of D18S51 in this study, [AGAA]₁₇AGAG[AGAA]₃ and [AGAA]₉AGGA[AGAA]₃, have not been reported before. All the results in this study were verified using Sanger-type sequencing. In summary, the MPS typing method can offer valuable information for forensic genetics research and play a promising role in paternity testing.Keywords: family trios analysis, forensic casework, ion torrent personal genome machine (PGM), massively parallel sequencing (MPS)
Procedia PDF Downloads 3028996 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease
Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena
Abstract:
Advancement in sequence data generation technologies is churning out voluminous omics data and posing a massive challenge to annotate the biological functional features. With so much data available, the use of machine learning methods and tools to make novel inferences has become obvious. Machine learning methods have been successfully applied to a lot of disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested to develop novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, it plan to employ novel machine learning approaches to aid the understanding of how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome and mutations in which can affect a phenotype of interest.Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics
Procedia PDF Downloads 988995 Accurate HLA Typing at High-Digit Resolution from NGS Data
Authors: Yazhi Huang, Jing Yang, Dingge Ying, Yan Zhang, Vorasuk Shotelersuk, Nattiya Hirankarn, Pak Chung Sham, Yu Lung Lau, Wanling Yang
Abstract:
Human leukocyte antigen (HLA) typing from next generation sequencing (NGS) data has the potential for applications in clinical laboratories and population genetic studies. Here we introduce a novel technique for HLA typing from NGS data based on read-mapping using a comprehensive reference panel containing all known HLA alleles and de novo assembly of the gene-specific short reads. An accurate HLA typing at high-digit resolution was achieved when it was tested on publicly available NGS data, outperforming other newly-developed tools such as HLAminer and PHLAT.Keywords: human leukocyte antigens, next generation sequencing, whole exome sequencing, HLA typing
Procedia PDF Downloads 6658994 Scalable and Accurate Detection of Pathogens from Whole-Genome Shotgun Sequencing
Authors: Janos Juhasz, Sandor Pongor, Balazs Ligeti
Abstract:
Next-generation sequencing, especially whole genome shotgun sequencing, is becoming a common approach to gain insight into the microbiomes in a culture-independent way, even in clinical practice. It does not only give us information about the species composition of an environmental sample but opens the possibility to detect antimicrobial resistance and novel, or currently unknown, pathogens. Accurately and reliably detecting the microbial strains is a challenging task. Here we present a sensitive approach for detecting pathogens in metagenomics samples with special regard to detecting novel variants of known pathogens. We have developed a pipeline that uses fast, short read aligner programs (i.e., Bowtie2/BWA) and comprehensive nucleotide databases. Taxonomic binning is based on the lowest common ancestor (LCA) principle; each read is assigned to a taxon, covering the most significantly hit taxa. This approach helps in balancing between sensitivity and running time. The program was tested both on experimental and synthetic data. The results implicate that our method performs as good as the state-of-the-art BLAST-based ones, furthermore, in some cases, it even proves to be better, while running two orders magnitude faster. It is sensitive and capable of identifying taxa being present only in small abundance. Moreover, it needs two orders of magnitude less reads to complete the identification than MetaPhLan2 does. We analyzed an experimental anthrax dataset (B. anthracis strain BA104). The majority of the reads (96.50%) was classified as Bacillus anthracis, a small portion, 1.2%, was classified as other species from the Bacillus genus. We demonstrate that the evaluation of high-throughput sequencing data is feasible in a reasonable time with good classification accuracy.Keywords: metagenomics, taxonomy binning, pathogens, microbiome, B. anthracis
Procedia PDF Downloads 1378993 Expression Profiling and Immunohistochemical Analysis of Squamous Cell Carcinoma of Head and Neck (Tumor, Transition Zone, Normal) by Whole Genome Scale Sequencing
Authors: Veronika Zivicova, Petr Broz, Zdenek Fik, Alzbeta Mifkova, Jan Plzak, Zdenek Cada, Herbert Kaltner, Jana Fialova Kucerova, Hans-Joachim Gabius, Karel Smetana Jr.
Abstract:
The possibility to determine genome-wide expression profiles of cells and tissues opens a new level of analysis in the quest to define dysregulation in malignancy and thus identify new tumor markers. Toward this long-term aim, we here address two issues on this level for head and neck cancer specimen: i) defining profiles in different regions, i.e. the tumor, the transition zone and normal control and ii) comparing complete data sets for seven individual patients. Special focus in the flanking immunohistochemical part is given to adhesion/growth-regulatory galectins that upregulate chemo- and cytokine expression in an NF-κB-dependent manner, to these regulators and to markers of differentiation, i.e. keratins. The detailed listing of up- and down-regulations, also available in printed form (1), not only served to unveil new candidates for testing as marker but also let the impact of the tumor in the transition zone become apparent. The extent of interindividual variation raises a strong cautionary note on assuming uniformity of regulatory events, to be noted when considering therapeutic implications. Thus, a combination of test targets (and a network analysis for galectins and their downstream effectors) is (are) advised prior to reaching conclusions on further perspectives.Keywords: galectins, genome scale sequencing, squamous cell carcinoma, transition zone
Procedia PDF Downloads 2398992 Genomic Characterisation of Equine Sarcoid-derived Bovine Papillomavirus Type 1 and 2 Using Nanopore-Based Sequencing
Authors: Lien Gysens, Bert Vanmechelen, Maarten Haspeslagh, Piet Maes, Ann Martens
Abstract:
Bovine papillomavirus (BPV) types 1 and 2 play a central role in the etiology of the most common neoplasm in horses, the equine sarcoid. The unknown mechanism behind the unique variety in a clinical presentation on the one hand and the host-dependent clinical outcome of BPV-1 infection, on the other hand, indicate the involvement of additional factors. Earlier studies have reported the potential functional significance of intratypic sequence variants, along with the existence of sarcoid-sourced BPV variants. Therefore, intratypic sequence variation seems to be an important emerging viral factor. This study aimed to give a broad insight in sarcoid-sourced BPV variation and explore its potential association with disease presentation. In order to do this, a nanopore sequencing approach was successfully optimized for screening a wide spectrum of clinical samples. Specimens of each tumour were initially screened for BPV-1/-2 by quantitative real-time PCR. A custom-designed primer set was used on BPV-positive samples to amplify the complete viral genome in two multiplex PCR reactions, resulting in a set of overlapping amplicons. For phylogenetic analysis, separate alignments were made of all available complete genome sequences for BPV-1/-2. The resulting alignments were used to infer Bayesian phylogenetic trees. We found substantial genetic variation among sarcoid-derived BPV-1, although this variation could not be linked to disease severity. Several of the BPV-1 genomes had multiple major deletions. Remarkably, the majority of the cluster within the region coding for late viral genes. Together with the extensiveness (up to 603 nucleotides) of the described deletions, this suggests an altered function of L1/L2 in disease pathogenesis. By generating a significant amount of complete-length BPV genomes, we succeeded in introducing next-generation sequencing into veterinary research focusing on the equine sarcoid, thus facilitating the first report of both nanopore-based sequencing of complete sarcoid-sourced BPV-1/-2 and the simultaneous nanopore sequencing of multiple complete genomes originating from a single clinical sample.Keywords: Bovine papillomavirus, equine sarcoid, horse, nanopore sequencing, phylogenetic analysis
Procedia PDF Downloads 1798991 Towards End-To-End Disease Prediction from Raw Metagenomic Data
Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker
Abstract:
Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine
Procedia PDF Downloads 1258990 Analysis of Endogenous Sirevirus in Germinating Barley (Hordeum vulgare L.)
Authors: Nermin Gozukirmizi, Buket Cakmak, Sevgi Marakli
Abstract:
Sireviruses are genera of copia LTR retrotransposons with a unique genome structure among retrotransposons. Barley (Hordeum vulgare L.) is an economically important plant and has been studied as a model plant regarding its short annual life cycle and seven chromosome pairs. In this study, we used mature barley embryos, 10-day-old roots and 10-day-old leaves derived from the same barley plant to investigate SIRE1 retrotransposon movements by Inter-Retrotransposon Amplified Polymorphism (IRAP) technique. We found polymorphism rates between 0-64% among embryos, roots and leaves. Polymorphism rates were detected to be 0-27% among embryos, 8-60% among roots, and 11-50% among leaves. Polymorphisms were observed not only among the parts of different individuals, but also on the parts of the same plant (23-64%). The internal domains of SIRE1 (gag, env and rt) were also analyzed in the embryos, roots and leaves. Analysis of band profiles showed no polymorphism for gag, however, different band patterns were observed among samples for rt and env. The sequencing of SIRE1 gag, env and rt domains revealed 79% similarity for gag, 95% for env and 84% for rt to Ty1-copia retrotransposons. SIRE1 retrotransposon was identified in the soybean genome and has been studied on other plants (maize, rice, tomatoe etc.). This study is the first detailed investigation of SIRE1 in barley genome. The obtained findings are expected to contribute to the comprehension of SIRE1 retrotransposon and its role in barley genome.Keywords: barley, polymorphism, retrotransposon, SIRE1 virus
Procedia PDF Downloads 308