Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4562

Search results for: small whole genome sequencing

4562 The Role and Importance of Genome Sequencing in Prediction of Cancer Risk

Authors: M. Sadeghi, H. Pezeshk, R. Tusserkani, A. Sharifi Zarchi, A. Malekpour, M. Foroughmand, S. Goliaei, M. Totonchi, N. Ansari–Pour

Abstract:

The role and relative importance of intrinsic and extrinsic factors in the development of complex diseases such as cancer remain controversial. Determining the amount of variation explained by these factors requires experimental data and statistical models. These models are nevertheless based on the occurrence and accumulation of random mutational events during stem cell division, thus rendering cancer development a stochastic outcome. We demonstrate that not only is individual genome sequencing uninformative in determining cancer risk, but assigning a unique genome sequence to any given individual (healthy or affected) is also not meaningful. Current whole-genome sequencing approaches are therefore unlikely to realize the promise of personalized medicine. In conclusion, since the genome sequence differs from cell to cell and changes over time, determining the risk of complex diseases from genome sequence seems somewhat unrealistic, and the resulting data are therefore likely to be inherently uninformative.

Keywords: cancer risk, extrinsic factors, genome sequencing, intrinsic factors

Procedia PDF Downloads 191
4561 Genomics of Adaptation in the Sea

Authors: Agostinho Antunes

Abstract:

The completion of human genome sequencing in 2003 opened a new perspective on the importance of whole genome sequencing projects, and multiple species are currently having their genomes completely sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. The voluminous sequencing data generated across multiple organisms also provide the framework to better understand the genetic makeup of such species and related ones, making it possible to explore the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, obtained from comparative evolutionary genomic analyses of selected marine animal species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: marine genomics, evolutionary bioinformatics, human genome sequencing, genomic analyses

Procedia PDF Downloads 532
4560 Genomics of Aquatic Adaptation

Authors: Agostinho Antunes

Abstract:

The completion of human genome sequencing in 2003 opened a new perspective on the importance of whole genome sequencing projects, and multiple species are currently having their genomes completely sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. The voluminous sequencing data generated across multiple organisms also provide the framework to better understand the genetic makeup of such species and related ones, making it possible to explore the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, obtained from comparative evolutionary genomic analyses of selected marine animal species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: comparative genomics, adaptive evolution, bioinformatics, phylogenetics, genome mining

Procedia PDF Downloads 455
4559 Genome Sequencing, Assembly and Annotation of Gelidium Pristoides from Kenton-on-Sea, South Africa

Authors: Sandisiwe Mangali, Graeme Bradley

Abstract:

A genome is the complete set of an organism's hereditary information, encoded as either deoxyribonucleic acid or, in most viruses, ribonucleic acid. The three types of genome are the nuclear, mitochondrial and plastid genomes. Their sequences, which are uncovered by genome sequencing, serve as an archive of all genetic information and enable researchers to understand the composition of a genome and the regulation of gene expression, and also provide information on how the whole genome works. These sequences enable researchers to explore the population structure, genetic variation, and recent demographic events in threatened species. Genome sequencing refers to the process of determining the exact arrangement of the nucleotide bases of a genome, and the process through which all of the aforementioned genomes are sequenced is referred to as whole or complete genome sequencing. Gelidium pristoides is a South African endemic Rhodophyta species that has been harvested in the Eastern Cape since the 1950s for its high economic value, which is one motivation for its sequencing. Its endemism further motivates its sequencing for conservation biology, as endemic species are more vulnerable to anthropogenic activities that endanger a species. The aim of this study is therefore to sequence, map and annotate the Gelidium pristoides genome. To accomplish this aim, genomic DNA was extracted and quantified using the Nucleospin Plank Kit, Qubit 2.0 and Nanodrop. Thereafter, the Ion Plus Fragment Library was used to prepare a 600 bp library, which was then sequenced on the Ion S5 sequencing platform for two runs. The reads produced were quality-controlled and assembled with the SPAdes assembler using default parameters, and the quality of the genome assembly was assessed with the QUAST software. From this assembly, the plastid and mitochondrial genomes were then extracted using Gelidiales organellar genomes as search queries and ordered according to them using the Geneious software. The Qubit and Nanodrop instruments revealed A260/A280 and A230/A260 values of 1.81 and 1.52, respectively. A total of 30792074 reads were obtained and produced 94140 contigs, giving a total sequence length of 217.06 Mbp with an N50 value of 3072 bp and a GC content of 41.72%. Total lengths of 179281 bp and 25734 bp were obtained for the plastid and mitochondrial genomes, respectively. Genomic data allow a clear understanding of the genomic constituents of an organism and are valuable as foundation information for studies of individual genes and for resolving the evolutionary relationships between organisms, including rhodophytes and other seaweeds.
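
The assembly statistics reported above (N50 and GC content) can be illustrated with a minimal sketch. It is not the QUAST implementation; the function names and the toy contigs are hypothetical and serve only to show how the two metrics are commonly defined.

```python
from typing import Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths: List[int] = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def gc_content(sequences: Iterable[str]) -> float:
    """Return the percentage of G and C bases across all sequences."""
    gc = total = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return 100.0 * gc / total if total else 0.0


# Toy contigs (hypothetical, not data from the study).
toy_contigs = ["ATGCGC", "ATAT", "GG", "A"]
toy_n50 = n50(len(c) for c in toy_contigs)
toy_gc = gc_content(toy_contigs)
```

A low N50 relative to the total assembly length, as in the figures quoted in the abstract, generally indicates a fragmented assembly made up of many short contigs.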

Keywords: Gelidium pristoides, genome, genome sequencing and assembly, Ion S5 sequencing platform

Procedia PDF Downloads 79
4558 Evolutionary Genomic Analysis of Adaptation Genomics

Authors: Agostinho Antunes

Abstract:

The completion of human genome sequencing in 2003 opened a new perspective on the importance of whole genome sequencing projects, and multiple species are currently having their genomes completely sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. The voluminous sequencing data generated across multiple organisms also provide the framework to better understand the genetic makeup of such species and related ones, making it possible to explore the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, obtained from comparative evolutionary genomic analyses of varied species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: adaptation, animals, evolution, genomics

Procedia PDF Downloads 352
4557 Genome Sequencing of the Yeast Saccharomyces cerevisiae Strain 202-3

Authors: Yina A. Cifuentes Triana, Andrés M. Pinzón Velásco, Marío E. Velásquez Lozano

Abstract:

This work presents the sequencing and genome characterization of a natural isolate of the yeast Saccharomyces cerevisiae (strain 202-3), identified as having potential for the production of second-generation ethanol from sugarcane bagasse hydrolysates. This strain was selected because of its capability to consume xylose during the fermentation of sugarcane bagasse hydrolysates, given that many strains of S. cerevisiae are incapable of processing this sugar. This advantage, together with other positive aspects of the fermentation profiles evaluated in bagasse hydrolysates, made strain 202-3 a candidate for improving the production of second-generation ethanol, and studying the strain at the genomic level was proposed as a first step. The molecular characterization was carried out by paired-end genome sequencing on the Illumina HiSeq 2000 platform; the assembly was performed with different programs, and the ABySS assembler with a k-mer size of 89 was finally chosen. Gene prediction was carried out with Augustus, which uses a hidden Markov model approach. The genes identified were scored based on similarity with public nucleotide and protein databases. Records were organized by ontological function at different hierarchical levels, which identified central metabolic functions and roles of S. cerevisiae strain 202-3 and highlighted the presence of four possible new proteins, two of them probably associated with xylose consumption.

Keywords: cellulosic ethanol, Saccharomyces cerevisiae, genome sequencing, xylose consumption

Procedia PDF Downloads 242
4556 Scalable and Accurate Detection of Pathogens from Whole-Genome Shotgun Sequencing

Authors: Janos Juhasz, Sandor Pongor, Balazs Ligeti

Abstract:

Next-generation sequencing, especially whole genome shotgun sequencing, is becoming a common approach for gaining insight into microbiomes in a culture-independent way, even in clinical practice. It not only gives us information about the species composition of an environmental sample but also opens the possibility of detecting antimicrobial resistance and novel, or currently unknown, pathogens. Accurately and reliably detecting microbial strains is a challenging task. Here we present a sensitive approach for detecting pathogens in metagenomics samples, with special regard to detecting novel variants of known pathogens. We have developed a pipeline that uses fast short-read aligner programs (i.e., Bowtie2/BWA) and comprehensive nucleotide databases. Taxonomic binning is based on the lowest common ancestor (LCA) principle; each read is assigned to a taxon that covers its most significantly hit taxa. This approach helps to balance sensitivity and running time. The program was tested on both experimental and synthetic data. The results indicate that our method performs as well as the state-of-the-art BLAST-based ones and, in some cases, even proves to be better, while running two orders of magnitude faster. It is sensitive and capable of identifying taxa present only in low abundance. Moreover, it needs two orders of magnitude fewer reads to complete the identification than MetaPhlAn2 does. We analyzed an experimental anthrax dataset (B. anthracis strain BA104). The majority of the reads (96.50%) were classified as Bacillus anthracis, and a small portion (1.2%) were classified as other species from the Bacillus genus. We demonstrate that the evaluation of high-throughput sequencing data is feasible in a reasonable time with good classification accuracy.
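
The lowest common ancestor principle mentioned above can be sketched as follows. This is a generic illustration, not the authors' pipeline: the toy taxonomy, the function names, and the idea of passing in a plain list of hit taxa are assumptions, and the significance filtering of alignment hits is not reproduced.

```python
from typing import Dict, Iterable, List

# Hypothetical toy taxonomy: child -> parent ("root" has no parent).
PARENT: Dict[str, str] = {
    "Bacillus anthracis": "Bacillus",
    "Bacillus cereus": "Bacillus",
    "Bacillus": "Bacillaceae",
    "Bacillaceae": "root",
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa: Iterable[str]) -> str:
    """Return the deepest node shared by the lineages of all given taxa."""
    lineages = [lineage(t) for t in taxa]
    if not lineages:
        raise ValueError("at least one taxon is required")
    lca = "root"
    # Walk the lineages level by level until they diverge.
    for nodes in zip(*lineages):
        if all(n == nodes[0] for n in nodes):
            lca = nodes[0]
        else:
            break
    return lca


# A read hitting two Bacillus species is binned at genus level.
genus_level = lowest_common_ancestor(["Bacillus anthracis", "Bacillus cereus"])
# A read hitting unrelated genera falls back to the root.
root_level = lowest_common_ancestor(["Bacillus anthracis", "Escherichia coli"])
```

Assigning ambiguous reads to a shared ancestor rather than to one arbitrary species trades taxonomic resolution for fewer false species-level calls.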

Keywords: metagenomics, taxonomy binning, pathogens, microbiome, B. anthracis

Procedia PDF Downloads 65
4555 Molecular-Genetics Studies of New Unknown APMV Isolated from Wild Bird in Ukraine

Authors: Borys Stegniy, Anton Gerilovych, Oleksii Solodiankin, Vitaliy Bolotin, Anton Stegniy, Denys Muzyka, Claudio Afonso

Abstract:

A new APMV was isolated from a white-fronted goose in Ukraine. The isolate was tested serologically using monoclonal antibodies in haemagglutination-inhibition tests against APMV1-9. The results showed that the isolate cross-reacted with APMV7. Subsequent investigations aimed at full genome sequencing using random primers and cloning into pCRII-TOPO. Analysis of 100 transformed E. coli colonies by traditional sequencing yielded only three regions that could be identified by BLAST. The first region, 367 bp long, had 70% nucleotide sequence identity to the APMV 12 isolate Wigeon/Italy/3920_1/2005 at genome positions 2419-2784. The next region (344 bp) had 66% identity to the same APMV 12 isolate at positions 4760-5103. The last region (365 bp) showed 71% identity to Newcastle disease virus strain M4 at positions 12569-12928.

Keywords: APMV, Newcastle disease virus, Ukraine, full genome sequencing

Procedia PDF Downloads 336
4554 Efficient Reuse of Exome Sequencing Data for Copy Number Variation Callings

Authors: Chen Wang, Jared Evans, Yan Asmann

Abstract:

With the rapid evolution of next-generation sequencing techniques, whole-exome or exome-panel data have become a cost-effective way to detect small exonic mutations, but there has been a growing desire to accurately detect copy number variations (CNVs) as well. To address this research and clinical need, we developed a sequencing coverage pattern-based method not only for copy number detection but also for data integrity checks, CNV calling, and visualization reports. The developed methodologies include complete automation to increase usability, genome content-coverage bias correction, CNV segmentation, data quality reports, and publication-quality images. Poor-quality outlier samples were identified and removed automatically. Multiple experimental batches were routinely detected and further reduced to a clean subset of samples before analysis. Algorithm improvements were also made to improve somatic CNV detection as well as germline CNV detection in family trios. Additionally, a set of utilities was included to help users produce CNV plots for focused genes of interest. We demonstrate the somatic CNV enhancements by accurately detecting CNVs in whole-exome data from The Cancer Genome Atlas cancer samples and in a lymphoma case study with paired tumor and normal samples. We also show efficient reuse of existing exome sequencing data for improved germline CNV calling in a family trio from phase III of the 1000 Genomes Project, detecting CNVs with various modes of inheritance. The performance of the developed method is evaluated by comparing CNV calling results with results from other orthogonal copy number platforms. Our case studies show that reusing exome sequencing data for CNV calling offers several notable benefits, including better quality control for exome sequencing data, improved joint analysis with single nucleotide variant calls, and novel genomic discovery from under-utilized existing whole-exome and custom exome panel data.
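
The general idea behind coverage-based copy number detection can be sketched as a per-target log2 ratio between a test sample and a reference. This is a generic read-depth illustration, not the method described in the abstract: the function names, the pseudocount, and the gain/loss thresholds are all assumptions, and bias correction and segmentation are omitted.

```python
import math
from typing import List


def log2_ratios(test_cov: List[float], ref_cov: List[float],
                pseudocount: float = 0.5) -> List[float]:
    """Per-target log2 ratio of library-size-normalized coverage."""
    if len(test_cov) != len(ref_cov):
        raise ValueError("coverage vectors must have the same length")
    test_total = sum(test_cov)
    ref_total = sum(ref_cov)
    if test_total <= 0 or ref_total <= 0:
        raise ValueError("total coverage must be positive")
    ratios = []
    for t, r in zip(test_cov, ref_cov):
        # The pseudocount avoids taking the log of zero on uncovered targets.
        t_norm = (t + pseudocount) / test_total
        r_norm = (r + pseudocount) / ref_total
        ratios.append(math.log2(t_norm / r_norm))
    return ratios


def call_state(ratio: float, gain: float = 0.4, loss: float = -0.6) -> str:
    """Label a target using illustrative (assumed) thresholds."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"


# Hypothetical per-exon coverage for a tumor sample and its matched normal.
tumor = [100.0, 100.0, 200.0, 100.0]
normal = [100.0, 100.0, 100.0, 100.0]
states = [call_state(r) for r in log2_ratios(tumor, normal)]
```

In practice, adjacent targets with similar ratios would be merged by a segmentation step before calls are made, which this sketch leaves out.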

Keywords: bioinformatics, computational genetics, copy number variations, data reuse, exome sequencing, next generation sequencing

Procedia PDF Downloads 183
4553 Genetic Diversity and Discovery of Unique SNPs in Five Country Cultivars of Sesamum indicum by Next-Generation Sequencing

Authors: Nam-Kuk Kim, Jin Kim, Soomin Park, Changhee Lee, Mijin Chu, Seong-Hun Lee

Abstract:

In this study, we conducted whole genome re-sequencing of 10 cultivars originating from five countries (Korea, China, India, Pakistan and Ethiopia), with the Sesamum indicum (Zhongzho No. 13) genome as a reference. Almost 80% of the reference genome was covered by sequenced reads. Numerous SNPs and InDels were detected by bioinformatic analysis. Among these variants, 266,051 SNPs were identified as unique to individual countries. Pakistan and Ethiopia had high densities of SNPs compared to the other countries. Three main clusters (cluster 1: Korea; cluster 2: Pakistan and India; cluster 3: Ethiopia and China) were recovered by neighbor-joining analysis using all variants. Interestingly, some variants were detected in the DGAT1 (diacylglycerol O-acyltransferase 1) and FADS (fatty acid desaturase) genes, which are known to be related to fatty acid synthesis and metabolism. These results provide useful information for understanding regional characteristics and developing DNA markers for origin discrimination of sesame.

Keywords: Sesamum indicum, NGS, SNP, DNA marker

Procedia PDF Downloads 251
4552 Genomic Diversity and Relationship among Arabian Peninsula Dromedary Camels Using Full Genome Sequencing Approach

Authors: H. Bahbahani, H. Musa, F. Al Mathen

Abstract:

Dromedary camels (Camelus dromedarius) are single-humped even-toed ungulates populating the African Sahara, the Arabian Peninsula, and Southwest Asia. The genome of this desert-adapted species has been minimally investigated using autosomal microsatellite and mitochondrial DNA markers. In this study, the genomes of 33 dromedary camel samples from different parts of the Arabian Peninsula were sequenced using an Illumina Next Generation Sequencing (NGS) platform. These data were combined with Genotyping-by-Sequencing (GBS) data from African (Sudanese) dromedaries to investigate the genomic relationship between African and Arabian Peninsula dromedary camels. Principal Component Analysis (PCA) and average genome-wide admixture analysis were conducted on these data to address the objectives of the study. Both analyses revealed a phylogeographic distinction between the two camel populations. However, no breed-wise genetic classification was revealed among the African (Sudanese) camel breeds. The Arabian Peninsula camel populations also show higher heterozygosity than the Sudanese camels. The results of this study explain the evolutionary history and migration of African dromedary camels from their center of domestication in the southern Arabian Peninsula. These outputs help scientists further understand the evolutionary history of dromedary camels, which might aid in conserving the favorable genetics of this species.

Keywords: dromedary, genotyping-by-sequencing, Arabian Peninsula, Sudan

Procedia PDF Downloads 97
4551 Isolate-Specific Variations among Clinical Isolates of Brucella Identified by Whole-Genome Sequencing, Bioinformatics and Comparative Genomics

Authors: Abu S. Mustafa, Mohammad W. Khan, Faraz Shaheed Khan, Nazima Habibi

Abstract:

Brucellosis is a zoonotic disease of worldwide prevalence. There are at least four species and several strains of Brucella that cause human disease. Brucella genomes have very limited variation across strains, which hinders strain identification using classical molecular techniques, including PCR and 16S rDNA sequencing. The aim of this study was to perform whole genome sequencing of clinical isolates of Brucella, followed by bioinformatics and comparative genomics analyses, to determine whether genetic differences exist across isolates of a single Brucella species and strain. Draft sequence data were generated from 15 clinical isolates of Brucella melitensis (biovar 2 strain 63/9) using the MiSeq next-generation sequencing platform. The generated reads were used for assembly and analysis. All analyses were performed on a bioinformatics workstation (8-core i7 processor, 8 GB RAM, Bio-Linux operating system). FastQC was used to determine read quality, and low-quality reads were trimmed or eliminated using Fastx_trimmer. Assembly was done using the Velvet and ABySS software. The assembled contigs were ordered with Mauve. The online RAST server was used to annotate the assembled contigs. Annotated genomes were compared using the Mauve and ACT tools. The QC score for the DNA sequence data generated by MiSeq was higher than 30 for 80% of reads, with more than 100x coverage, which suggested that the data could be used for further analysis. However, when analyzed by FastQC, the quality of four samples was not good enough for creating a complete genome draft, so the remaining 11 samples were used for further analysis. The comparative genome analyses showed that, despite sharing the same gene sets, the genomes differed by single nucleotide polymorphisms and insertions/deletions, which provided a variable extent of diversity to these bacteria. In conclusion, next-generation sequencing, bioinformatics, and comparative genome analysis can be used to find variations (point mutations, insertions and deletions) across different genomes of Brucella within a single strain. This information could be useful in surveillance and epidemiological studies. The study was supported by Kuwait University Research Sector grants MI04/15 and SRUL02/13.
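
The read-quality criterion quoted above (a score above 30 for 80% of reads) can be illustrated with a small sketch that decodes FASTQ quality strings. It assumes the Phred+33 encoding used by current Illumina instruments and assumes that a read "passes" when its mean score reaches the threshold; the abstract does not state how the score was aggregated, and the function names and toy quality strings are hypothetical.

```python
from typing import Iterable, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def fraction_passing(quality_strings: Iterable[str],
                     threshold: float = 30.0) -> float:
    """Fraction of reads whose mean Phred score is at least the threshold."""
    passed = total = 0
    for quality in quality_strings:
        scores = phred_scores(quality)
        if not scores:
            continue  # skip empty records
        total += 1
        if sum(scores) / len(scores) >= threshold:
            passed += 1
    return passed / total if total else 0.0


# Toy quality strings: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2.
toy_qualities = ["IIII", "????", "5555", "####"]
toy_fraction = fraction_passing(toy_qualities)
```

A Phred score of 30 corresponds to an estimated base-calling error probability of 1 in 1000, which is why Q30 is a common cut-off.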

Keywords: brucella, bioinformatics, comparative genomics, whole genome sequencing

Procedia PDF Downloads 275
4550 Characterization of the Intestinal Microbiota: A Signature in Fecal Samples from Patients with Irritable Bowel Syndrome

Authors: Mina Hojat Ansari, Kamran Bagheri Lankarani, Mohammad Reza Fattahi, Ali Reza Safarpour

Abstract:

Irritable bowel syndrome (IBS) is a common bowel disorder that is usually diagnosed on the basis of abdominal pain, fecal irregularities and bloating. Alteration in the intestinal microbial composition is implicated in inflammatory and functional bowel disorders and has recently also been noted as a feature of IBS. Owing to the potential importance of the microbiota for both the efficacy of treatment and the prevention of disease, we examined the association between the intestinal microbiota and different bowel patterns in a cohort of subjects with IBS and healthy controls. Fresh fecal samples were collected from a total of 50 subjects, 30 of whom met the Rome IV criteria for IBS and 20 of whom were healthy controls. Total DNA was extracted, and library preparation was conducted following the standard protocol for small whole genome sequencing. The pooled libraries were sequenced on an Illumina NextSeq platform with a 2 × 150 paired-end read length, and the sequences obtained were analyzed using several bioinformatics programs. The majority of sequences obtained in the current study were assigned to bacteria. Our findings also highlighted significant variation in microbial taxa among the studied groups. The results therefore suggest a significant association of the microbiota with symptoms and bowel characteristics in patients with IBS. These alterations in fecal microbiota could be exploited as a biomarker for IBS or its subtypes and suggest that modification of the microbiota might be integrated into prevention and treatment strategies for IBS.

Keywords: irritable bowel syndrome, intestinal microbiota, small whole genome sequencing, fecal samples, Illumina

Procedia PDF Downloads 76
4549 Language Shapes Thought: An Experimental Study on English and Mandarin Native Speakers' Sequencing of Size

Authors: Hsi Wei

Abstract:

Does the language we speak affect the way we think? This question has long been discussed from different perspectives. In this article, the issue is examined with an experiment on how speakers of different languages tend to sequence general objects differently by size. An essential difference between English and Mandarin usage is the way the size of places or objects is sequenced. In English, when describing the location of something we may say, for example, ‘The pen is inside the trashcan next to the tree at the park.’ In Mandarin, however, we would say, ‘The pen is at the park next to the tree inside the trashcan.’ Clearly, English generally uses the sequence small to big, while Mandarin uses the opposite. The experiment was therefore conducted to test whether this difference between the languages affects the speakers’ ability to perform the two kinds of sequencing. There were two groups of subjects: one consisted of native English speakers, the other of native Mandarin speakers. In the experiment, three nouns were shown as a group to the subjects in their native language. Before they saw the nouns, they first received an instruction of ‘big to small’, ‘small to big’, or ‘repeat’. The subjects then had to sequence the group of nouns according to the instruction they had received, or simply repeat the nouns. After completing each sequencing or repetition in their minds, they pushed a button as a reaction. The repetition condition was designed to capture the subject’s pure reading time. The results showed that native English speakers reacted more quickly to ‘small to big’ sequencing, whereas native Mandarin speakers reacted more quickly to ‘big to small’ sequencing. In conclusion, this study may be of importance as support for linguistic relativism: the language we speak does shape the way we think.

Keywords: language, linguistic relativism, size, sequencing

Procedia PDF Downloads 213
4548 Implementation of CNV-CH Algorithm Using Map-Reduce Approach

Authors: Aishik Deb, Rituparna Sinha

Abstract:

We have developed an algorithm to detect abnormal segments (structural variations) in the genome across a number of samples. We have worked on simulated as well as real data from BAM files and have designed a segmentation algorithm with which abnormal segments are detected. This algorithm aims to improve the accuracy and performance of the existing CNV-CH algorithm. The next-generation sequencing (NGS) approach is very fast and can generate large sequences in a reasonable time. The resulting huge volume of sequence information gives rise to the need for Big Data and parallel approaches to segmentation. Therefore, we have designed a map-reduce approach for the existing CNV-CH algorithm, with which a large amount of sequence data can be segmented and structural variations in the human genome can be detected. We have compared the efficiency of the traditional and map-reduce algorithms with respect to precision, sensitivity, and F-score. The advantages of our algorithm are that it is fast and has better accuracy. The algorithm can be applied to detect structural variations within a genome, which in turn can be used to detect various genetic disorders such as cancer. The defects may be caused by new mutations or changes to the DNA and generally result in abnormally high or low base coverage and quantification values.
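
The three evaluation measures named above can be computed as in the following sketch. It assumes that the genome is divided into fixed bins and that both the truth set and the algorithm's calls are given as sets of abnormal bin indices; this representation, the function name, and the toy data are assumptions made for illustration and are not taken from the CNV-CH work.

```python
from typing import Dict, Set


def evaluate_calls(called: Set[int], truth: Set[int]) -> Dict[str, float]:
    """Precision, sensitivity (recall) and F-score for called abnormal bins."""
    tp = len(called & truth)   # abnormal bins correctly called
    fp = len(called - truth)   # bins called abnormal but actually normal
    fn = len(truth - called)   # abnormal bins that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity,
            "f_score": f_score}


# Hypothetical bins: 10 truly abnormal, 8 called, 6 of them correct.
truth_bins = set(range(10))
called_bins = {0, 1, 2, 3, 4, 5, 20, 21}
metrics = evaluate_calls(called_bins, truth_bins)
```

The F-score is the harmonic mean of precision and sensitivity, so it penalizes an algorithm that achieves one at the expense of the other.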

Keywords: cancer detection, convex hull segmentation, map reduce, next generation sequencing

Procedia PDF Downloads 53
4547 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotation, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge that calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on a k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the size of the k-mers increases. The best-performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are important especially in explaining complex biological mechanisms.
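
The k-mer representation described above can be sketched as a simple counting step that turns a sequence into a fixed-length feature vector. This is a minimal illustration with hypothetical function names and toy input, not the authors' pipeline; skipping k-mers that contain ambiguous bases is an assumption.

```python
from collections import Counter
from typing import List


def count_kmers(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any that contain non-ACGT bases."""
    seq = sequence.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_vector(sequence: str, k: int, vocabulary: List[str]) -> List[int]:
    """Map a sequence to k-mer counts in a fixed vocabulary order."""
    counts = count_kmers(sequence, k)
    return [counts[kmer] for kmer in vocabulary]


# Toy example: 2-mers of a short sequence.
toy_counts = count_kmers("ATATAT", 2)
toy_vector = feature_vector("ATATAT", 2, ["AT", "TA", "GG"])
```

The number of possible k-mers grows as 4^k, so a larger k gives more specific features at the cost of a much larger feature space, which is one way to see the trade-off between explainability and computing resources that the abstract describes.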

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 66
4546 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotation, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge that calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on a k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the size of the k-mers increases. The best-performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 76
4545 Genome Sequencing of Infectious Bronchitis Virus QX-Like Strain Isolated in Malaysia

Authors: M. Suwaibah, S. W. Tan, I. Aiini, K. Yusoff, A. R. Omar

Abstract:

Respiratory diseases are the most important infectious diseases affecting poultry worldwide. One of the avian respiratory viruses of global importance causing significant economic losses is infectious bronchitis virus (IBV). The virus causes a wide-spectrum disease known as infectious bronchitis (IB), affecting not only the respiratory system but also the kidney and the reproductive system, depending on the strain. IB and Newcastle disease are two of the most prevalent diseases affecting poultry in Malaysia. However, molecular characterization of Malaysian IBV is lacking. In this study, IBV strain IBS130, which was isolated in 2015, was fully sequenced using a next-generation sequencing approach. The complete genome, polyprotein 1ab and S1 gene sequences of IBS130 were compared with other IBV sequences available in GenBank, National Center for Biotechnology Information (NCBI). IBV strain IBS130 is characterised as a QX-like strain based on whole genome and S1 gene sequence analysis. Comparisons of the virus with other IBV strains showed that the nucleotide identity ranged from 67% to 99.2%, depending on the region analysed. The whole genome nucleotide similarity ranged from 84.9% to 90.7%, being lowest with Singapore strains (84.9%) and highest with China QX-like strains. Meanwhile, the polyprotein 1ab similarity ranged from 85.3% to 89.9%, being lowest with Singapore strains (85.3%) and highest with Mass strains from the USA.
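
Nucleotide identity figures such as those above are derived from pairwise alignments. The sketch below shows one common definition, the share of aligned columns with matching bases; the abstract does not say which definition or software was used, so the treatment of gaps, the function name, and the toy sequences are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical bases.

    Inputs must be two rows of the same pairwise alignment ('-' marks a gap).
    Columns where both rows are gaps are ignored; a gap opposite a base
    counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


# Toy alignment (hypothetical): 4 matching columns out of 6.
toy_identity = percent_identity("ACGT-A", "ACCTTA")
```

Because conventions for counting gaps differ between tools, identity values are only comparable when computed the same way over the same region.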

Keywords: infectious bronchitis virus, phylogenetic analysis, chicken, Malaysia

Procedia PDF Downloads 105
4544 Novel Recombinant Betasatellite Associated with Vein Thickening Symptoms on Okra Plants in Saudi Arabia

Authors: Adel M. Zakri, Mohammed A. Al-Saleh, Judith K. Brown, Ali M. Idris

Abstract:

Betasatellites are small circular single-stranded DNA molecules found associated with begomoviruses in symptomatic field plants. Their genome size is about half that of the helper begomovirus, ranging between 1.3 and 1.4 kb. The helper begomoviruses are usually members of the family Geminiviridae. Okra leaves showing vein thickening were collected from okra plants growing in Jazan, Saudi Arabia. Total DNA was extracted from the leaves and used as a template to amplify circular DNA using rolling circle amplification (RCA) technology. Products were digested with PstI to linearize the helper viral genome(s) and associated DNA satellite(s), yielding 2.8 kbp and 1.4 kbp fragments, respectively. The linearized fragments were cloned into the pGEM-5Zf (+) vector and subjected to DNA sequencing. The 2.8 kb fragment was identified as a Cotton leaf curl Gezira virus genome of 2780 bp, an isolate closely related to strains reported previously from Saudi Arabia. A clone obtained from the 1.4 kb fragment was searched against the GenBank database using BLAST and found to be a betasatellite. The betasatellite genome was 1357 bp in size. It was found to be a recombinant containing one fragment (877 bp) that shared 91% nt identity with Cotton leaf curl Gezira betasatellite [KM279620], and a smaller fragment (133 bp) that shared 86% nt identity with Tomato leaf curl Sudan virus [JX483708]. This satellite is thus a recombinant between a malvaceous-infecting satellite and a solanaceous-infecting begomovirus.

Keywords: begomovirus, betasatellites, cotton leaf curl Gezira virus, okra plants

Procedia PDF Downloads 265
4543 Genome Analyses of Pseudomonas Fluorescens b29b from Coastal Kerala

Authors: Wael Ali Mohammed Hadi

Abstract:

Pseudomonas fluorescens B29B, which has asparaginase enzymatic activity, was isolated from the surface coastal seawater of Trivandrum, India. We report the complete genome of Pseudomonas fluorescens B29B, a marine isolate, which was sequenced, identified, and annotated. We found the genome to be a single circular chromosome of 7,331,508 bp with a GC content of 62.19% and 6883 protein-coding genes. Genome analysis of P. fluorescens B29B identified 340 subsystems, including two predicted asparaginases for further investigation. These genome data will support further industrial biotechnology applications of proteins in general and of asparaginase in particular.
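GC content, such as the 62.19% reported above, is the percentage of G and C bases among all called bases. The Python sketch below shows the calculation on a plain sequence string; the function name `gc_content` is hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption, not a detail taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are excluded from the
    denominator; pipelines differ on this point.
    """
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / called
```

For a complete genome, the same function would be applied to the full chromosome sequence read from a FASTA file.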

Keywords: pseudomonas, marine, asparaginases, Kerala, whole-genome

Procedia PDF Downloads 125
4542 Computing the Similarity and the Diversity in the Species Based on Cronobacter Genome

Authors: E. Al Daoud

Abstract:

The purpose of computing the similarity and the diversity in the species is to trace the process of evolution, to find the relationships between species, and to discover the unique, special, common and universal proteins. The proteins of the whole genomes of 40 species are compared with the Cronobacter genome, which is used as the reference genome. More than 3 billion pairwise alignments are performed using blastp. Several findings are reported in this study; for example, we found 172 proteins in the Cronobacter genome that have insignificant hits in other species, 116 proteins with significant hits and very high score values in all the tested species, and 129 proteins that are common in plants but have insignificant hits in mammals, birds, fishes, and insects.

Keywords: genome, species, blastp, conserved genes, Cronobacter

Procedia PDF Downloads 403
4541 Massively Parallel Sequencing Improved Resolution for Paternity Testing

Authors: Xueying Zhao, Ke Ma, Hui Li, Yu Cao, Fan Yang, Qingwen Xu, Wenbin Liu

Abstract:

Massively parallel sequencing (MPS) technologies allow high-throughput sequencing analyses at a relatively affordable price and have gradually been applied to forensic casework. MPS technology identifies short tandem repeat (STR) loci based on sequence, so that repeat motif variation within STRs can be detected, which may help to infer the origin of a mutation in some cases. Here, we report a case with one three-step mismatch (D18S51) in a family trio based on both capillary electrophoresis (CE) and MPS typing. The alleles of the alleged father (AF) are [AGAA]₁₇AGAG[AGAA]₃ and [AGAA]₁₅. The mother’s alleles are [AGAA]₁₉ and [AGAA]₉AGGA[AGAA]₃. The questioned child’s (QC) alleles are [AGAA]₁₉ and [AGAA]₁₂. Given that the sequence variants in the repeat regions of the AF and the mother are not observed in the QC’s alleles, the QC’s allele [AGAA]₁₂ was likely inherited from the AF’s allele [AGAA]₁₅ through the loss of three [AGAA] repeats. In addition, two alleles of D18S51 found in this study, [AGAA]₁₇AGAG[AGAA]₃ and [AGAA]₉AGGA[AGAA]₃, have not been reported before. All the results in this study were verified using Sanger-type sequencing. In summary, the MPS typing method can offer valuable information for forensic genetics research and play a promising role in paternity testing.
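The three-step mismatch can be checked by counting repeat units in the bracketed allele strings. The Python sketch below is illustrative only: the function name `repeat_units` is hypothetical, and it assumes that an unbracketed interruption such as AGAG counts as one unit per four bases.

```python
import re

# Map the Unicode subscript digits used in the allele strings to ASCII digits.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def repeat_units(allele: str, unit_len: int = 4) -> int:
    """Count repeat units in a bracketed STR allele such as '[AGAA]17AGAG[AGAA]3'.

    Bracketed motifs contribute their stated repeat count. Unbracketed
    bases contribute len // unit_len units (assumption: interruptions are
    whole units of the same length as the motif).
    """
    allele = allele.translate(SUBSCRIPTS)
    total = 0
    pattern = r"\[([ACGT]+)\](\d+)|([ACGT]+)"
    for motif, count, loose in re.findall(pattern, allele):
        if motif:
            total += int(count)
        else:
            total += len(loose) // unit_len
    return total


# Difference between the AF allele [AGAA]15 and the QC allele [AGAA]12.
step_difference = repeat_units("[AGAA]₁₅") - repeat_units("[AGAA]₁₂")
```

Under these assumptions `step_difference` is 3, matching the three-step mismatch described in the abstract. A length-based count alone cannot separate the two candidate paternal alleles, which is why the abstract relies on the sequence variants that MPS reveals.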

Keywords: family trios analysis, forensic casework, ion torrent personal genome machine (PGM), massively parallel sequencing (MPS)

Procedia PDF Downloads 224
4540 Whole Exome Sequencing Data Analysis of Rare Diseases: Non-Coding Variants and Copy Number Variations

Authors: S. Fahiminiya, J. Nadaf, F. Rauch, L. Jerome-Majewska, J. Majewski

Abstract:

Background: Sequencing of the protein-coding regions of the human genome (Whole Exome Sequencing; WES) has demonstrated great success in the identification of causal mutations for several rare genetic disorders in humans. Generally, most WES studies have focused on rare variants in coding exons and splice sites, where missense substitutions lead to alteration of the protein product. Although focusing on this category of variants has revealed the mystery behind many inherited genetic diseases in recent years, a subset of cases has remained inconclusive. Here, we present the results of our WES studies in which analyzing only rare variants in coding regions was not conclusive, but further investigation revealed the involvement of non-coding variants and copy number variations (CNVs) in the etiology of the diseases. Methods: Whole exome sequencing was performed using our standard protocols at Genome Quebec Innovation Center, Montreal, Canada. All bioinformatics analyses were done using an in-house WES pipeline. Results: To date, we have successfully identified several disease-causing mutations within gene coding regions (e.g. SCARF2: Van den Ende-Gupta syndrome and SNAP29: 22q11.2 deletion syndrome) by using WES. In addition, we showed that variants in non-coding regions and CNVs are also important and should not be ignored or filtered out during bioinformatics analysis of WES data. For instance, in patients with osteogenesis imperfecta type V and in patients with glucocorticoid deficiency, we identified variants in the 5'UTR, resulting in the production of longer or truncated non-functional proteins. Furthermore, CNVs were identified as the main cause of disease in patients with metaphyseal dysplasia with maxillary hypoplasia and brachydactyly and in patients with osteogenesis imperfecta type VII. Conclusions: Our study highlights the importance of considering non-coding variants and CNVs during interpretation of WES data, as they can be the only cause of the disease under investigation.

Keywords: whole exome sequencing data, non-coding variants, copy number variations, rare diseases

Procedia PDF Downloads 337
4539 Genomic Characterisation of Equine Sarcoid-derived Bovine Papillomavirus Type 1 and 2 Using Nanopore-Based Sequencing

Authors: Lien Gysens, Bert Vanmechelen, Maarten Haspeslagh, Piet Maes, Ann Martens

Abstract:

Bovine papillomavirus (BPV) types 1 and 2 play a central role in the etiology of the most common neoplasm in horses, the equine sarcoid. The unknown mechanism behind the unique variety in clinical presentation on the one hand, and the host-dependent clinical outcome of BPV-1 infection on the other, indicates the involvement of additional factors. Earlier studies have reported the potential functional significance of intratypic sequence variants, along with the existence of sarcoid-sourced BPV variants. Therefore, intratypic sequence variation seems to be an important emerging viral factor. This study aimed to give broad insight into sarcoid-sourced BPV variation and to explore its potential association with disease presentation. To this end, a nanopore sequencing approach was successfully optimized for screening a wide spectrum of clinical samples. Specimens of each tumour were initially screened for BPV-1/-2 by quantitative real-time PCR. A custom-designed primer set was used on BPV-positive samples to amplify the complete viral genome in two multiplex PCR reactions, resulting in a set of overlapping amplicons. For phylogenetic analysis, separate alignments were made of all available complete genome sequences for BPV-1/-2. The resulting alignments were used to infer Bayesian phylogenetic trees. We found substantial genetic variation among sarcoid-derived BPV-1, although this variation could not be linked to disease severity. Several of the BPV-1 genomes had multiple major deletions. Remarkably, the majority of these deletions cluster within the region coding for late viral genes. Together with the extensiveness (up to 603 nucleotides) of the described deletions, this suggests an altered function of L1/L2 in disease pathogenesis. By generating a significant number of full-length BPV genomes, we succeeded in introducing next-generation sequencing into veterinary research focusing on the equine sarcoid, thus facilitating the first report of both nanopore-based sequencing of complete sarcoid-sourced BPV-1/-2 and the simultaneous nanopore sequencing of multiple complete genomes originating from a single clinical sample.

Keywords: Bovine papillomavirus, equine sarcoid, horse, nanopore sequencing, phylogenetic analysis

Procedia PDF Downloads 85
4538 Expression Profiling and Immunohistochemical Analysis of Squamous Cell Carcinoma of Head and Neck (Tumor, Transition Zone, Normal) by Whole Genome Scale Sequencing

Authors: Veronika Zivicova, Petr Broz, Zdenek Fik, Alzbeta Mifkova, Jan Plzak, Zdenek Cada, Herbert Kaltner, Jana Fialova Kucerova, Hans-Joachim Gabius, Karel Smetana Jr.

Abstract:

The possibility to determine genome-wide expression profiles of cells and tissues opens a new level of analysis in the quest to define dysregulation in malignancy and thus identify new tumor markers. Toward this long-term aim, we here address two issues on this level for head and neck cancer specimens: i) defining profiles in different regions, i.e. the tumor, the transition zone and normal control, and ii) comparing complete data sets for seven individual patients. Special focus in the flanking immunohistochemical part is given to adhesion/growth-regulatory galectins that upregulate chemo- and cytokine expression in an NF-κB-dependent manner, to these regulators and to markers of differentiation, i.e. keratins. The detailed listing of up- and down-regulations, also available in printed form (1), not only served to unveil new candidates for testing as markers but also let the impact of the tumor in the transition zone become apparent. The extent of interindividual variation raises a strong cautionary note on assuming uniformity of regulatory events, to be noted when considering therapeutic implications. Thus, a combination of test targets (and a network analysis for galectins and their downstream effectors) is (are) advised prior to reaching conclusions on further perspectives.

Keywords: galectins, genome scale sequencing, squamous cell carcinoma, transition zone

Procedia PDF Downloads 175
4537 Genome Sequencing and Analysis of the Spontaneous Nanosilver Resistant Bacterium Proteus mirabilis Strain scdr1

Authors: Amr Saeb, Khalid Al-Rubeaan, Mohamed Abouelhoda, Manojkumar Selvaraju, Hamsa Tayeb

Abstract:

Background: P. mirabilis is a common uropathogenic bacterium that can cause major complications in patients with long-standing indwelling catheters or patients with urinary tract anomalies. In addition, P. mirabilis is a common cause of chronic osteomyelitis in diabetic foot ulcer (DFU) patients. Methodology: P. mirabilis SCDR1 was isolated from a diabetic ulcer patient. We examined the levels of resistance of P. mirabilis SCDR1 against nano-silver colloids, commercial nano-silver and silver-containing bandages, and commonly used antibiotics. We utilized next generation sequencing (NGS) techniques, bioinformatics, phylogenetic analysis and pathogenomics in the identification and characterization of the infectious pathogen. Results: P. mirabilis SCDR1 is a multi-drug resistant isolate that also showed high levels of resistance against nano-silver colloids, nano-silver chitosan composite and the commercially available nano-silver and silver bandages. The P. mirabilis SCDR1 genome size is 3,815,621 bp with a G+C content of 38.44%. The P. mirabilis SCDR1 genome contains a total of 3,533 genes, 3,414 coding DNA sequence genes, 11, 10, 18 rRNAs (5S, 16S, and 23S), and 76 tRNAs. Our isolate contains all the pathogenicity and virulence factors required to establish a successful infection. The P. mirabilis SCDR1 isolate is a potentially virulent pathogen that, despite its original isolation site (a wound), can establish kidney infection and its associated complications. P. mirabilis SCDR1 contains several mechanisms for antibiotic and metal resistance, including biofilm formation, swarming mobility, efflux systems, and enzymatic detoxification. Conclusion: P. mirabilis SCDR1 is a spontaneous nano-silver resistant bacterial strain. The P. mirabilis SCDR1 strain contains all reported pathogenic and virulence factors characteristic of the species. In addition, it possesses several mechanisms that may lead to the observed nano-silver resistance.

Keywords: Proteus mirabilis, multi-drug resistance, silver nanoparticles, resistance, next generation sequencing techniques, genome analysis, bioinformatics, phylogeny, pathogenomics, diabetic foot ulcer, xenobiotics, multidrug resistance efflux, biofilm formation, swarming mobility, resistome, glutathione S-transferase, copper/silver efflux system, altruism

Procedia PDF Downloads 261
4536 The Cleavage of DNA by the Anti-Tumor Drug Bleomycin at the Transcription Start Sites of Human Genes Using Genome-Wide Techniques

Authors: Vincent Murray

Abstract:

The glycopeptide bleomycin is used in the treatment of testicular cancer, Hodgkin's lymphoma, and squamous cell carcinoma. Bleomycin damages and cleaves DNA in human cells, and this is considered to be the main mode of action for bleomycin's anti-tumor activity. In particular, double-strand breaks are thought to be the main mechanism for the cellular toxicity of bleomycin. Using Illumina next-generation DNA sequencing techniques, the genome-wide sequence specificity of bleomycin-induced double-strand breaks was determined in human cells. The degree of bleomycin cleavage was also assessed at the transcription start sites (TSSs) of actively transcribed genes and compared with non-transcribed genes. It was observed that bleomycin preferentially cleaved at the TSSs of actively transcribed human genes. There was a correlation between the degree of this enhanced cleavage at TSSs and the level of transcriptional activity. Bleomycin cleavage is also affected by chromatin structure and at TSSs, the peaks of bleomycin cleavage were approximately 200 bp apart. This indicated that bleomycin was able to detect phased nucleosomes at the TSSs of actively transcribed human genes. The genome-wide cleavage pattern of the bleomycin analogues 6′-deoxy-BLM Z and zorbamycin was also investigated in human cells. As found for bleomycin, these bleomycin analogues also preferentially cleaved at the TSSs of actively transcribed human genes. The cytotoxicity (IC₅₀ values) of these bleomycin analogues was determined. It was found that the degree of enhanced cleavage at TSSs was inversely correlated with the IC₅₀ values of the bleomycin analogues. This suggested that the level of cleavage at the TSSs of actively transcribed human genes was important for the cytotoxicity of bleomycin and analogues. Hence this study provided a deeper understanding of the cellular processes involved in the cancer chemotherapeutic activity of bleomycin.

Keywords: anti-tumour activity, bleomycin analogues, chromatin structure, genome-wide study, Illumina DNA sequencing

Procedia PDF Downloads 61
4535 Analysis of Endogenous Sirevirus in Germinating Barley (Hordeum vulgare L.)

Authors: Nermin Gozukirmizi, Buket Cakmak, Sevgi Marakli

Abstract:

Sireviruses are a genus of copia LTR retrotransposons with a unique genome structure among retrotransposons. Barley (Hordeum vulgare L.) is an economically important plant and has been studied as a model plant owing to its short annual life cycle and seven chromosome pairs. In this study, we used mature barley embryos, 10-day-old roots and 10-day-old leaves derived from the same barley plant to investigate SIRE1 retrotransposon movements by the Inter-Retrotransposon Amplified Polymorphism (IRAP) technique. We found polymorphism rates of 0-64% among embryos, roots and leaves. Polymorphism rates were 0-27% among embryos, 8-60% among roots, and 11-50% among leaves. Polymorphisms were observed not only among the parts of different individuals, but also among the parts of the same plant (23-64%). The internal domains of SIRE1 (gag, env and rt) were also analyzed in the embryos, roots and leaves. Analysis of band profiles showed no polymorphism for gag; however, different band patterns were observed among samples for rt and env. Sequencing of the SIRE1 gag, env and rt domains revealed 79% similarity for gag, 95% for env and 84% for rt to Ty1-copia retrotransposons. The SIRE1 retrotransposon was identified in the soybean genome and has been studied in other plants (maize, rice, tomato, etc.). This study is the first detailed investigation of SIRE1 in the barley genome. The findings are expected to contribute to the understanding of the SIRE1 retrotransposon and its role in the barley genome.

Keywords: barley, polymorphism, retrotransposon, SIRE1 virus

Procedia PDF Downloads 231
4534 CMPD: Cancer Mutant Proteome Database

Authors: Po-Jung Huang, Chi-Ching Lee, Bertrand Chin-Ming Tan, Yuan-Ming Yeh, Julie Lichieh Chu, Tin-Wen Chen, Cheng-Yang Lee, Ruei-Chi Gan, Hsuan Liu, Petrus Tang

Abstract:

Whole-exome sequencing, which focuses on the protein-coding regions of disease/cancer-associated genes based on a priori knowledge, is the most cost-effective method to study the association between genetic alterations and disease. Recent advances in high-throughput sequencing technologies and proteomic techniques have provided an opportunity to integrate genomics and proteomics, allowing ready detection of mutated peptides corresponding to mutated genes. Since sequence database search is the most widely used method for protein identification using mass spectrometry (MS)-based proteomics technology, a mutant proteome database is required to better approximate the real protein pool and improve the identification of disease-associated mutated proteins. Large-scale whole exome/genome sequencing studies were launched by the National Cancer Institute (NCI), the Broad Institute, and The Cancer Genome Atlas (TCGA), which provide not only a comprehensive report on the analysis of coding variants in diverse samples and cell lines but also an invaluable resource for the wider research community. No existing database is available for the collection of mutant protein sequences related to the variants identified in these studies. CMPD is designed to address this issue, serving as a bridge between genomic data and proteomic studies and focusing on protein sequence-altering variations originating from both germline and cancer-associated somatic variations.

Keywords: TCGA, cancer, mutant, proteome

Procedia PDF Downloads 491
4533 Genodata: The Human Genome Variation Using BigData

Authors: Surabhi Maiti, Prajakta Tamhankar, Prachi Uttam Mehta

Abstract:

Since the completion of the Human Genome Project, there has been an unparalleled escalation in the sequencing of genomic data. This project has been the first major leap in the field of medical research, especially in genomics. This project won accolades by using a concept called BigData, which was earlier used extensively to gain value for business. BigData makes use of data sets that are generally in the form of files of terabytes, petabytes, or exabytes in size, and these data sets were traditionally used and managed using Excel sheets and RDBMS. The voluminous data made the process tedious and time-consuming, and hence a stronger framework called Hadoop was introduced in the field of genetic sciences to make data processing faster and more efficient. This paper focuses on using SPARK, which is gaining momentum with the advancement of BigData technologies. Cloud storage is an effective medium for storing the large data sets generated by genetic research and the resulting sets produced by SPARK analysis.

Keywords: human genome project, Bigdata, genomic data, SPARK, cloud storage, Hadoop

Procedia PDF Downloads 157