Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 881

Search results for: Genome sequences

851 Complete Genome Sequence Analysis of Pasteurella multocida Subspecies multocida Serotype A Strain PMTB2.1

Authors: Shagufta Jabeen, Faez J. Firdaus Abdullah, Zunita Zakaria, Nurulfiza M. Isa, Yung C. Tan, Wai Y. Yee, Abdul R. Omar

Abstract:

Pasteurella multocida (PM) is an important veterinary opportunistic pathogen particularly associated with septicemic pasteurellosis, pneumonic pasteurellosis and hemorrhagic septicemia in cattle and buffaloes. P. multocida serotype A has been reported to cause fatal pneumonia and septicemia. The Malaysian isolate PMTB2.1 of P. multocida subspecies multocida serotype A was first isolated from buffaloes that died of septicemia. In this study, the genome of P. multocida strain PMTB2.1 was sequenced using third-generation sequencing technology (the PacBio RS2 system) and analyzed bioinformatically by de novo assembly followed by in-depth comparative genomics. De novo assembly of the PacBio raw reads generated 3 contigs; gap filling of the aligned contigs by PCR sequencing then produced a single contiguous circular chromosome with a genomic size of 2,315,138 bp and a GC content of approximately 40.32% (Accession number CP007205). The PMTB2.1 genome comprises 2,176 protein-coding sequences, 6 rRNA operons, 56 tRNAs and 4 ncRNAs. A comparative genome sequence analysis of nine complete genomes, namely Actinobacillus pleuropneumoniae, Haemophilus parasuis, Escherichia coli, five other P. multocida genomes (PM70, PM36950, PMHN06, PM3480 and PMHB01) and PMTB2.1, was carried out based on OrthoMCL analysis and a Venn diagram. The analysis showed that 282 CDSs (13%) are unique to PMTB2.1 and 1,125 CDSs have orthologs in all genomes. This reflects the overall close relationship of these bacteria and supports their classification in the Gamma subdivision of the Proteobacteria. In addition, genomic distance analysis among all nine genomes indicated that PMTB2.1 is closely related to the other five Pasteurella genomes, with a genomic distance of less than 0.13. Synteny analysis shows subtle differences in genetic structure among the P. multocida strains, indicating frequent gene transfer events among them. However, PM3480 and PM70, which are swine and chicken isolates respectively, exhibited exceptionally large structural variation. Furthermore, the genomic structure of PMTB2.1 most closely resembles that of PM36950: PMTB2.1 is approximately 34,380 bp smaller, and the strain-specific integrative and conjugative element (ICE) found only in PM36950 is absent from PMTB2.1. Meanwhile, two intact prophage sequences of approximately 62 kb were found to be present only in PMTB2.1. One of the phages is similar to the transposable phage SfMu. The phylogenomic tree was constructed based on OrthoMCL analysis and rooted with E. coli, A. pleuropneumoniae and H. parasuis. The genome of P. multocida strain PMTB2.1 clustered with the bovine isolates PM36950 and PMHB01, was separated from the avian isolate PM70 and the swine isolates PM3480 and PMHN06, and was distant from Actinobacillus and Haemophilus. Previous studies based on single nucleotide polymorphisms (SNPs) and multilocus sequence typing (MLST) were unable to show a clear phylogenetic relatedness between P. multocida and its different hosts. In conclusion, this study provides insight into the genomic structure of PMTB2.1 in terms of potential genes that can function as virulence factors, for future study of the mechanisms by which the bacterium causes disease in susceptible animals.

Keywords: comparative genomics, DNA sequencing, phage, phylogenomics

Procedia PDF Downloads 154

850 Genome-Wide Mining of Potential Guide RNAs for Streptococcus pyogenes and Neisseria meningitides CRISPR-Cas Systems for Genome Engineering

Authors: Farahnaz Sadat Golestan Hashemi, Mohd Razi Ismail, Mohd Y. Rafii

Abstract:

The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system can facilitate targeted genome editing in organisms. Dual or single guide RNA (gRNA) can program the Cas9 nuclease to cut target DNA in particular areas, thus introducing precise mutations either via error-prone non-homologous end-joining repair or via incorporation of foreign DNA by homologous recombination between donor DNA and the target area. In spite of the high demand for such a promising technology, developing a well-organized procedure for reliable mining of potential gRNA target sites in large genomic data is still challenging. Hence, we aimed to perform high-throughput detection of target sites by specific PAMs, not only for the common Streptococcus pyogenes (SpCas9) system but also for the Neisseria meningitidis (NmCas9) CRISPR-Cas system. Previous research confirmed the successful application of such RNA-guided Cas9 orthologs for effective gene targeting and subsequent genome manipulation. However, each Cas9 ortholog needs its particular PAM sequence for DNA cleavage activity. Activity levels depend on the sequence of the protospacer and on specific combinations of favorable PAM bases. Therefore, based on the specific length and sequence of the PAM together with a constant target-site length for each of the two Cas9 orthologs, we created a reliable procedure to explore possible gRNA sequences. To mine CRISPR target sites, four different modes of searching for sgRNA binding to the target DNA strand were applied: i) coding strand searching, ii) anti-coding strand searching, iii) both strand searching, and iv) paired-gRNA searching. Finally, a complete list of all potential gRNAs, along with their locations, strands, and PAM sequence orientations, can be provided for both SpCas9 and another potential Cas9 ortholog (NmCas9). The in silico design of potential gRNAs in a genome of interest can accelerate functional genomic studies. Consequently, the application of this novel genome editing tool (CRISPR/Cas technology) will be enhanced through increased versatility and efficiency.
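The strand-wise search described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the PAMs (NGG for SpCas9, NNNNGATT for NmCas9), the protospacer lengths (20 and 24 nt) and all function names are assumptions chosen for illustration, and the paired-gRNA mode is not covered.

```python
import re

# Assumed PAM motifs and protospacer lengths; the abstract does not state them.
CAS9_SYSTEMS = {
    "SpCas9": {"pam": "NGG", "length": 20},
    "NmCas9": {"pam": "NNNNGATT", "length": 24},
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_to_regex(pam):
    """Turn a PAM such as 'NGG' into a regex fragment (N matches any base)."""
    return "".join("[ACGT]" if base == "N" else base for base in pam)


def find_targets(seq, system, strands="both"):
    """List protospacer + PAM sites; strands is 'coding', 'anticoding' or 'both'."""
    seq = seq.upper()
    cfg = CAS9_SYSTEMS[system]
    site_len = cfg["length"] + len(cfg["pam"])
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(
        "(?=([ACGT]{%d})(%s))" % (cfg["length"], pam_to_regex(cfg["pam"]))
    )
    hits = []
    if strands in ("coding", "both"):
        for m in pattern.finditer(seq):
            hits.append({"strand": "+", "start": m.start(),
                         "protospacer": m.group(1), "pam": m.group(2)})
    if strands in ("anticoding", "both"):
        for m in pattern.finditer(reverse_complement(seq)):
            # Report the leftmost coordinate of the site on the given sequence.
            start = len(seq) - (m.start() + site_len)
            hits.append({"strand": "-", "start": start,
                         "protospacer": m.group(1), "pam": m.group(2)})
    return hits
```

Calling `find_targets(sequence, "SpCas9", strands="coding")` restricts the search to the given strand, while the default scans both strands and reports minus-strand hits in coordinates of the input sequence.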

Keywords: CRISPR/Cas9 genome editing, gRNA mining, SpCas9, NmCas9

Procedia PDF Downloads 227

849 Development and Characterization of Polymorphic Genomic-SSR Markers in Asian Long-Horned Beetle (Anoplophora glabripennis)

Authors: Zhao Yang Liu, Jing Tao

Abstract:

The Asian long-horned beetle, Anoplophora glabripennis (Motschulsky) (Coleoptera: Cerambycidae: Lamiinae), is a polyphagous wood-boring xylophage native to Asia that kills healthy trees. Because of the serious danger it poses to trees, the beetle has received close attention worldwide. However, the available genetic markers, especially microsatellites, are limited. In this study, 24 novel simple sequence repeat (SSR) molecular markers, a powerful tool for genetic diversity studies and linkage map construction, were developed and characterized from whole genome shotgun sequences. We identified 9,895 perfect SSR loci with repeat units of 2 to 6 bp; the density was one SSR per 56.57 kb (an abundance of 0.02/kb), and 140 types of repeat motifs were found. Half of the 48 SSR primer pairs selected randomly from 1,222 primer pairs (4 di-, 7 tri-, 2 tetra- and 11 hexanucleotide SSRs) were polymorphic. The number of alleles for these markers in 48 individuals varied from 3 to 21 with an average of 7.71, and the number of effective alleles ranged from 1.22 to 9.97 with an average of 3.54. In addition, the polymorphic information content (PIC) ranged from 0.18 to 0.89 with a mean of 0.65, and Shannon's information index (I) ranged from 0.46 to 2.62 with an average of 1.44. The results suggest that this method of screening for SSRs across the whole genome is feasible and efficient. The SSR markers developed in this study can be used for population genetic studies of A. glabripennis. Moreover, they may also be helpful for the development of microsatellites for other Coleoptera.
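The per-locus statistics reported in this abstract can be computed from allele frequencies. The sketch below uses the standard textbook definitions (effective number of alleles as the inverse of expected homozygosity, the Botstein PIC, and Shannon's index with natural logarithms); the abstract does not say which software or exact formulas the authors used, so the function names and the formula choices are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Estimate allele frequencies from diploid genotypes given as (a1, a2) tuples."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def diversity_stats(freqs):
    """Return number of alleles, effective alleles, PIC and Shannon's index."""
    p = [x for x in freqs.values() if x > 0]
    homozygosity = sum(x * x for x in p)
    # Effective number of alleles: 1 / sum(p_i^2)
    effective = 1.0 / homozygosity
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = 1.0 - homozygosity - sum(
        2.0 * a * a * b * b for a, b in combinations(p, 2)
    )
    # Shannon's information index: -sum(p_i * ln p_i)
    shannon = -sum(x * math.log(x) for x in p)
    return {"alleles": len(p), "effective_alleles": effective,
            "pic": pic, "shannon": shannon}
```

For a locus with two equally frequent alleles, these definitions give 2 effective alleles, a PIC of 0.375 and a Shannon index of ln 2.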

Keywords: SSR markers, Anoplophora glabripennis, genetic diversity, whole genome

Procedia PDF Downloads 360

848 Development of Microsatellite Markers for Dalmatian Pyrethrum Using Next-Generation Sequencing

Authors: Ante Turudic, Filip Varga, Zlatko Liber, Jernej Jakse, Zlatko Satovic, Ivan Radosavljevic, Martina Grdisa

Abstract:

Microsatellites (SSRs) are highly informative repetitive sequences of 2-6 base pairs and are the most widely used molecular markers for assessing the genetic diversity of plant species. Dalmatian pyrethrum (Tanacetum cinerariifolium /Trevir./ Sch. Bip.) is an outcrossing diploid (2n = 18) endemic to the eastern Adriatic coast and the source of the natural insecticide pyrethrin. Due to the high repetitiveness and large size of the genome (haploid genome size of 9.58 pg), previous attempts to develop microsatellite markers using standard methods were unsuccessful. A next-generation sequencing (NGS) approach was applied to genomic DNA extracted from fresh leaves of Dalmatian pyrethrum. Sequencing was conducted on a NovaSeq6000 Illumina sequencer and yielded almost 400 million high-quality paired-end reads with a read length of 150 base pairs. Short reads were assembled by combining two approaches: (1) de novo assembly and (2) joining of overlapping paired-end reads. In total, 6,909,675 contigs were obtained, with an average contig length of 249 base pairs. Of the resulting contigs, 31,380 contained one or more microsatellite sequences, and a total of 35,556 microsatellite loci were identified. Among the detected microsatellites, dinucleotide repeats were the most frequent, accounting for more than half of all microsatellites identified (21,212; 59.7%), followed by trinucleotide repeats (9,204; 25.9%). Tetra-, penta- and hexanucleotides had similar frequencies of 1,822 (5.1%), 1,472 (4.1%), and 1,846 (5.2%), respectively. Contigs containing microsatellites were further filtered by SSR pattern type, transposon occurrences, assembly characteristics, GC content, and the number of occurrences against the previously published draft genome of T. cinerariifolium. After the selection process, 50 microsatellite loci were used for primer design. The designed primers were tested on samples from five distinct populations, and 25 of them showed a high degree of polymorphism. The selected loci were then genotyped on 20 samples belonging to one population, resulting in 17 microsatellite markers. The availability of codominant SSR markers will significantly improve knowledge of the population genetic diversity and structure, as well as the complex genetics and biochemistry, of this species. Acknowledgment: This work has been fully supported by the Croatian Science Foundation under the project ‘Genetic background of Dalmatian pyrethrum (Tanacetum cinerariifolium /Trevir/ Sch. Bip.) insecticidal potential’ - (PyrDiv) (IP-06-2016-9034).
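Detection of perfect microsatellites in assembled contigs, as described in this abstract, can be sketched with regular expressions. This is only an illustration under stated assumptions: the minimum repeat counts in `MIN_REPEATS` and the function names are hypothetical, and the abstract does not name the tool or thresholds actually used.

```python
import re

# Assumed minimum number of repeats per motif length (not given in the abstract).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. ATAT = AT x 2)."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect 2-6 bp SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least min_rep - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_reducible(motif):
                continue  # skip homopolymers and motifs counted at a shorter unit
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)
```

Counting the hits by motif length over all contigs would give a breakdown of the same kind as the di-, tri-, tetra-, penta- and hexanucleotide frequencies reported above.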

Keywords: genome assembly, NGS, SSR, Tanacetum cinerariifolium

Procedia PDF Downloads 101

847 Exploring an Exome Target Capture Method for Cross-Species Population Genetic Studies

Authors: Benjamin A. Ha, Marco Morselli, Xinhui Paige Zhang, Elizabeth A. C. Heath-Heckman, Jonathan B. Puritz, David K. Jacobs

Abstract:

Next-generation sequencing has enhanced the ability to acquire massive amounts of sequence data to address classic population genetic questions for non-model organisms. Targeted approaches allow for cost-effective or more precise analyses of relevant sequences; however, many such techniques require a known genome, and purchasing probes from a company can be costly. This is challenging for non-model organisms with no published genome and can be expensive for large population genetic studies. Expressed exome capture sequencing (EecSeq) synthesizes probes in the lab from expressed mRNA, which are then used to capture and sequence the coding regions of genomic DNA from a pooled suite of samples. A normalization step produces probes that recover transcripts from a wide range of expression levels. This approach offers low-cost recovery of a broad range of genes in the genome. This research project expands on EecSeq to investigate whether mRNA from one taxon may be used to capture relevant sequences from a series of increasingly less closely related taxa. For this purpose, we propose to use the endangered Northern Tidewater goby, Eucyclogobius newberryi, a non-model organism that inhabits California coastal lagoons. mRNA will be extracted from E. newberryi to create probes and capture exomes from eight other taxa, including the more at-risk Southern Tidewater goby, E. kristinae, and more divergent species. Captured exomes will be sequenced, analyzed bioinformatically and phylogenetically, then compared to previously generated phylogenies across this group of gobies. This will provide an assessment of the utility of the technique in cross-species studies and for analyzing low genetic variation within species, as is the case for E. kristinae. This method has the potential to provide economical ways to expand population genetic and evolutionary biology studies for non-model organisms.

Keywords: coastal lagoons, endangered species, non-model organism, target capture method

Procedia PDF Downloads 161

846 Mitigating Ruminal Methanogenesis Through Genomic and Transcriptomic Approaches

Authors: Muhammad Adeel Arshad, Faiz-Ul Hassan, Yanfen Cheng

Abstract:

According to the FAO, enteric methane (CH4) accounts for about 44% of all greenhouse gas emissions from the livestock sector. Ruminants produce CH4 as a result of fermentation of feed in the rumen, especially from roughages, which yield more CH4 per unit of biomass ingested than concentrates. Efficient ruminal fermentation is not possible without abating CO2 and CH4. Methane abatement strategies are required to curb the predicted rise in emissions associated with greater ruminant production in the future to meet ever-increasing animal protein requirements. The ecology of ruminal methanogenesis and avenues for its mitigation can be identified through various genomic and transcriptomic techniques. Programs such as Hungate1000 and the Global Rumen Census have been launched to enhance our understanding of global ruminal microbial communities. Through the Hungate1000 project, a comprehensive reference set of rumen microbial genome sequences has been developed from cultivated rumen bacteria and methanogenic archaea, along with representative cultures of rumen anaerobic fungi and ciliate protozoa. Still, many species of rumen microbes are underrepresented, especially uncultivable microbes. The lack of sequence information specific to the rumen's microbial community has inhibited efforts to use genomic data to identify the specific set of species, and their target genes, involved in methanogenesis. Metagenomic and metatranscriptomic studies of entire rumen microbial populations offer new perspectives for understanding the interaction of methanogens with other rumen microbes and their potential association with total gas and methane production. A deep understanding of the methanogenic pathway will help to devise potentially effective strategies to abate methane production while increasing feed efficiency in ruminants.

Keywords: Genome sequences, Hungate1000, methanogens, ruminal fermentation

Procedia PDF Downloads 110

845 Metagenomics-Based Molecular Epidemiology of Viral Diseases

Authors: Vyacheslav Furtak, Merja Roivainen, Olga Mirochnichenko, Majid Laassri, Bella Bidzhieva, Tatiana Zagorodnyaya, Vladimir Chizhikov, Konstantin Chumakov

Abstract:

Molecular epidemiology and environmental surveillance are parts of a rational strategy to control infectious diseases. They have been widely used in the worldwide campaign to eradicate poliomyelitis, which otherwise would be complicated by the inability to rapidly respond to outbreaks and determine sources of the infection. The conventional scheme involves isolation of viruses from patients and the environment, followed by their identification by nucleotide sequence analysis to determine phylogenetic relationships. This is a tedious and time-consuming process that yields definitive results when it may be too late to implement countermeasures. Because of the difficulty of high-throughput full-genome sequencing, most such studies are conducted by sequencing only capsid genes or their parts. Therefore, important information about the contribution of other parts of the genome, and of inter- and intra-species recombination, to viral evolution is not captured. Here we propose a new approach based on the rapid concentration of sewage samples with tangential flow filtration, followed by deep sequencing and reconstruction of nucleotide sequences of viruses present in the samples. The entire nucleic acid content of each sample is sequenced, thus preserving in digital format the complete spectrum of viruses. A set of rapid algorithms was developed to separate deep sequence reads into discrete populations corresponding to each virus and assemble them into full-length consensus contigs, as well as to generate a complete profile of sequence heterogeneities in each of them. This provides an effective approach to study molecular epidemiology and evolution of natural viral populations.

Keywords: poliovirus, eradication, environmental surveillance, laboratory diagnosis

Procedia PDF Downloads 250

844 Bioinformatics Approach to Identify Physicochemical and Structural Properties Associated with Successful Cell-free Protein Synthesis

Authors: Alexander A. Tokmakov

Abstract:

Cell-free protein synthesis is widely used to synthesize recombinant proteins. It allows genome-scale expression of various polypeptides under strictly controlled uniform conditions. However, only a minor fraction of all proteins can be successfully expressed in the protein synthesis systems that are currently used. The factors determining expression success are poorly understood. At present, a vast volume of data has accumulated in cell-free expression databases. This makes possible comprehensive bioinformatics analysis and identification of multiple features associated with successful cell-free expression. Here, we describe an approach aimed at identifying multiple physicochemical and structural properties of amino acid sequences associated with protein solubility and aggregation, and we highlight major correlations obtained using this approach. The developed method includes categorical assessment of the protein expression data, calculation and prediction of multiple properties of expressed amino acid sequences, correlation of the individual properties with the expression scores, and evaluation of the statistical significance of the observed correlations. Using this approach, we revealed a number of statistically significant correlations between calculated and predicted features of protein sequences and their amenability to cell-free expression. It was found that some of the features, such as protein pI, hydrophobicity, presence of signal sequences, etc., are mostly related to protein solubility, whereas others, such as protein length, number of disulfide bonds, content of secondary structure, etc., mainly affect the expression propensity. We also demonstrated that the amenability of polypeptide sequences to cell-free expression correlates with the presence of multiple sites of post-translational modification. The correlations revealed in this study provide a plethora of important insights into protein folding and the rationalization of protein production. The developed bioinformatics approach can be of practical use for predicting expression success and optimizing cell-free protein synthesis.

Keywords: bioinformatics analysis, cell-free protein synthesis, expression success, optimization, recombinant proteins

Procedia PDF Downloads 385

843 Isolation and Molecular Characterization of Lytic Bacteriophage against Carbapenem Resistant Klebsiella pneumoniae

Authors: Guna Raj Dhungana, Roshan Nepal, Apshara Parajuli, Archana Maharjan, Shyam K. Mishra, Pramod Aryal, Rajani Malla

Abstract:

Introduction: Klebsiella pneumoniae is a well-known opportunistic human pathogen, primarily causing healthcare-associated infections. The global emergence of carbapenemase-producing K. pneumoniae, which is often extensively multidrug resistant, is a major public health burden. Because these ‘superbugs’ are difficult to treat and are a menace that some describe as the ‘apocalypse’ of the post-antibiotic era, an alternative approach to controlling this pathogen is prudent, and one such approach is phage-mediated control and/or treatment. Objective: In this study, we aimed to isolate novel bacteriophages against carbapenemase-producing K. pneumoniae and characterize them for potential use in phage therapy. Material and Methods: Twenty lytic phages were isolated from river water using the double-layer agar assay and purified. The biological features, physicochemical characteristics, burst size, host specificity and activity spectrum of the phages were determined. The most potent phage, Phage TU_Kle10O, was selected and characterized by electron microscopy. The whole genome sequence of the phage was analyzed for the presence or absence of virulence factors and lysin genes. Results: The novel phage TU_Kle10O showed a multiple host range within its own genus and did not induce any bacteriophage-insensitive mutants (BIMs) up to the 5th generation of the host’s life cycle. Electron microscopy confirmed that the phage was tailed and belonged to the order Caudovirales. Next-generation sequencing revealed its genome to be 166.2 kb. Bioinformatic analysis further confirmed that the phage genome did not contain any bacterial genes, which ruled out the concern about transfer of virulence genes. A specific lysin enzyme was identified in the phage, which could be used as an ‘antibiotic’. Conclusion: Extensively multidrug-resistant bacteria like carbapenemase-producing K. pneumoniae could be treated efficiently by phages. The absence of virulence genes of bacterial origin and the presence of lysin proteins within the phage genome make phages excellent candidates for therapeutics.

Keywords: bacteriophage, Klebsiella pneumoniae, MDR, phage therapy, carbapenemase

Procedia PDF Downloads 153

842 Genome Editing in Sorghum: Advancements and Future Possibilities: A Review

Authors: Micheale Yifter Weldemichael, Hailay Mehari Gebremedhn, Teklehaimanot Hailesslasie

Abstract:

The advancement of target-specific genome editing tools, including clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9), meganucleases, base editing (BE), prime editing (PE), transcription activator-like effector nucleases (TALENs), and zinc-finger nucleases (ZFNs), has paved the way for a modern era of gene editing. CRISPR/Cas9, as a versatile, simple, cost-effective and robust system for genome editing, has dominated the genome manipulation field over the last few years. The application of CRISPR/Cas9 in sorghum improvement is particularly vital in the context of ecological, environmental and agricultural challenges, as well as global climate change. In this context, gene editing using CRISPR/Cas9 can improve nutritional value, yield, resistance to pests and disease, and tolerance to different abiotic stresses. Moreover, CRISPR/Cas9 can potentially perform complex editing to reshape already available elite varieties and create new genetic variation. However, existing research is aimed at further improving the effectiveness of CRISPR/Cas9 genome editing techniques to successfully edit endogenous sorghum genes. These findings suggest that genome editing is a feasible and successful venture in sorghum. Newer improvements and developments of CRISPR/Cas9 techniques have further enabled researchers to modify additional genes in sorghum with improved efficiency. The successful application and development of CRISPR techniques for genome editing in sorghum will help not only in gene discovery, the creation of new, improved traits, the regulation of gene expression and sorghum functional genomics, but also in making site-specific integration events.

Keywords: CRISPR/Cas9, genome editing, quality, sorghum, stress, yield

Procedia PDF Downloads 34

841 Prediction of Solanum Lycopersicum Genome Encoded microRNAs Targeting Tomato Spotted Wilt Virus

Authors: Muhammad Shahzad Iqbal, Zobia Sarwar, Salah-ud-Din

Abstract:

Tomato spotted wilt virus (TSWV) belongs to the genus Tospovirus (family Bunyaviridae). It is one of the most devastating pathogens of tomato (Solanum lycopersicum) and heavily damages the crop yield each year around the globe. In this study, we retrieved 329 mature miRNA sequences from two microRNA databases (miRBase and miRSoldb) and checked for putative target sites in the downloaded genome sequence of TSWV. A consensus of three miRNA target prediction tools (RNA22, miRanda and psRNATarget) was used to screen out false-positive microRNA target sites in the TSWV genome. These tools predicted different target sites by calculating minimum free energy (mfe), site complementarity, minimum folding energy and other microRNA-mRNA binding factors. The R language was used to plot the predicted target-site data. All the genes having possible target sites for different miRNAs were screened by building a consensus table. Of the 329 mature miRNAs assessed with the three algorithms, only eight met all the criteria and threshold specifications. MC-Fold and MC-Sym were used to predict three-dimensional structures of the miRNAs, which were further analyzed in UCSF Chimera to visualize the structural and conformational changes before and after microRNA-mRNA interactions. The results of the current study show that the eight predicted miRNAs could be further evaluated by in vitro experiments to develop TSWV-resistant transgenic tomato plants in the future.
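The consensus step in this abstract, keeping only the miRNA-target pairs reported by all three prediction tools, amounts to a set intersection. The sketch below assumes each tool's output has already been parsed into (miRNA, target gene) pairs; the miRNA identifiers and the pairs themselves are made-up placeholders, not results from the study.

```python
from collections import Counter


def consensus_targets(predictions):
    """Return the (miRNA, target) pairs reported by every tool in `predictions`."""
    tally = Counter(
        pair for pairs in predictions.values() for pair in set(pairs)
    )
    return {pair for pair, n in tally.items() if n == len(predictions)}


# Hypothetical parsed output, one set of (miRNA, target gene) pairs per tool.
predictions = {
    "RNA22": {("miR-A", "N"), ("miR-B", "NSs"), ("miR-C", "RdRp")},
    "miRanda": {("miR-A", "N"), ("miR-C", "RdRp"), ("miR-D", "NSm")},
    "psRNATarget": {("miR-A", "N"), ("miR-B", "NSs"), ("miR-C", "RdRp")},
}

# Only pairs present in all three sets survive the consensus filter.
consensus = consensus_targets(predictions)
```

Requiring agreement between all tools trades sensitivity for specificity: a pair missed by any single tool is dropped, which is the intended effect when the goal is to screen out false positives.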

Keywords: tomato spotted wilt virus (TSWV), Solanum lycopersicum, plant virus, miRNAs, microRNA target prediction, mRNA

Procedia PDF Downloads 120

840 Genome-Wide Association Study Identify COL2A1 as a Susceptibility Gene for the Hand Development Failure of Kashin-Beck Disease

Authors: Feng Zhang

Abstract:

Kashin-Beck disease (KBD) is a chronic osteochondropathy. The mechanism underlying the failure of hand growth and development in KBD remains elusive. In this study, we conducted a two-stage genome-wide association study (GWAS) of the palmar length-width ratio (LWR) in KBD, involving a total of 493 Chinese Han KBD patients. The Affymetrix Genome Wide Human SNP Array 6.0 was used for SNP genotyping. Association analysis was conducted with PLINK software. Imputation analysis was performed with IMPUTE against the reference panel of the 1000 Genomes Project. In the GWAS, the most significant association was observed between palmar LWR and rs2071358 of the COL2A1 gene (P value = 4.68 × 10⁻⁸). Imputation analysis identified 3 SNPs surrounding rs2071358 with significant or suggestive association signals. The replication study observed additional significant association signals at both rs2071358 (P value = 0.017) and rs4760608 (P value = 0.002) of the COL2A1 gene after Bonferroni correction. Our results suggest that COL2A1 is a novel susceptibility gene involved in the failure of hand growth and development in KBD.

Keywords: Kashin-Beck disease, genome-wide association study, COL2A1, hand

Procedia PDF Downloads 181

839 Exploring Emerging Viruses From a Protected Reserve

Authors: Nemat Sokhandan Bashir

Abstract:

Threats from viruses to agricultural crops could be even larger than the losses caused by other pathogens because, in many cases, the viral infection is latent but crucial from an epidemic point of view. Wild vegetation can be a source of many viruses that eventually find their way into crop plants. Although often asymptomatic in wild plants due to adaptation, they can potentially cause serious losses in crops. Therefore, exploring viruses in wild vegetation is very important. Recently, omics approaches have been quite useful for exploring plant viruses from various plant sources, especially wild vegetation. For instance, we have discovered viruses such as Ambrosia asymptomatic virus 1 (AAV-1) through the application of metagenomics to samples from the Oklahoma Prairie Reserve. In this approach, extracts from randomly sampled plants are subjected to high-speed centrifugation and ultracentrifugation to separate virus-like particles (VLPs); nucleic acids in the form of DNA or RNA are then extracted from the VLPs by treatment with phenol-chloroform and subsequent precipitation with ethanol. The nucleic acid preparations are separately treated with RNase or DNase in order to determine the genome component of the VLPs. In the case of RNA, complementary DNAs (cDNAs) are synthesized before submission for DNA sequencing. For VLPs with DNA content, the procedure is relatively straightforward and requires no cDNA synthesis. Because the length of the nucleic acid content of VLPs can differ, various strategies are employed to achieve sequencing. Techniques similar to so-called "chromosome walking" may be used to obtain sequences of long segments. Once the nucleotide sequence data are obtained, they are subjected to BLAST analysis to determine the most closely related previously reported virus sequences. In one case, we determined that the novel virus was AAV-1 because sequence comparison and analysis revealed that the reads were closest to Indian citrus ringspot virus (ICRSV). AAV-1 had an RNA genome 7408 nucleotides in length containing six open reading frames (ORFs). Based on phylogenies inferred from the replicase and coat protein ORFs of the virus, it was placed in the genus Mandarivirus.

Keywords: wild, plant, novel, metagenomics

Procedia PDF Downloads 45

838 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very large datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) data are among the important large, non-coding genomic datasets representing genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features is high relative to the number of samples, and the data are imbalanced. A feature selection method is used to select the features best able to distinguish classes and to eliminate obscure features. Afterward, a Convolutional Neural Network (CNN) classifier is used to classify cancer types, with a Genetic Algorithm employed to find optimized hyper-parameters for the CNN. To make classification by the CNN faster, a Graphics Processing Unit (GPU) is recommended for performing the mathematical operations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: cancer classification, feature selection, deep learning, genetic algorithm

Procedia PDF Downloads 88

837 Genome Characterization and Phylogeny Analysis of Viruses Infected Invertebrates, Parvoviridae Family

Authors: Niloofar Fariborzi, Hamzeh Alipour, Kourosh Azizi, Neda Eskandarzade, Abozar Ghorbani

Abstract:

The family Parvoviridae consists of a large diversity of single-stranded DNA viruses, which cause mild to severe diseases in both vertebrates and invertebrates. The Parvoviridae are classified into three subfamilies: Parvovirinae infect vertebrates, Densovirinae infect invertebrates, and Hamaparvovirinae infect both vertebrates and invertebrates. Except for the NS1 region, which is the prime criterion for phylogenetic analysis, other parts of the parvovirus genome, such as the UTRs, are diverse even among closely related viruses or within the same genus. It is believed that host switching in parvoviruses may be related to genetic changes in regions other than NS1; therefore, whole-genome screening is valuable for studying parvovirus host-virus interactions. The aim of this study was to analyze the genome organization and phylogeny of the complete genome sequences of 132 members of the family Parvoviridae, focusing on viruses that infect invertebrates. The maximum and minimum divergence within each subfamily belonged to Densovirinae and Parvovirinae, respectively. The greatest evolutionary divergence was between Hamaparvovirinae and Parvovirinae. Unclassified viruses were mostly from Parvovirinae and had the highest divergence from densoviruses and the lowest divergence from Parvovirinae viruses. In the phylogenetic tree, all hamaparvoviruses were found in the center of the densoviruses, with the exception of Syngnathid ichthamaparvovirus 1 (NC_055527), which was positioned between two Parvovirinae members (NC_022089 and NC_038544). The proximity of hamaparvoviruses to some densoviruses strengthens the possibility that densoviruses may be the ancestors of hamaparvoviruses or vice versa. Therefore, examination and phylogenetic analysis of the whole genome is necessary to understand host selection in the family Parvoviridae.

Keywords: densoviruses, parvoviridae, bioinformatics, phylogeny

Procedia PDF Downloads 59

836 Genome-Wide Analysis of Long Terminal Repeat (LTR) Retrotransposons in Rabbit (Oryctolagus cuniculus)

Authors: Zeeshan Khan, Faisal Nouroz, Shumaila Noureen

Abstract:

The European or common rabbit (Oryctolagus cuniculus) belongs to class Mammalia, order Lagomorpha, family Leporidae. It is distributed worldwide and is native to Europe (France, Spain and Portugal) and Africa (Morocco and Algeria). LTR retrotransposons are major Class I mobile genetic elements of eukaryotic genomes and play a crucial role in genome expansion, evolution and diversification. They have mostly been annotated in various genomes by conventional homology searches, which restricts the annotation of novel elements. The present work involved de novo identification of LTR retrotransposons with LTR_FINDER in the haploid rabbit genome (2247.74 Mb), distributed across 22 chromosomes; 7,933 putative full-length or partial copies were identified, containing 69.38 Mb of elements and accounting for 3.08% of the genome. The highest copy number (731) was found on chromosome 7, followed by chromosome 12 (705), while the lowest (27) was detected on chromosome 19; no elements were identified on chromosome 21 because the chromosome is only partially sequenced and contains unidentified nucleotides (N) and simple sequence repeats (SSRs). The identified elements ranged in size from 1.2 to 25.8 kb, with average sizes between 2 and 10 kb. The highest percentage of elements (4.77%) was found on chromosome 15 and the lowest (0.55%) on chromosome 19. The most frequent tRNA type was arginine, present in the majority of the elements. Based on these results, it was estimated that the rabbit carries 15,866 copies comprising 137.73 Mb of elements, accounting for 6.16% of the diploid genome (44 chromosomes). Further molecular analyses will be helpful for the chromosomal localization and distribution of these elements.

Keywords: rabbit, LTR retrotransposons, genome, chromosome

Procedia PDF Downloads 120

835 Genomics of Aquatic Adaptation

Authors: Agostinho Antunes

Abstract:

The completion of human genome sequencing in 2003 opened a new perspective on the importance of whole-genome sequencing projects, and multiple species are currently having their genomes completely sequenced, from simple organisms such as bacteria to more complex taxa such as mammals. The voluminous sequencing data generated across multiple organisms also provide a framework to better understand the genetic makeup of these species and related ones, making it possible to explore the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, obtained from comparative evolutionary genomic analyses of selected marine animal species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: comparative genomics, adaptive evolution, bioinformatics, phylogenetics, genome mining

Procedia PDF Downloads 502

834 Human Papillomavirus Type 16 E4 Gene Variation as Risk Factor for Cervical Cancer

Authors: Yudi Zhao, Ziyun Zhou, Yueting Yao, Shuying Dai, Zhiling Yan, Longyu Yang, Chuanyin Li, Li Shi, Yufeng Yao

Abstract:

The HPV16 E4 gene plays an important role in viral genome amplification and release. Therefore, variation in the nucleic acid sequence of the E4 gene may affect the carcinogenicity of HPV16. To understand the relationship between variation in the HPV16 E4 gene and cervical cancer, this study amplified and sequenced the E4 genes of 118 HPV16-positive cervical cancer patients and 151 HPV16-positive asymptomatic individuals. After the E4 gene sequences were obtained, phylogenetic trees were constructed by the Neighbor-joining method for gene variation analysis. The results showed that: 1) the distribution of HPV16 variants differed significantly between the case group and the control group (P = 0.015), and the Asian-American (AA) variant was likely related to the occurrence of cervical cancer; 2) DNA sequence analysis showed significant differences in the distribution of 8 variants between the case group and the control group (P < 0.05); and 3) in the European (EUR) variant, two variations, C3384T (L18L) and A3449G (P39P), were associated with the initiation and development of cervical cancer. The results suggest that variation in the HPV16 E4 gene may contribute to the occurrence as well as the development of cervical cancer, and that different HPV16 variants may have different carcinogenic capability.

Keywords: cervical cancer, HPV16, E4 gene, variations

Procedia PDF Downloads 141

833 Genomics of Adaptation in the Sea

Authors: Agostinho Antunes

Abstract:

Keywords: marine genomics, evolutionary bioinformatics, human genome sequencing, genomic analyses

Procedia PDF Downloads 584

832 Evaluating the Potential of a Fast Growing Indian Marine Cyanobacterium by Reconstructing and Analysis of a Genome Scale Metabolic Model

Authors: Ruchi Pathania, Ahmad Ahmad, Shireesh Srivastava

Abstract:

Cyanobacteria are promising microbes that can capture atmospheric CO₂ and light and convert them into valuable industrial bio-products such as biofuels and biodegradable plastics. Among their most attractive traits are fast autotrophic growth, year-round cultivation on non-arable land, high photosynthetic activity, much greater biomass and productivity, and ease of genetic manipulation. Cyanobacteria store carbon in the form of glycogen, which can be hydrolyzed to release glucose and fermented to form bioethanol or other valuable products. Marine cyanobacterial species are especially attractive for countries with a scarcity of freshwater. We recently identified a native marine cyanobacterium, Synechococcus sp. BDU 130192, which has a good growth rate and a high level of polyglucan accumulation compared to Synechococcus PCC 7002. In this study, we first sequenced the whole genome and annotated the sequences using the RAST server. A genome-scale metabolic model (GSMM) was then reconstructed with the COBRA toolbox. A GSMM is a computational representation of the metabolic reactions and metabolites of the target strain. GSMMs are analyzed by flux balance analysis (FBA), which uses external nutrient uptake rates to estimate steady-state intracellular and extracellular reaction fluxes under an objective such as maximization of cell growth. The model, which we have named iSyn942, includes 942 reactions and 913 metabolites, with 831 metabolic, 78 transport and 33 exchange reactions. The phylogenetic tree obtained by BLAST search revealed that the strain is a close relative of Synechococcus PCC 7002. FBA was applied to the iSyn942 model to predict the theoretical yields (mol product produced/mol CO₂ consumed) of native and non-native products such as acetone and butanol under phototrophic conditions by applying metabolic engineering strategies. The reported strain can be a viable strain for biotechnological applications, and the model will be helpful to researchers interested in understanding its metabolism as well as in designing metabolic engineering strategies for enhanced production of various bioproducts.
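
As a rough illustration of the flux balance analysis step mentioned in this abstract, the sketch below solves a toy four-reaction network as a linear program: maximize a biomass flux subject to the steady-state constraint S v = 0 and flux bounds. The network, the bounds, and the function name run_toy_fba are hypothetical; they are not taken from the iSyn942 model or from the COBRA toolbox, and SciPy's linprog is assumed here only as a generic LP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (NOT the iSyn942 model):
#   R1: -> A      nutrient uptake, capped at 10
#   R2: A -> B    internal conversion, capped at 8
#   R3: B ->      biomass drain (the objective)
#   R4: A ->      by-product secretion
# Rows are metabolites A and B; columns are reactions R1..R4.
S = np.array([
    [1.0, -1.0,  0.0, -1.0],
    [0.0,  1.0, -1.0,  0.0],
])
BOUNDS = [(0, 10), (0, 8), (0, None), (0, None)]


def run_toy_fba():
    """Maximize the biomass flux v3 subject to S v = 0 and the flux bounds."""
    # linprog minimizes, so the objective coefficient for R3 is negated.
    c = np.array([0.0, 0.0, -1.0, 0.0])
    result = linprog(
        c,
        A_eq=S,
        b_eq=np.zeros(S.shape[0]),
        bounds=BOUNDS,
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


fluxes = run_toy_fba()
print("biomass flux:", round(float(fluxes[2]), 6))
```

In this toy case the biomass flux is limited by the cap on R2 rather than by nutrient uptake, which is the kind of bottleneck that FBA on a full model is meant to expose.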

Keywords: cyanobacteria, flux balance analysis, genome scale metabolic model, metabolic engineering

Procedia PDF Downloads 123

831 Modified Genome-Scale Metabolic Model of Escherichia coli by Adding Hyaluronic Acid Biosynthesis-Related Enzymes (GLMU2 and HYAD) from Pasteurella multocida

Authors: P. Pasomboon, P. Chumnanpuen, T. E-kobon

Abstract:

Hyaluronic acid (HA) is a linear heteropolysaccharide composed of repeating D-glucuronic acid and N-acetyl-D-glucosamine units. HA has various useful properties: it maintains skin elasticity and moisture, reduces inflammation, and lubricates the movement of various body parts without causing immunogenic allergy. HA can be found in several animal tissues as well as in the capsule of some bacteria, including Pasteurella multocida. This study aimed to modify a genome-scale metabolic model of Escherichia coli by adding two enzymes (GLMU2 and HYAD) from P. multocida, and to use computational simulation and flux analysis to predict HA productivity under specified amounts of different carbon sources and nitrogen supplements. Results revealed that threonine and aspartate supplementation raised HA production by 12.186%. Our analyses suggest that the genome-scale metabolic model is useful for improving HA production and narrows the number of conditions to be tested further.

Keywords: Pasteurella multocida, Escherichia coli, hyaluronic acid, genome-scale metabolic model, bioinformatics

Procedia PDF Downloads 97

830 A Geometrical Perspective on the Insulin Evolution

Authors: Yuhei Kunihiro, Sorin V. Sabau, Kazuhiro Shibuya

Abstract:

We study the molecular evolution of insulin from the point of view of metric geometry. In mathematics, and particularly in geometry, distances and metrics between objects are of fundamental importance. Using a weaker notion than the classical distance, namely weighted quasi-metrics, one can study the geometry of the space of biological sequences (DNA, mRNA, or proteins). We analyze from the geometrical point of view a family of 60 insulin homologous sequences ranging over a wide variety of living organisms, from human to the nematode C. elegans. We show that the distances between sequences provide important information about the evolution and function of insulin.
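
To make the notion of a weighted quasi-metric concrete: a quasi-metric d keeps the triangle inequality but drops symmetry, and it is called weighted when there is a non-negative weight function w with d(x, y) + w(x) = d(y, x) + w(y). The sketch below checks these axioms on a small set of integers using the textbook example d(x, y) = max(y - x, 0) with w(x) = x. This example and the function names are illustrative assumptions; the abstract does not specify the quasi-metric the authors use on sequences.

```python
from itertools import product


def d(x, y):
    """Asymmetric distance: only an increase from x to y is charged."""
    return max(y - x, 0)


def w(x):
    """Weight function paired with d in this example."""
    return x


def is_weighted_quasi_metric(points):
    """Check the weighted quasi-metric axioms on a finite set of points."""
    for x in points:
        if d(x, x) != 0 or w(x) < 0:
            return False
    for x, y in product(points, repeat=2):
        # Separation: distinct points differ in at least one direction.
        if x != y and d(x, y) == 0 and d(y, x) == 0:
            return False
        # Weight condition linking the two directions.
        if d(x, y) + w(x) != d(y, x) + w(y):
            return False
    for x, y, z in product(points, repeat=3):
        # Triangle inequality (symmetry is NOT required).
        if d(x, z) > d(x, y) + d(y, z):
            return False
    return True


POINTS = [0, 1, 3, 7]
print(is_weighted_quasi_metric(POINTS))
print(d(1, 3), d(3, 1))
```

The last line shows the asymmetry: the two directions between the same pair of points are assigned different values, which an ordinary metric cannot express.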

Keywords: metric geometry, evolution, insulin, C. elegans

Procedia PDF Downloads 304

829 Societal Acceptability Conditions of Genome Editing for Upland Rice in Madagascar

Authors: Anny Lucrece Nlend Nkott, Ludovic Temple

Abstract:

The appearance in 2012 of the CRISPR-Cas9 genome editing technique marks a turning point in the field of genetics. This technique would make it possible to create new varieties quickly and cheaply. Although some consider CRISPR-Cas9 to be revolutionary, others consider it a potential societal threat. To document the controversy, we explain the socioeconomic conditions under which this technique could be accepted for the creation of a rainfed rice variety in Madagascar. The methodological framework is based on 38 individual semistructured interviews, a multistakeholder forum with 27 participants, and a survey of 148 rice producers. Results reveal that the acceptability of genome editing requires (i) strengthening the seed system through the operationalization of regulatory structures and the upgrading of stakeholders' knowledge of genetically modified organisms, (ii) assessing the effects of the edited variety on biodiversity and soil nitrogen dynamics, and (iii) strengthening the technical and human capacities of the biosafety body. Structural mechanisms for regulating the seed system are necessary to ensure safe experimentation with genome editing techniques. Organizational innovation also appears to be necessary. The study documents how collective learning between communities of scientists and nonscientists is a component of systemic processes of varietal innovation. This study was carried out with the financial support of the GENERICE project (Generation and Deployment of Genome-Edited, Nitrogen-use-Efficient Rice Varieties), funded by the Agropolis Foundation.

Keywords: CRISPR-Cas9, varietal innovation, seed system, innovation system

Procedia PDF Downloads 117

828 Molecular Characterization of Grain Storage Proteins in Some Hordeum Species

Authors: Manar Makhoul, Buthainah Alsalamah, Salam Lawand, Hassan Azzam

Abstract:

The major storage proteins in the endosperm of 33 cultivated and wild barley genotypes (H. vulgare, H. spontaneum, H. bulbosum, H. murinum, H. marinum) were analyzed to demonstrate the variation in the hordein polypeptides encoded by multigene families in grains. SDS-PAGE revealed 13 and 17 alleles at the Hor1 and Hor2 loci, with frequencies of 0.83 to 14% and 0.56 to 13.41%, respectively, while seven alleles with frequencies of 3.63 to 30.91% were recognized at the Hor3 locus. The phylogenetic analysis indicated the relevance of polymorphism in hordein patterns as a successful tool for identifying individual genotypes and discriminating species according to genome type. We also report the complete nucleotide sequences of B-hordein genes from seven wild and cultivated barley genotypes. A 152 bp upstream sequence of the B-hordein promoter contained a TATA box, CATC box, AAAG motif, N-motif and E-motif. In silico analysis of the B-hordein sequences demonstrated that the coding regions were not interrupted by any intron and included the complete ORF, which varied between 882 and 906 bp and encoded mature proteins of 293-301 residues characterized by high contents of glutamine (29%) and proline (18%). Comparison of the predicted polypeptide sequences with published ones suggested that all S-rich prolamin genes are descended from a common ancestor. The sequence started at the N-terminus with a signal peptide, followed directly by two domains: a repetitive domain based on the repeat unit PQQPFPQQ and a C-terminal domain. The positions of the eight cysteine residues were highly conserved in all the B-hordein sequences, but Hordeum bulbosum had an additional unpaired cysteine. The phylogenetic tree of B-hordein polypeptides separated the genotypes into seven distinct subgroups. In general, the high homology between B-hordeins and LMW glutenin subunits suggests similar bread-making influences for these B-hordeins.

Keywords: hordeum, phylogenetic tree, sequencing, storage protein

Procedia PDF Downloads 230

827 Occurrence of Porcine circovirus Type 2 in Pigs of Eastern Cape Province South Africa

Authors: Kayode O. Afolabi, Benson C. Iweriebor, Anthony I. Okoh, Larry C. Obi

Abstract:

Porcine circovirus type 2 (PCV2) is the major etiological viral agent of postweaning multisystemic wasting syndrome (PMWS) and other porcine circovirus-associated diseases (PCVAD) of great economic importance to the pig industry globally. In an effort to determine the status of swine herds in the Province with regard to this 'small but powerful' viral pathogen, a total of 375 blood, faecal and nasal swab samples were obtained from seven pig farms (commercial and communal) in the Amathole, O.R. Tambo and Chris-Hani District Municipalities of Eastern Cape Province between 2015 and 2016. Three hundred and thirty-nine (339) of these samples were subjected to molecular screening by conventional polymerase chain reaction (PCR) using PCV2-specific primers. Selected sequences were further analyzed and confirmed through genome sequencing and phylogenetic analyses. The data obtained revealed that 15.93% of the screened samples (54/339) from the swine herds of the studied areas were positive for PCV2, while the occurrence of the viral pathogen at farm level ranged from approximately 5.6% to 60%. The majority, 15 of 17 (88%) analyzed sequences, clustered with PCV2b reference strains in the phylogenetic analysis. More interestingly, the two other sequences clustered within the PCV2d genogroup, which is presently another fast-spreading genotype with observably higher virulence in global swine herds. This finding confirms the presence of this important viral pathogen in pigs of the region, which could result in a serious outbreak of PCVAD and huge economic loss in the presence of triggering factors if no appropriate measures are taken to curb its spread effectively.

Keywords: pigs, polymerase chain reaction, porcine circovirus type 2, South Africa

Procedia PDF Downloads 181

826 Analysis of Endogenous Sirevirus in Germinating Barley (Hordeum vulgare L.)

Authors: Nermin Gozukirmizi, Buket Cakmak, Sevgi Marakli

Abstract:

Sireviruses are a genus of copia LTR retrotransposons with a unique genome structure among retrotransposons. Barley (Hordeum vulgare L.) is an economically important plant and has been studied as a model plant owing to its short annual life cycle and seven chromosome pairs. In this study, we used mature barley embryos, 10-day-old roots and 10-day-old leaves derived from the same barley plant to investigate SIRE1 retrotransposon movements by the Inter-Retrotransposon Amplified Polymorphism (IRAP) technique. We found polymorphism rates of 0-64% among embryos, roots and leaves. Polymorphism rates were 0-27% among embryos, 8-60% among roots, and 11-50% among leaves. Polymorphisms were observed not only among the parts of different individuals, but also among the parts of the same plant (23-64%). The internal domains of SIRE1 (gag, env and rt) were also analyzed in the embryos, roots and leaves. Analysis of band profiles showed no polymorphism for gag; however, different band patterns were observed among samples for rt and env. Sequencing of the SIRE1 gag, env and rt domains revealed 79% similarity for gag, 95% for env and 84% for rt to Ty1-copia retrotransposons. The SIRE1 retrotransposon was identified in the soybean genome and has been studied in other plants (maize, rice, tomato, etc.). This study is the first detailed investigation of SIRE1 in the barley genome. The findings are expected to contribute to the understanding of the SIRE1 retrotransposon and its role in the barley genome.

Keywords: barley, polymorphism, retrotransposon, SIRE1 virus

Procedia PDF Downloads 275

825 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease

Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena

Abstract:

Advances in sequence data generation technologies are churning out voluminous omics data and posing a massive challenge for annotating biological functional features. With so much data available, the use of machine learning methods and tools to make novel inferences has become an obvious choice. Machine learning methods have been successfully applied to many disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested in developing novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, we plan to employ novel machine learning approaches to aid the understanding of how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome in which mutations can affect a phenotype of interest.

Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics

Procedia PDF Downloads 63

824 Analysis on Thermococcus achaeans with Frequent Pattern Mining

Authors: Jeongyeob Hong, Myeonghoon Park, Taeson Yoon

Abstract:

Since the discovery of archaea, which use different metabolic pathways and have conspicuously different cellular structures, they have been recognized as potential resources for improving human quality of life. Among the diverse archaea, in this paper we compared the 16S rRNA sequences of four species of Thermococcus, an archaeal genus specialized in sulfur metabolism. The four species, T. barophilus, T. kodakarensis, T. hydrothermalis, and T. onnurineus, live near hydrothermal vents that emit extreme amounts of sulfur and heat. By comparing the ribosomal sequences of these four species, we found similarities in their sequences and expressed proteins, which leads us to expect that certain ribosomal sequences or proteins are vital for their survival. The Apriori algorithm and decision trees were used for the comparison.
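
The abstract does not describe how the Apriori algorithm was applied, so the following is only a minimal sketch of one plausible setup: each sequence is treated as a "transaction" of its 3-mers, and Apriori finds sets of 3-mers shared by at least a given fraction of the sequences. The toy sequences, the choice of k = 3, and the function names are assumptions for illustration and are not real 16S rRNA data.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = set().union(*transactions)
    current = {frozenset([item]) for item in items}
    frequent = {}
    size = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical toy sequences, not real Thermococcus data.
SEQUENCES = ["ACGUAC", "ACGUUG", "GGACGU", "UUACGA"]
transactions = [kmers(s) for s in SEQUENCES]
frequent = apriori(transactions, min_support=0.75)
for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate set is only counted if all of its smaller subsets were already found to be frequent.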

Keywords: archaea, Thermococcus, apriori algorithm, decision tree

Procedia PDF Downloads 266

823 Molecular Characterization and Phylogenetic Analysis of Influenza A(H3N2) Virus Circulating during 2010-2011 in Riyadh, Saudi Arabia

Authors: Ghazanfar Ali, Fahad N Almajhdi

Abstract:

This study provides data on the viral diagnosis and molecular epidemiology of influenza A(H3N2) virus isolated in Riyadh, Saudi Arabia. Nasopharyngeal aspirates from 80 clinically infected patients at the peak of the 2010-2011 winter season were processed for viral diagnosis by RT-PCR. Sequencing of the entire HA and NA genes of representative isolates and molecular epidemiological analysis were performed. A total of 6 patients were positive for influenza A, influenza B or respiratory syncytial virus by RT-PCR assays; of these, only one sample was positive for influenza A(H3N2) by RT-PCR. Phylogenetic analysis of the HA and NA gene sequences showed identities higher than 99-98.8% in both genes. The sequences were also similar to reference isolates in HA (99% identity) and in NA (99% identity). Amino acid sequences predicted for the HA gene were highly identical to those of reference strains. The NA amino acid substitutions identified did not include the oseltamivir-resistant H275Y substitution. Conclusion: Viral isolation and RT-PCR together were useful for diagnosis of the influenza A(H3N2) virus. Variations in the HA and NA sequences are similar to those identified in worldwide reference isolates, and no drug resistance was found.

Keywords: influenza A (H3N2), genetic characterization, viral isolation, RT-PCR, Saudi Arabia

Procedia PDF Downloads 234

822 CMPD: Cancer Mutant Proteome Database

Authors: Po-Jung Huang, Chi-Ching Lee, Bertrand Chin-Ming Tan, Yuan-Ming Yeh, Julie Lichieh Chu, Tin-Wen Chen, Cheng-Yang Lee, Ruei-Chi Gan, Hsuan Liu, Petrus Tang

Abstract:

Whole-exome sequencing, which focuses on the protein-coding regions of disease- or cancer-associated genes based on a priori knowledge, is the most cost-effective method to study the association between genetic alterations and disease. Recent advances in high-throughput sequencing technologies and proteomic techniques have provided an opportunity to integrate genomics and proteomics, allowing ready detection of mutated peptides corresponding to mutated genes. Since sequence database search is the most widely used method for protein identification in mass spectrometry (MS)-based proteomics, a mutant proteome database is required to better approximate the real protein pool and improve the identification of disease-associated mutated proteins. Large-scale whole exome/genome sequencing studies were launched by the National Cancer Institute (NCI), the Broad Institute, and The Cancer Genome Atlas (TCGA), which provide not only a comprehensive report on the analysis of coding variants in diverse samples and cell lines but also an invaluable resource for the wider research community. No existing database collects the mutant protein sequences related to the variants identified in these studies. CMPD is designed to address this issue, serving as a bridge between genomic data and proteomic studies and focusing on protein sequence-altering variations originating from both germline and cancer-associated somatic variations.
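
The idea of a mutant proteome entry can be sketched as follows: take a reference protein sequence, apply a single amino acid substitution written in the common "ref-position-alt" form (for example A4V), and emit the altered sequence as a FASTA record that a database search engine could match against. The toy sequence, the header layout, and the function names are hypothetical; the abstract does not describe CMPD's actual record format or build pipeline.

```python
import re

# Matches substitutions such as "A4V": reference residue, 1-based position, new residue.
VARIANT_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_missense(protein, variant):
    """Return the protein sequence with a single amino acid substitution applied."""
    match = VARIANT_PATTERN.fullmatch(variant)
    if match is None:
        raise ValueError(f"unrecognized variant notation: {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1 or pos > len(protein):
        raise ValueError(f"position {pos} is outside the sequence")
    if protein[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {protein[pos - 1]}")
    return protein[:pos - 1] + alt + protein[pos:]


def to_fasta(name, protein, variant):
    """Format the mutant sequence as a two-line FASTA record (hypothetical header)."""
    return f">{name}|{variant}\n{apply_missense(protein, variant)}"


# Hypothetical toy protein, not a real database entry.
print(to_fasta("toy_protein", "MKTAYIAK", "A4V"))
```

Checking the reference residue before substituting guards against applying a variant to the wrong isoform or coordinate system, a common source of errors when linking genomic calls to protein sequences.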

Keywords: TCGA, cancer, mutant, proteome

Procedia PDF Downloads 562