Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 26867

Search results for: genome analysis

26717 Sesame (Sesamum Indicum L.): Molecular Breeding and Transformation

Authors: Micheale Yifter Weldemichael, Stefaan Werbrouck, Hailay Mehari Gebremedhn

Abstract:

Sesame (Sesamum indicum L.) is among the most important oilseed crops for its high edible oil quality and quantity. Sesame is grown for food, medicinal, pharmaceutical, and industrial uses. Sesame is also cultivated as a main cash crop in Asia and Africa by smallholder farmers. Despite the global exponential increase in sesame cultivation area, its production and productivity remain low, mainly due to biotic and abiotic constraints. Notwithstanding the efforts to solve these problems, a low level of genetic variation and inadequate genomic resources hinder the progress of sesame improvement. The objective of this paper is, therefore, to review recent advances in the area of molecular breeding and transformation to overcome major production constraints and could result in enhanced and sustained sesame production. This paper reviews various researches conducted to date on molecular breeding and genetic transformation in sesame focusing on molecular markers used in assessing the available online database resources, genes responsible for key agronomic traits as well as transgenic technology and genome editing. The review concentrates on quantitative and semi-quantitative studies on molecular breeding for key agronomic traits such as improvement of yield components, oil and oil-related traits, disease and insect/pest resistance, and drought, waterlogging and salt tolerance, as well as sesame genetic transformation and genome editing techniques. Pitfalls and limitations of existing studies and methodologies used so far are identified and some priorities for future research directions in sesame genetic improvement are identified in this review.

Keywords: molecular breeding, oil, sesame, shattering

Procedia PDF Downloads 40

26716 Development of Microsatellite Markers for Genetic Variation Analysis in House Cricket, Acheta domesticus

Authors: Yash M. Gupta, Kittisak Buddhachat, Surin Peyachoknagul, Somjit Homchan

Abstract:

The house cricket, Acheta domesticus is one of the commonly found species of field crickets. Although it is very commonly used as food and feed, the genomic information of house cricket is still missing for genetic investigation. DNA sequencing technology has evolved over the decades, and it has also revolutionized the molecular marker development for genetic analysis. In the present study, we have sequenced the whole genome of A. domesticus using illumina platform based HiSeq X Ten sequencing technology for searching simple sequence repeats (SSRs) in DNA to develop polymorphic microsatellite markers for population genetic analysis. A total of 112,157 SSRs with primer pairs were identified, 91 randomly selected SSRs used to check DNA amplification, of which nine primers were polymorphic. These microsatellite markers have shown cross-amplification with other three species of crickets which are Gryllus bimaculatus, Gryllus testaceus and Brachytrupes portentosus. These nine polymorphic microsatellite markers were used to check genetic variation for forty-five individuals of A. domesticus, Phitsanulok population, Thailand. For nine loci, the number of alleles was ranging from 5 to 15. The observed heterozygosity was ranged from 0.4091 to 0.7556. These microsatellite markers will facilitate population genetic analysis for future studies of A. domesticus populations. Moreover, the transferability of these SSR makers would also enable researchers to conduct genetic studies for other closely related species.

Keywords: cross-amplification, microsatellite markers, observed heterozygosity, population genetic, simple sequence repeats

Procedia PDF Downloads 112

26715 Prevalence and Mechanisms of Antibiotic Resistance in Escherichia coli Isolated from Mastitic Dairy Cattle in Canada

Authors: Satwik Majumder, Dongyun Jung, Jennifer Ronholm, Saji George

Abstract:

Bovine mastitis is the most common infectious disease in dairy cattle, with major economic implications for the dairy industry worldwide. Continuous monitoring for the emergence of antimicrobial resistance (AMR) among bacterial isolates from dairy farms is vital not only for animal husbandry but also for public health. In this study, the prevalence of AMR in 113 Escherichia coli isolates from cases of bovine clinical mastitis in Canada was investigated. Kirby-Bauer disk diffusion test with 18 antibiotics and microdilution method with three heavy metals (copper, zinc, and silver) was performed to determine the antibiotic and heavy-metal susceptibility. Resistant strains were assessed for efflux and ß-lactamase activities besides assessing biofilm formation and hemolysis. Whole-genome sequences for each of the isolates were examined to detect the presence of genes corresponding to the observed AMR and virulence factors. Phenotypic analysis revealed that 32 isolates were resistant to one or more antibiotics, and 107 showed resistance against at least one heavy metal. Quinolones and silver were the most efficient against the tested isolates. Among the AMR isolates, AcrAB-TolC efflux activity and ß-lactamase enzyme activities were detected in 13 and 14 isolates, respectively. All isolates produced biofilm but with different capacities, and 33 isolates showed α-hemolysin activity. A positive correlation (Pearson r = +0.89) between efflux pump activity and quantity of biofilm was observed. Genes associated with aggregation, adhesion, cyclic di-GMP, quorum sensing were detected in the AMR isolates, corroborating phenotype observations. This investigation showed the prevalence of AMR in E. coli isolates from bovine clinical mastitis. The results also suggest the inadequacy of antimicrobials with a single mode of action to curtail AMR bacteria with multiple mechanisms of resistance and virulence factors. Therefore, it calls for combinatorial therapy for the effective management of AMR infections in dairy farms and combats its potential transmission to the food supply chain through milk and dairy products.

Keywords: antimicrobial resistance, E. coli, bovine mastitis, antibiotics, heavy-metals, efflux pump, ß-lactamase enzyme, biofilm, whole-genome sequencing

Procedia PDF Downloads 176

26714 Molecular Characterization, Host Plant Resistance and Epidemiology of Bean Common Mosaic Virus Infecting Cowpea (Vigna unguiculata L. Walp)

Authors: N. Manjunatha, K. T. Rangswamy, N. Nagaraju, H. A. Prameela, P. Rudraswamy, M. Krishnareddy

Abstract:

The identification of virus in cowpea especially potyviruses is confusing. Even though there are several studies on viruses causing diseases in cowpea, difficult to distinguish based on symptoms and serological detection. The differentiation of potyviruses considering as a constraint, the present study is initiated for molecular characterization, host plant resistance and epidemiology of the BCMV infecting cowpea. The etiological agent causing cowpea mosaic was identified as Bean Common Mosaic Virus (BCMV) on the basis of RT-PCR and electron microscopy. An approximately 750bp PCR product corresponding to coat protein (CP) region of the virus and the presence of long flexuous filamentous particles measuring about 952 nm in size typical to genus potyvirus were observed under electron microscope. The characterized virus isolate genome had 10054 nucleotides, excluding the 3’ terminal poly (A) tail. Comparison of polyprotein of the virus with other potyviruses showed similar genome organization with 9 cleavage sites resulted in 10 functional proteins. The pairwise sequence comparison of individual genes, P1 showed most divergent, but CP gene was less divergent at nucleotide and amino acid level. A phylogenetic tree constructed based on multiple sequence alignments of the polyprotein nucleotide and amino acid sequences of cowpea BCMV and potyviruses showed virus is closely related to BCMV-HB. Whereas, Soybean variant of china (KJ807806) and NL1 isolate (AY112735) showed 93.8 % (5’UTR) and 94.9 % (3’UTR) homology respectively with other BCMV isolates. This virus transmitted to different leguminous plant species and produced systemic symptoms under greenhouse conditions. Out of 100 cowpea genotypes screened, three genotypes viz., IC 8966, V 5 and IC 202806 showed immune reaction in both field and greenhouse conditions. Single marker analysis (SMA) was revealed out of 4 SSR markers linked to BCMV resistance, M135 marker explains 28.2 % of phenotypic variation (R2) and Polymorphic information content (PIC) value of these markers was ranged from 0.23 to 0.37. The correlation and regression analysis showed rainfall, and minimum temperature had significant negative impact and strong relationship with aphid population, whereas weak correlation was observed with disease incidence. Path coefficient analysis revealed most of the weather parameters exerted their indirect contributions to the aphid population and disease incidence except minimum temperature. This study helps to identify specific gaps in knowledge for researchers who may wish to further analyse the science behind complex interactions between vector-virus and host in relation to the environment. The resistant genotypes identified are could be effectively used in resistance breeding programme.

Keywords: cowpea, epidemiology, genotypes, virus

Procedia PDF Downloads 202

26713 Computational Investigation on Structural and Functional Impact of Oncogenes and Tumor Suppressor Genes on Cancer

Authors: Abdoulie K. Ceesay

Abstract:

Within the sequence of the whole genome, it is known that 99.9% of the human genome is similar, whilst our difference lies in just 0.1%. Among these minor dissimilarities, the most common type of genetic variations that occurs in a population is SNP, which arises due to nucleotide substitution in a protein sequence that leads to protein destabilization, alteration in dynamics, and other physio-chemical properties’ distortions. While causing variations, they are equally responsible for our difference in the way we respond to a treatment or a disease, including various cancer types. There are two types of SNPs; synonymous single nucleotide polymorphism (sSNP) and non-synonymous single nucleotide polymorphism (nsSNP). sSNP occur in the gene coding region without causing a change in the encoded amino acid, while nsSNP is deleterious due to its replacement of a nucleotide residue in the gene sequence that results in a change in the encoded amino acid. Predicting the effects of cancer related nsSNPs on protein stability, function, and dynamics is important due to the significance of phenotype-genotype association of cancer. In this thesis, Data of 5 oncogenes (ONGs) (AKT1, ALK, ERBB2, KRAS, BRAF) and 5 tumor suppressor genes (TSGs) (ESR1, CASP8, TET2, PALB2, PTEN) were retrieved from ClinVar. Five common in silico tools; Polyphen, Provean, Mutation Assessor, Suspect, and FATHMM, were used to predict and categorize nsSNPs as deleterious, benign, or neutral. To understand the impact of each variation on the phenotype, Maestro, PremPS, Cupsat, and mCSM-NA in silico structural prediction tools were used. This study comprises of in-depth analysis of 10 cancer gene variants downloaded from Clinvar. Various analysis of the genes was conducted to derive a meaningful conclusion from the data. Research done indicated that pathogenic variants are more common among ONGs. Our research also shows that pathogenic and destabilizing variants are more common among ONGs than TSGs. Moreover, our data indicated that ALK(409) and BRAF(86) has higher benign count among ONGs; whilst among TSGs, PALB2(1308) and PTEN(318) genes have higher benign counts. Looking at the individual cancer genes predisposition or frequencies of causing cancer according to our research data, KRAS(76%), BRAF(55%), and ERBB2(36%) among ONGs; and PTEN(29%) and ESR1(17%) among TSGs have higher tendencies of causing cancer. Obtained results can shed light to the future research in order to pave new frontiers in cancer therapies.

Keywords: tumor suppressor genes (TSGs), oncogenes (ONGs), non synonymous single nucleotide polymorphism (nsSNP), single nucleotide polymorphism (SNP)

Procedia PDF Downloads 60

26712 Prevalence Determination of Hepatitis D Virus Genotypes among HBsAg Positive Patients in Kerman Province of Iran

Authors: Khabat Barkhordari, Ali Mohammad Arabzadeh

Abstract:

Hepatitis delta virus (HDV) is a RNA virus that needs the function of hepatitis B virus (HBV) for its propagation and assembly. Infection by HDV can occur spontaneously with HBV infection and cause acute hepatitis or develop as secondary infection in HBV suffering patients. Based on genome sequence analysis, HDV has several genotypes which show broad geographic and diverse clinical features. The aim of current study is determine the prevalence of hepatitis delta virus genotype in patients with positive HBsAg in Kerman province of Iran. This cross-sectional study a total of 400 patients with HBV infection attending the clinic center of Besat from 2012 to 2014 were included. We carried out ELISA to detect anti-HDV antibodies. Those testing positive were analyzed further for HDV-RNA and for genotyping using restriction fragment length polymorphism (RFLP) and RT-nested PCR- sequencing. Among 400 patients in this study, 67 cases (16.75 %) were containing anti-HDV antibody which we found HDV RNA in just 7 (1.75%) serum samples. Analysis of these 7 positive HDV showed that all of them have genotype I. According to current study the HDV prevalence in Kerman is higher than the reported prevalence of 6.6% for Iran as a whole and clade 1 (genotype 1) is the predominant clade of HDV in Kerman.

Keywords: genotyping, hepatitis delta virus, molecular epidemiology, Kerman, Iran

Procedia PDF Downloads 262

26711 Mitigating Ruminal Methanogenesis Through Genomic and Transcriptomic Approaches

Authors: Muhammad Adeel Arshad, Faiz-Ul Hassan, Yanfen Cheng

Abstract:

According to FAO, enteric methane (CH4) production is about 44% of all greenhouse gas emissions from the livestock sector. Ruminants produce CH4 as a result of fermentation of feed in the rumen especially from roughages which yield more CH4 per unit of biomass ingested as compared to concentrates. Efficient ruminal fermentation is not possible without abating CO2 and CH4. Methane abatement strategies are required to curb the predicted rise in emissions associated with greater ruminant production in future to meet ever increasing animal protein requirements. Ecology of ruminal methanogenesis and avenues for its mitigation can be identified through various genomic and transcriptomic techniques. Programs such as Hungate1000 and the Global Rumen Census have been launched to enhance our understanding about global ruminal microbial communities. Through Hungate1000 project, a comprehensive reference set of rumen microbial genome sequences has been developed from cultivated rumen bacteria and methanogenic archaea along with representative rumen anaerobic fungi and ciliate protozoa cultures. But still many species of rumen microbes are underrepresented especially uncultivable microbes. Lack of sequence information specific to the rumen's microbial community has inhibited efforts to use genomic data to identify specific set of species and their target genes involved in methanogenesis. Metagenomic and metatranscriptomic study of entire microbial rumen populations offer new perspectives to understand interaction of methanogens with other rumen microbes and their potential association with total gas and methane production. Deep understanding of methanogenic pathway will help to devise potentially effective strategies to abate methane production while increasing feed efficiency in ruminants.

Keywords: Genome sequences, Hungate1000, methanogens, ruminal fermentation

Procedia PDF Downloads 106

26710 Fat-Tail Test of Regulatory DNA Sequences

Authors: Jian-Jun Shu

Abstract:

The statistical properties of CRMs are explored by estimating similar-word set occurrence distribution. It is observed that CRMs tend to have a fat-tail distribution for similar-word set occurrence. Thus, the fat-tail test with two fatness coefficients is proposed to distinguish CRMs from non-CRMs, especially from exons. For the first fatness coefficient, the separation accuracy between CRMs and exons is increased as compared with the existing content-based CRM prediction method – fluffy-tail test. For the second fatness coefficient, the computing time is reduced as compared with fluffy-tail test, making it very suitable for long sequences and large data-base analysis in the post-genome time. Moreover, these indexes may be used to predict the CRMs which have not yet been observed experimentally. This can serve as a valuable filtering process for experiment.

Keywords: statistical approach, transcription factor binding sites, cis-regulatory modules, DNA sequences

Procedia PDF Downloads 262

26709 Allele Mining for Rice Sheath Blight Resistance by Whole-Genome Association Mapping in a Tail-End Population

Authors: Naoki Yamamoto, Hidenobu Ozaki, Taiichiro Ookawa, Youming Liu, Kazunori Okada, Aiping Zheng

Abstract:

Rice sheath blight is one of the destructive fungal diseases in rice. We have thought that rice sheath blight resistance is a polygenic trait. Host-pathogen interactions and secondary metabolites such as lignin and phytoalexins are likely to be involved in defense against R. solani. However, to our knowledge, it is still unknown how sheath blight resistance can be enhanced in rice breeding. To seek for an alternative genetic factor that contribute to sheath blight resistance, we mined relevant allelic variations from rice core collections created in Japan. Based on disease lesion length on detached leaf sheath, we selected 30 varieties of the top tail-end and the bottom tail-end, respectively, from the core collections to perform genome-wide association mapping. Re-sequencing reads for these varieties were used for calling single nucleotide polymorphisms among the 60 varieties to create a SNP panel, which contained 1,137,131 homozygous variant sites after filitering. Association mapping highlighted a locus on the long arm of chromosome 11, which is co-localized with three sheath blight QTLs, qShB11-2-TX, qShB11, and qSBR-11-2. Based on the localization of the trait-associated alleles, we identified an ankyryn repeat-containing protein gene (ANK-M) as an uncharacterized candidate factor for rice sheath blight resistance. Allelic distributions for ANK-M in the whole rice population supported the reliability of trait-allele associations. Gene expression characteristics were checked to evaluiate the functionality of ANK-M. Since an ANK-M homolog (OsPIANK1) in rice seems a basal defense regulator against rice blast and bacterial leaf blight, ANK-M may also play a role in the rice immune system.

Keywords: allele mining, GWAS, QTL, rice sheath blight

Procedia PDF Downloads 45

26708 Molecular Epidemiologic Distribution of HDV Genotypes among Different Ethnic Groups in Iran: A Systematic Review

Authors: Khabat Barkhordari

Abstract:

Keywords: HDV, genotype, epidemiology, distribution

Procedia PDF Downloads 248

26707 CRISPR/Cas9 Based Gene Stacking in Plants for Virus Resistance Using Site-Specific Recombinases

Authors: Sabin Aslam, Sultan Habibullah Khan, James G. Thomson, Abhaya M. Dandekar

Abstract:

Losses due to viral diseases are posing a serious threat to crop production. A quick breakdown of resistance to viruses like Cotton Leaf Curl Virus (CLCuV) demands the application of a proficient technology to engineer durable resistance. Gene stacking has recently emerged as a potential approach for integrating multiple genes in crop plants. In the present study, recombinase technology has been used for site-specific gene stacking. A target vector (pG-Rec) was designed for engineering a predetermined specific site in the plant genome whereby genes can be stacked repeatedly. Using Agrobacterium-mediated transformation, the pG-Rec was transformed into Coker-312 along with Nicotiana tabacum L. cv. Xanthi and Nicotiana benthamiana. The transgene analysis of target lines was conducted through junction PCR. The transgene positive target lines were used for further transformations to site-specifically stack two genes of interest using Bxb1 and PhiC31 recombinases. In the first instance, Cas9 driven by multiplex gRNAs (for Rep gene of CLCuV) was site-specifically integrated into the target lines and determined by the junction PCR and real-time PCR. The resulting plants were subsequently used to stack the second gene of interest (AVP3 gene from Arabidopsis for enhancing cotton plant growth). The addition of the genes is simultaneously achieved with the removal of marker genes for recycling with the next round of gene stacking. Consequently, transgenic marker-free plants were produced with two genes stacked at the specific site. These transgenic plants can be potential germplasm to introduce resistance against various strains of cotton leaf curl virus (CLCuV) and abiotic stresses. The results of the research demonstrate gene stacking in crop plants, a technology that can be used to introduce multiple genes sequentially at predefined genomic sites. The current climate change scenario highlights the use of such technologies so that gigantic environmental issues can be tackled by several traits in a single step. After evaluating virus resistance in the resulting plants, the lines can be a primer to initiate stacking of further genes in Cotton for other traits as well as molecular breeding with elite cotton lines.

Keywords: cotton, CRISPR/Cas9, gene stacking, genome editing, recombinases

Procedia PDF Downloads 115

26706 Human Papillomavirus Type 16 E4 Gene Variation as Risk Factor for Cervical Cancer

Authors: Yudi Zhao, Ziyun Zhou, Yueting Yao, Shuying Dai, Zhiling Yan, Longyu Yang, Chuanyin Li, Li Shi, Yufeng Yao

Abstract:

HPV16 E4 gene plays an important role in viral genome amplification and release. Therefore, a variation of the E4 gene nucleic acid sequence may affect the carcinogenicity of HPV16. In order to understand the relationship between the variation of HPV16 E4 gene and cervical cancer, this study was to amplify and sequence the DNA sequences of E4 genes in 118 HPV16-positive cervical cancer patients and 151 HPV16-positive asymptomatic individuals. After obtaining E4 gene sequences, the phylogenetic trees were constructed by the Neighbor-joining method for gene variation analysis. The results showed that: 1) The distribution of HPV16 variants between the case group and the control group differed greatly (P = 0.015)，and the Asian-American（AA）variant was likely to relate to the occurrence of cervical cancer. 2) DNA sequence analysis showed that there were significant differences in the distribution of 8 variants between the case group and the control group (P < 0.05). And 3) In European (EUR) variant, two variations, C3384T (L18L) and A3449G (P39P), were associated with the initiation and development of cervical cancer. The results suggested that the variation of HPV16 E4 gene may be a contributor affecting the occurrence as well as the development of cervical cancer, and different HPV16 variants may have different carcinogenic capability.

Keywords: cervical cancer, HPV16, E4 gene, variations

Procedia PDF Downloads 140

26705 Scalable and Accurate Detection of Pathogens from Whole-Genome Shotgun Sequencing

Authors: Janos Juhasz, Sandor Pongor, Balazs Ligeti

Abstract:

Next-generation sequencing, especially whole genome shotgun sequencing, is becoming a common approach to gain insight into the microbiomes in a culture-independent way, even in clinical practice. It does not only give us information about the species composition of an environmental sample but opens the possibility to detect antimicrobial resistance and novel, or currently unknown, pathogens. Accurately and reliably detecting the microbial strains is a challenging task. Here we present a sensitive approach for detecting pathogens in metagenomics samples with special regard to detecting novel variants of known pathogens. We have developed a pipeline that uses fast, short read aligner programs (i.e., Bowtie2/BWA) and comprehensive nucleotide databases. Taxonomic binning is based on the lowest common ancestor (LCA) principle; each read is assigned to a taxon, covering the most significantly hit taxa. This approach helps in balancing between sensitivity and running time. The program was tested both on experimental and synthetic data. The results implicate that our method performs as good as the state-of-the-art BLAST-based ones, furthermore, in some cases, it even proves to be better, while running two orders magnitude faster. It is sensitive and capable of identifying taxa being present only in small abundance. Moreover, it needs two orders of magnitude less reads to complete the identification than MetaPhLan2 does. We analyzed an experimental anthrax dataset (B. anthracis strain BA104). The majority of the reads (96.50%) was classified as Bacillus anthracis, a small portion, 1.2%, was classified as other species from the Bacillus genus. We demonstrate that the evaluation of high-throughput sequencing data is feasible in a reasonable time with good classification accuracy.

Keywords: metagenomics, taxonomy binning, pathogens, microbiome, B. anthracis

Procedia PDF Downloads 106

26704 Population Structure Analysis of Pakistani Indigenous Cattle Population by Using High Density SNP Array

Authors: Hamid Mustafa, Huson J. Heather, Kim Eiusoo, McClure Matt, Khalid Javed, Talat Nasser Pasha, Afzal Ali1, Adeela Ajmal, Tad Sonstegard

Abstract:

Genetic differences associated with speciation, breed formation or local adaptation can help to preserve and effective utilization of animals in selection programs. Analyses of population structure and breed diversity have provided insight into the origin and evolution of cattle. In this study, we used a high-density panel of SNP markers to examine population structure and diversity among ten Pakistani indigenous cattle breeds. In total, 25 individuals from three cattle populations, including Achi (n=08), Bhagnari (n=04) and Cholistani (n=13) were genotyped for 777, 962 single nucleotide polymorphism (SNP) markers. Population structure was examined using the linkage model in the program STRUCTURE. After characterizing SNP polymorphism in the different populations, we performed a detailed analysis of genetic structure at both the individual and population levels. The whole-genome SNP panel identified several levels of population substructure in the set of examined cattle breeds. We further searched for spatial patterns of genetic diversity among these breeds under the recently developed spatial principal component analysis framework. Overall, such high throughput genotyping data confirmed a clear partitioning of the cattle genetic diversity into distinct breeds. The resulting complex historical origins associated with both natural and artificial selection have led to the differentiation of numerous different cattle breeds displaying a broad phenotypic variety over a short period of time.

Keywords: Pakistan, cattle, genetic diversity, population structure

Procedia PDF Downloads 583

26703 Improving the Biocontrol of the Argentine Stem Weevil; Using the Parasitic Wasp Microctonus hyperodae

Authors: John G. Skelly, Peter K. Dearden, Thomas W. R. Harrop, Sarah N. Inwood, Joseph Guhlin

Abstract:

The Argentine stem weevil (ASW; L. bonariensis) is an economically important pasture pest in New Zealand, which causes about $200 million of damage per annum. Microctonus hyperodae (Mh), a parasite of the ASW in its natural range in South America, was introduced into New Zealand to curb the pasture damage caused by the ASW. Mh is an endoparasitic wasp that lays its eggs in the ASW halting its reproduction. Mh was initially successful at preventing ASW proliferation and reducing pasture damage. The effectiveness of Mh has since declined due to decreased parasitism rates and has resulted in increased pasture damage. Although the mechanism through which ASW has developed resistance to Mh has not been discovered, it has been proposed to be due to the different reproductive modes used by Mh and the ASW in New Zealand. The ASW reproduces sexually, whereas Mh reproduces asexually, which has been hypothesised to have allowed the ASW to ‘out evolve’ Mh. Other species within the Microctonus genus reproduce both sexually and asexually. Strains of Microctonus aethiopoides (Ma), a species closely related to Mh, reproduce either by sexual or asexual reproduction. Comparing the genomes of sexual and asexual Microctonus may allow for the identification of the mechanism of asexual reproduction and other characteristics that may improve Mh as a biocontrol agent. The genomes of Mh and three strains of Ma, two of which reproduce sexually and one reproduces asexually, have been sequenced and annotated. The French (MaFR) and Moroccan (MaMO) reproduce sexually, whereas the Irish strain (MaIR) reproduces asexually. Like Mh, The Ma strains are also used as biocontrol agents, but for different weevil species. The genomes of Mh and MaIR were subsequently upgraded using Hi-C, resulting in a set of high quality, highly contiguous genomes. A subset of the genes involved in mitosis and meiosis, which have been identified though the use of Hidden Markov Models generated from genes involved in these processes in other Hymenoptera, have been catalogued in Mh and the strains of Ma. Meiosis and mitosis genes were broadly conserved in both sexual and asexual Microctonus species. This implies that either the asexual species have retained a subset of the molecular components required for sexual reproduction or that the molecular mechanisms of mitosis and meiosis are different or differently regulated in Microctonus to other insect species in which these mechanisms are more broadly characterised. Bioinformatic analysis of the chemoreceptor compliment in Microctonus has revealed some variation in the number of olfactory receptors, which may be related to host preference. Phylogenetic analysis of olfactory receptors highlights variation, which may be able to explain different host range preferences in the Microctonus. Hi-C clustering implies that Mh has 12 chromosomes, and MaIR has 8. Hence there may be variation in gene regulation between species. Genome alignment of Mh and MaIR implies that there may be large scale genome structural variation. Greater insight into the genetics of these agriculturally important group of parasitic wasps may be beneficial in restoring or maintaining their biocontrol efficacy.

Keywords: argentine stem weevil, asexual, genomics, Microctonus hyperodae

Procedia PDF Downloads 125

26702 In silico Subtractive Genomics Approach for Identification of Strain-Specific Putative Drug Targets among Hypothetical Proteins of Drug-Resistant Klebsiella pneumoniae Strain 825795-1

Authors: Umairah Natasya Binti Mohd Omeershffudin, Suresh Kumar

Abstract:

Klebsiella pneumoniae, a Gram-negative enteric bacterium that causes nosocomial and urinary tract infections. Particular concern is the global emergence of multidrug-resistant (MDR) strains of Klebsiella pneumoniae. Characterization of antibiotic resistance determinants at the genomic level plays a critical role in understanding, and potentially controlling, the spread of multidrug-resistant (MDR) pathogens. In this study, drug-resistant Klebsiella pneumoniae strain 825795-1 was investigated with extensive computational approaches aimed at identifying novel drug targets among hypothetical proteins. We have analyzed 1099 hypothetical proteins available in genome. We have used in-silico genome subtraction methodology to design potential and pathogen-specific drug targets against Klebsiella pneumoniae. We employed bioinformatics tools to subtract the strain-specific paralogous and host-specific homologous sequences from the bacterial proteome. The sorted 645 proteins were further refined to identify the essential genes in the pathogenic bacterium using the database of essential genes (DEG). We found 135 unique essential proteins in the target proteome that could be utilized as novel targets to design newer drugs. Further, we identified 49 cytoplasmic protein as potential drug targets through sub-cellular localization prediction. Further, we investigated these proteins in the DrugBank databases, and 11 of the unique essential proteins showed druggability according to the FDA approved drug bank databases with diverse broad-spectrum property. The results of this study will facilitate discovery of new drugs against Klebsiella pneumoniae.

Keywords: pneumonia, drug target, hypothetical protein, subtractive genomics

Procedia PDF Downloads 152

26701 Association of Non Synonymous SNP in DC-SIGN Receptor Gene with Tuberculosis (Tb)

Authors: Saima Suleman, Kalsoom Sughra, Naeem Mahmood Ashraf

Abstract:

Mycobacterium tuberculosis is a communicable chronic illness. This disease is being highly focused by researchers as it is present approximately in one third of world population either in active or latent form. The genetic makeup of a person plays an important part in producing immunity against disease. And one important factor association is single nucleotide polymorphism of relevant gene. In this study, we have studied association between single nucleotide polymorphism of CD-209 gene (encode DC-SIGN receptor) and patients of tuberculosis. Dry lab (in silico) and wet lab (RFLP) analysis have been carried out. GWAS catalogue and GEO database have been searched to find out previous association data. No association study has been found related to CD-209 nsSNPs but role of CD-209 in pulmonary tuberculosis have been addressed in GEO database.Therefore, CD-209 has been selected for this study. Different databases like ENSEMBLE and 1000 Genome Project has been used to retrieve SNP data in form of VCF file which is further submitted to different software to sort SNPs into benign and deleterious. Selected SNPs are further annotated by using 3-D modeling techniques using I-TASSER online software. Furthermore, selected nsSNPs were checked in Gujrat and Faisalabad population through RFLP analysis. In this study population two SNPs are found to be associated with tuberculosis while one nsSNP is not found to be associated with the disease.

Keywords: association, CD209, DC-SIGN, tuberculosis

Procedia PDF Downloads 281

26700 Inbreeding Study Using Runs of Homozygosity in Nelore Beef Cattle

Authors: Priscila A. Bernardes, Marcos E. Buzanskas, Luciana C. A. Regitano, Ricardo V. Ventura, Danisio P. Munari

Abstract:

The best linear unbiased predictor (BLUP) is a method commonly used in genetic evaluations of breeding programs. However, this approach can lead to higher inbreeding coefficients in the population due to the intensive use of few bulls with higher genetic potential, usually presenting some degree of relatedness. High levels of inbreeding are associated to low genetic viability, fertility, and performance for some economically important traits and therefore, should be constantly monitored. Unreliable pedigree data can also lead to misleading results. Genomic information (i.e., single nucleotide polymorphism – SNP) is a useful tool to estimate the inbreeding coefficient. Runs of homozygosity have been used to evaluate homozygous segments inherited due to direct or collateral inbreeding and allows inferring population selection history. This study aimed to evaluate runs of homozygosity (ROH) and inbreeding in a population of Nelore beef cattle. A total of 814 animals were genotyped with the Illumina BovineHD BeadChip and the quality control was carried out excluding SNPs located in non-autosomal regions, with unknown position, with a p-value in the Hardy-Weinberg equilibrium lower than 10⁻⁵, call rate lower than 0.98 and samples with the call rate lower than 0.90. After the quality control, 809 animals and 509,107 SNPs remained for analyses. For the ROH analysis, PLINK software was used considering segments with at least 50 SNPs with a minimum length of 1Mb in each animal. The inbreeding coefficient was calculated using the ratio between the sum of all ROH sizes and the size of the whole genome (2,548,724kb). A total of 25.711 ROH were observed, presenting mean, median, minimum, and maximum length of 3.34Mb, 2Mb, 1Mb, and 80.8Mb, respectively. The number of SNPs present in ROH segments varied from 50 to 14.954. The longest ROH length was observed in one animal, which presented a length of 634Mb (24.88% of the genome). Four bulls were among the 10 animals with the longest extension of ROH, presenting 11% of ROH with length higher than 10Mb. Segments longer than 10Mb indicate recent inbreeding. Therefore, the results indicate an intensive use of few sires in the studied data. The distribution of ROH along the chromosomes showed that chromosomes 5 and 6 presented a large number of segments when compared to other chromosomes. The mean, median, minimum, and maximum inbreeding coefficients were 5.84%, 5.40%, 0.00%, and 24.88%, respectively. Although the mean inbreeding was considered low, the ROH indicates a recent and intensive use of few sires, which should be avoided for the genetic progress of breed.

Keywords: autozygosity, Bos taurus indicus, genomic information, single nucleotide polymorphism

Procedia PDF Downloads 124

26699 Persistent Ribosomal In-Frame Mis-Translation of Stop Codons as Amino Acids in Multiple Open Reading Frames of a Human Long Non-Coding RNA

Authors: Leonard Lipovich, Pattaraporn Thepsuwan, Anton-Scott Goustin, Juan Cai, Donghong Ju, James B. Brown

Abstract:

Two-thirds of human genes do not encode any known proteins. Aside from long non-coding RNA (lncRNA) genes with recently-discovered functions, the ~40,000 non-protein-coding human genes remain poorly understood, and a role for their transcripts as de-facto unconventional messenger RNAs has not been formally excluded. Ribosome profiling (Riboseq) predicts translational potential, but without independent evidence of proteins from lncRNA open reading frames (ORFs), ribosome binding of lncRNAs does not prove translation. Previously, we mass-spectrometrically documented translation of specific lncRNAs in human K562 and GM12878 cells. We now examined lncRNA translation in human MCF7 cells, integrating strand-specific Illumina RNAseq, Riboseq, and deep mass spectrometry in biological quadruplicates performed at two core facilities (BGI, China; City of Hope, USA). We excluded known-protein matches. UCSC Genome Browser-assisted manual annotation of imperfect (tryptic-digest-peptides)-to-(lncRNA-three-frame-translations) alignments revealed three peptides hypothetically explicable by 'stop-to-nonstop' in-frame replacement of stop codons by amino acids in two ORFs of the lncRNA MMP24-AS1. To search for this phenomenon genomewide, we designed and implemented a novel pipeline, matching tryptic-digest spectra to wildcard-instead-of-stop versions of repeat-masked, six-frame, whole-genome translations. Along with singleton putative stop-to-nonstop events affecting four other lncRNAs, we identified 24 additional peptides with stop-to-nonstop in-frame substitutions from multiple positive-strand MMP24-AS1 ORFs. Only UAG and UGA, never UAA, stop codons were impacted. All MMP24-AS1-matching spectra met the same significance thresholds as high-confidence known-protein signatures. Targeted resequencing of MMP24-AS1 genomic DNA and cDNA from the same samples did not reveal any mutations, polymorphisms, or sequencing-detectable RNA editing. This unprecedented apparent gene-specific violation of the genetic code highlights the importance of matching peptides to whole-genome, not known-genes-only, ORFs in mass-spectrometry workflows, and suggests a new mechanism enhancing the combinatorial complexity of the proteome. Funding: NIH Director’s New Innovator Award 1DP2-CA196375 to LL.

Keywords: genetic code, lncRNA, long non-coding RNA, mass spectrometry, proteogenomics, ribo-seq, ribosome, RNAseq

Procedia PDF Downloads 200

26698 Potyviruses Genomic Analysis and Complete Evaluation

Authors: Narin Salehiyan, Ramin Ghasemi Shayan

Abstract:

The largest genus of plant viruses, the potyvirus, is responsible for significant crop losses. Potyviruses are aphid sent in a nonpersistent way, and some of them are likewise seed communicated. As significant microorganisms, potyviruses are substantially more examined than other plant infections having a place with different genera, and their review covers numerous parts of plant virology, like utilitarian portrayal of viral proteins, sub-atomic communication with hosts and vectors, structure, scientific classification, development, the study of disease transmission, and determination. Biotechnological utilizations of potyviruses are likewise being investigated. During this last ten years, significant advances have been made in the comprehension of the sub-atomic science of these infections and the elements of their different proteins. Potyvirus multiplication, movement, and transmission, as well as potyvirus/plant compatible interactions, including pathogenicity and symptom determinants, are updated following a general overview of the family Potyviridae and the potyviral proteins. it end the survey giving data on biotechnological uses of potyviruses.

Keywords: virology, poty, virus, genome, genetic

Procedia PDF Downloads 39

26697 Modeling Competition Between Subpopulations with Variable DNA Content in Resource-Limited Microenvironments

Authors: Parag Katira, Frederika Rentzeperis, Zuzanna Nowicka, Giada Fiandaca, Thomas Veith, Jack Farinhas, Noemi Andor

Abstract:

Resource limitations shape the outcome of competitions between genetically heterogeneous pre-malignant cells. One example of such heterogeneity is in the ploidy (DNA content) of pre-malignant cells. A whole-genome duplication (WGD) transforms a diploid cell into a tetraploid one and has been detected in 28-56% of human cancers. If a tetraploid subclone expands, it consistently does so early in tumor evolution, when cell density is still low, and competition for nutrients is comparatively weak – an observation confirmed for several tumor types. WGD+ cells need more resources to synthesize increasing amounts of DNA, RNA, and proteins. To quantify resource limitations and how they relate to ploidy, we performed a PAN cancer analysis of WGD, PET/CT, and MRI scans. Segmentation of >20 different organs from >900 PET/CT scans were performed with MOOSE. We observed a strong correlation between organ-wide population-average estimates of Oxygen and the average ploidy of cancers growing in the respective organ (Pearson R = 0.66; P= 0.001). In-vitro experiments using near-diploid and near-tetraploid lineages derived from a breast cancer cell line supported the hypothesis that DNA content influences Glucose- and Oxygen-dependent proliferation-, death- and migration rates. To model how subpopulations with variable DNA content compete in the resource-limited environment of the human brain, we developed a stochastic state-space model of the brain (S3MB). The model discretizes the brain into voxels, whereby the state of each voxel is defined by 8+ variables that are updated over time: stiffness, Oxygen, phosphate, glucose, vasculature, dead cells, migrating cells and proliferating cells of various DNA content, and treat conditions such as radiotherapy and chemotherapy. Well-established Fokker-Planck partial differential equations govern the distribution of resources and cells across voxels. We applied S3MB on sequencing and imaging data obtained from a primary GBM patient. We performed whole genome sequencing (WGS) of four surgical specimens collected during the 1ˢᵗ and 2ⁿᵈ surgeries of the GBM and used HATCHET to quantify its clonal composition and how it changes between the two surgeries. HATCHET identified two aneuploid subpopulations of ploidy 1.98 and 2.29, respectively. The low-ploidy clone was dominant at the time of the first surgery and became even more dominant upon recurrence. MRI images were available before and after each surgery and registered to MNI space. The S3MB domain was initiated from 4mm³ voxels of the MNI space. T1 post and T2 flair scan acquired after the 1ˢᵗ surgery informed tumor cell densities per voxel. Magnetic Resonance Elastography scans and PET/CT scans informed stiffness and Glucose access per voxel. We performed a parameter search to recapitulate the GBM’s tumor cell density and ploidy composition before the 2ⁿᵈ surgery. Results suggest that the high-ploidy subpopulation had a higher Glucose-dependent proliferation rate (0.70 vs. 0.49), but a lower Glucose-dependent death rate (0.47 vs. 1.42). These differences resulted in spatial differences in the distribution of the two subpopulations. Our results contribute to a better understanding of how genomics and microenvironments interact to shape cell fate decisions and could help pave the way to therapeutic strategies that mimic prognostically favorable environments.

Keywords: tumor evolution, intra-tumor heterogeneity, whole-genome doubling, mathematical modeling

Procedia PDF Downloads 42

26696 Physicians’ Knowledge and Perception of Gene Profiling in Malaysia: A Pilot Study

Authors: Farahnaz Amini, Woo Yun Kin, Lazwani Kolandaiveloo

Abstract:

Availability of different genetic tests after completion of Human Genome Project increases the physicians’ responsibility to keep themselves update on the potential implementation of these genetic tests in their daily practice. However, due to numbers of barriers, still many of physicians are not either aware of these tests or are not willing to offer or refer their patients for genetic tests. This study was conducted an anonymous, cross-sectional, mailed-based survey to develop a primary data of Malaysian physicians’ level of knowledge and perception of gene profiling. Questionnaire had 29 questions. Total scores on selected questions were used to assess the level of knowledge. The highest possible score was 11. Descriptive statistics, one way ANOVA and chi-squared test was used for statistical analysis. Sixty three completed questionnaires was returned by 27 general practitioners (GPs) and 36 medical specialists. Responders’ age range from 24 to 55 years old (mean 30.2 ± 6.4). About 40% of the participants rated themselves as having poor level of knowledge in genetics in general whilst 60% believed that they have fair level of knowledge. However, almost half (46%) of the respondents felt that they were not knowledgeable about available genetic tests. A majority (94%) of the responders were not aware of any lab or company which is offering gene profiling services in Malaysia. Only 4% of participants were aware of using gene profiling for detection of dosage of some drugs. Respondents perceived greater utility of gene profiling for breast cancer (38%) compared to the colorectal familial cancer (3%). The score of knowledge ranged from 2 to 8 (mean 4.38 ± 1.67). Non-significant differences between score of knowledge of GPs and specialists were observed, with score of 4.19 and 4.58 respectively. There was no significant association between any demographic factors and level of knowledge. However, those who graduated between years 2001 to 2005 had higher level of knowledge. Overall, 83% of participants showed relatively high level of perception on value of gene profiling to detect patient’s risk of disease. However, low perception was observed for both statements of using gene profiling for general population in order to alter their lifestyle (25%) as well as having the full sequence of a patient genome for the purpose of determining a patient’s best match for treatment (18%). The lack of clinical guidelines, limited provider knowledge and awareness, lack of time and resources to educate patients, lack of evidence-based clinical information and cost of tests were the most barriers of ordering gene profiling mentioned by physicians. In conclusion Malaysian physicians who participate in this study had mediocre level of knowledge and awareness in gene profiling. The low exposure to the genetic questions and problems might be a key predictor of lack of awareness and knowledge on available genetic tests. Educational and training workshop might be useful in helping Malaysian physicians incorporate genetic profiling into practice for eligible patients.

Keywords: gene profiling, knowledge, Malaysia, physician

Procedia PDF Downloads 303

26695 Identifying Protein-Coding and Non-Coding Regions in Transcriptomes

Authors: Angela U. Makolo

Abstract:

Protein-coding and Non-coding regions determine the biology of a sequenced transcriptome. Research advances have shown that Non-coding regions are important in disease progression and clinical diagnosis. Existing bioinformatics tools have been targeted towards Protein-coding regions alone. Therefore, there are challenges associated with gaining biological insights from transcriptome sequence data. These tools are also limited to computationally intensive sequence alignment, which is inadequate and less accurate to identify both Protein-coding and Non-coding regions. Alignment-free techniques can overcome the limitation of identifying both regions. Therefore, this study was designed to develop an efficient sequence alignment-free model for identifying both Protein-coding and Non-coding regions in sequenced transcriptomes. Feature grouping and randomization procedures were applied to the input transcriptomes (37,503 data points). Successive iterations were carried out to compute the gradient vector that converged the developed Protein-coding and Non-coding Region Identifier (PNRI) model to the approximate coefficient vector. The logistic regression algorithm was used with a sigmoid activation function. A parameter vector was estimated for every sample in 37,503 data points in a bid to reduce the generalization error and cost. Maximum Likelihood Estimation (MLE) was used for parameter estimation by taking the log-likelihood of six features and combining them into a summation function. Dynamic thresholding was used to classify the Protein-coding and Non-coding regions, and the Receiver Operating Characteristic (ROC) curve was determined. The generalization performance of PNRI was determined in terms of F1 score, accuracy, sensitivity, and specificity. The average generalization performance of PNRI was determined using a benchmark of multi-species organisms. The generalization error for identifying Protein-coding and Non-coding regions decreased from 0.514 to 0.508 and to 0.378, respectively, after three iterations. The cost (difference between the predicted and the actual outcome) also decreased from 1.446 to 0.842 and to 0.718, respectively, for the first, second and third iterations. The iterations terminated at the 390th epoch, having an error of 0.036 and a cost of 0.316. The computed elements of the parameter vector that maximized the objective function were 0.043, 0.519, 0.715, 0.878, 1.157, and 2.575. The PNRI gave an ROC of 0.97, indicating an improved predictive ability. The PNRI identified both Protein-coding and Non-coding regions with an F1 score of 0.970, accuracy (0.969), sensitivity (0.966), and specificity of 0.973. Using 13 non-human multi-species model organisms, the average generalization performance of the traditional method was 74.4%, while that of the developed model was 85.2%, thereby making the developed model better in the identification of Protein-coding and Non-coding regions in transcriptomes. The developed Protein-coding and Non-coding region identifier model efficiently identified the Protein-coding and Non-coding transcriptomic regions. It could be used in genome annotation and in the analysis of transcriptomes.

Keywords: sequence alignment-free model, dynamic thresholding classification, input randomization, genome annotation

Procedia PDF Downloads 25

26694 Identification of Two Novel Carbapenemase Gene Variants from a Carbapenem-Resistant Aeromonas Veronii Environmental Isolate

Authors: Rafael Estrada, Cristian Ruiz Rueda

Abstract:

Carbapenems are last-resort antibiotics used in clinical settings to treat antibiotic-resistant bacterial infections. Thus, the emergence and spread of resistance to carbapenems is a major public health concern. Here, we have studied a carbapenem-resistant Aeromonas veronii strain previously isolated from a water sample from Sam Simeon Creek (Hearst San Simeon State Park, CA). Analysis of this isolate using disk-diffusion, CarbaNP, eCIM and mCIM assays revealed that it was resistant to amoxicillin-clavulanic acid and all carbapenems tested and that this isolate produced a potentially novel carbapenemase of the Metallo-β-lactamase family. Whole genome sequencing analysis revealed that this A. veronii isolate carries a novel variant of the blacₚₕₐ class β-carbapenemase gene that was closely related to the blacₚₕₐ₇ gene of Aeromonas jandaei. This isolate also carried a novel variant of the blaₒₓₐ class D carbapenemase gene that was most closely related to the blaₒₓₐ-₉₁₂ gene found in other Aeromonas veronii isolates. Finally, we also identified a novel class C β-lactamase gene moderately related to the blaFₒₓ-₁₇ gene of Providencia stuartii and other blaFₒₓ variants identified in Klebsiella pneumoniae, Escherichia coli and other Enterobacteriaceae. Overall, our findings reveal that environmental isolates are an important reservoir of multiple carbapenemases and other β-lactamases of clinical significance.

Keywords: β-lactamases, carbapenem, antibiotic-resistant, aeromonas veronii

Procedia PDF Downloads 52

26693 Microbial Bioproduction with Design of Metabolism and Enzyme Engineering

Authors: Tomokazu Shirai, Akihiko Kondo

Abstract:

Technologies of metabolic engineering or synthetic biology are essential for effective microbial bioproduction. It is especially important to develop an in silico tool for designing a metabolic pathway producing an unnatural and valuable chemical such as fossil materials of fuel or plastics. We here demonstrated two in silico tools for designing novel metabolic pathways: BioProV and HyMeP. Furthermore, we succeeded in creating an artificial metabolic pathway by enzyme engineering.

Keywords: bioinformatics, metabolic engineering, synthetic biology, genome scale model

Procedia PDF Downloads 310

26692 Bean in Turkey: Characterization, Inter Gene Pool Hybridization Events, Breeding, Utilizations

Authors: Faheem Shahzad Baloch, Muhammad Azhar Nadeem, Muhammad Amjad Nawaz, Ephrem Habyarimana, Gonul Comertpay, Tolga Karakoy, Rustu Hatipoglu, Mehmet Zahit Yeken, Vahdettin Ciftci

Abstract:

Turkey is considered a bridge between Europe, Asia, and Africa and possibly played an important role in the distribution of many crops including common bean. Hundreds of common bean landraces can be found in Turkey, particularly in farmers’ fields, and they consistently contribute to the overall production. To investigate the existing genetic diversity and hybridization events between the Andean and Mesoamerican gene pools in the Turkish common bean, 188 common bean accessions (182 landraces and 6 modern cultivars as controls) were collected from 19 different Turkish geographic regions. These accessions were characterized using phenotypic data (growth habit and seed weight), geographic provenance, 12557 high-quality whole-genome DArTseq markers, and 3767 novel DArTseq loci were also identified. The clustering algorithms resolved the Turkish common bean landrace germplasm into the two recognized gene pools, the Mesoamerican and Andean gene pools. Hybridization events were observed in both gene pools (14.36% of the accessions) but mostly in the Mesoamerican (7.97% of the accessions), and was low relative to previous European studies. The lower level of hybridization witnessed the existence of Turkish common bean germplasm in its original form as compared to Europe. Mesoamerican gene pool reflected a higher level of diversity, while the Andean gene pool was predominant (56.91% of the accessions), but genetically less diverse and phenotypically more pure, reflecting farmers greater preference for the Andean gene pool. We also found some genetically distinct landraces and overall, a meaningful level of genetic variability which can be used by the scientific community in breeding efforts to develop superior common bean strains.

Keywords: bean germplasm, DArTseq markers, genotyping by sequencing, Turkey, whole genome diversity

Procedia PDF Downloads 206

26691 Genomic Sequence Representation Learning: An Analysis of K-Mer Vector Embedding Dimensionality

Authors: James Jr. Mashiyane, Risuna Nkolele, Stephanie J. Müller, Gciniwe S. Dlamini, Rebone L. Meraba, Darlington S. Mapiye

Abstract:

When performing language tasks in natural language processing (NLP), the dimensionality of word embeddings is chosen either ad-hoc or is calculated by optimizing the Pairwise Inner Product (PIP) loss. The PIP loss is a metric that measures the dissimilarity between word embeddings, and it is obtained through matrix perturbation theory by utilizing the unitary invariance of word embeddings. Unlike in natural language, in genomics, especially in genome sequence processing, unlike in natural language processing, there is no notion of a “word,” but rather, there are sequence substrings of length k called k-mers. K-mers sizes matter, and they vary depending on the goal of the task at hand. The dimensionality of word embeddings in NLP has been studied using the matrix perturbation theory and the PIP loss. In this paper, the sufficiency and reliability of applying word-embedding algorithms to various genomic sequence datasets are investigated to understand the relationship between the k-mer size and their embedding dimension. This is completed by studying the scaling capability of three embedding algorithms, namely Latent Semantic analysis (LSA), Word2Vec, and Global Vectors (GloVe), with respect to the k-mer size. Utilising the PIP loss as a metric to train embeddings on different datasets, we also show that Word2Vec outperforms LSA and GloVe in accurate computing embeddings as both the k-mer size and vocabulary increase. Finally, the shortcomings of natural language processing embedding algorithms in performing genomic tasks are discussed.

Keywords: word embeddings, k-mer embedding, dimensionality reduction

Procedia PDF Downloads 92

26690 ISMARA: Completely Automated Inference of Gene Regulatory Networks from High-Throughput Data

Authors: Piotr J. Balwierz, Mikhail Pachkov, Phil Arnold, Andreas J. Gruber, Mihaela Zavolan, Erik van Nimwegen

Abstract:

Understanding the key players and interactions in the regulatory networks that control gene expression and chromatin state across different cell types and tissues in metazoans remains one of the central challenges in systems biology. Our laboratory has pioneered a number of methods for automatically inferring core gene regulatory networks directly from high-throughput data by modeling gene expression (RNA-seq) and chromatin state (ChIP-seq) measurements in terms of genome-wide computational predictions of regulatory sites for hundreds of transcription factors and micro-RNAs. These methods have now been completely automated in an integrated webserver called ISMARA that allows researchers to analyze their own data by simply uploading RNA-seq or ChIP-seq data sets and provides results in an integrated web interface as well as in downloadable flat form. For any data set, ISMARA infers the key regulators in the system, their activities across the input samples, the genes and pathways they target, and the core interactions between the regulators. We believe that by empowering experimental researchers to apply cutting-edge computational systems biology tools to their data in a completely automated manner, ISMARA can play an important role in developing our understanding of regulatory networks across metazoans.

Keywords: gene expression analysis, high-throughput sequencing analysis, transcription factor activity, transcription regulation

Procedia PDF Downloads 29

26689 Blackcurrant-Associated Rhabdovirus: New Pathogen for Blackcurrants in the Baltic Sea Region

Authors: Gunta Resevica, Nikita Zrelovs, Ivars Silamikelis, Ieva Kalnciema, Helvijs Niedra, Gunārs Lācis, Toms Bartulsons, Inga Moročko-Bičevska, Arturs Stalažs, Kristīne Drevinska, Andris Zeltins, Ina Balke

Abstract:

Newly discovered viruses provide novel knowledge for basic phytovirus research, serve as tools for biotechnology and can be helpful in identification of epidemic outbreaks. Blackcurrant-associated rhabdovirus (BCaRV) have been discovered in USA germplasm collection samples from Russia and France. As it was reported in one accession originating from France it is unclear whether the material was already infected when it entered in the USA or it became infected while in collection in the USA. Due to that BCaRV was definite as non-EU viruses. According to ICTV classification BCaRV is representative of Blackcurrant betanucleorhabdovirus specie in genus Betanucleorhabdovirus (family Rhabdoviridae). Nevertheless, BCaRV impact on the host, transmission mechanisms and vectors are still unknown. In RNA-seq data pool from Ribes plants resistance gene study by high throughput sequencing (HTS) we observed differences between sample group gene transcript heat maps. Additional analysis of the whole data pool (total 393660492 of 150 bp long read pairs) by rnaSPAdes v 3.13.1 resulted into 14424 bases long contig with an average coverage of 684x with shared 99.5% identity to the previously reported first complete genome of BCaRV (MF543022.1) using EMBOSS Needle. This finding proved BCaRV presence in EU and indicated that it might be relevant pathogen. In this study leaf tissue from twelve asymptomatic blackcurrant cv. Mara Eglite plants (negatively tested for blackcurrant reversion virus (BRV)) from Dobele, Latvia (56°36'31.9"N, 23°18'13.6"E) was collected and used for total RNA isolation with RNeasy Plant Mini Kit with minor modifications, followed by plant rRNA removal by a RiboMinus Plant Kit for RNA-Seq. HTS libraries were prepared using MGI Easy RNA Directional Library Prep Set for 16 reactions to obtain 150 bp pair-end reads. Libraries were pooled, circularized and cleaned and sequenced on DNBSEQ-G400 using PE150 flow cell. Additionally, all samples were tested by RT-PCR, and amplicons were directly sequenced by Sanger-based method. The contig representing the genome of BCaRV isolate Mara Eglite was deposited at European Nucleotide Archive under accession number OU015520. Those findings indicate a second evidence on the presence of this particular virus in the EU and further research on BCaRV prevalence in Ribes from other geographical areas should be performed. As there are no information on BCaRV impact on the host this should be investigated, regarding the fact that mixed infections with BRV and nucleorhabdoviruses are reported.

Keywords: BCaRV, Betanucleorhabdovirus, Ribes, RNA-seq

Procedia PDF Downloads 158

26688 Identification of Candidate Congenital Heart Defects Biomarkers by Applying a Random Forest Approach on DNA Methylation Data

Authors: Kan Yu, Khui Hung Lee, Eben Afrifa-Yamoah, Jing Guo, Katrina Harrison, Jack Goldblatt, Nicholas Pachter, Jitian Xiao, Guicheng Brad Zhang

Abstract:

Background and Significance of the Study: Congenital Heart Defects (CHDs) are the most common malformation at birth and one of the leading causes of infant death. Although the exact etiology remains a significant challenge, epigenetic modifications, such as DNA methylation, are thought to contribute to the pathogenesis of congenital heart defects. At present, no existing DNA methylation biomarkers are used for early detection of CHDs. The existing CHD diagnostic techniques are time-consuming and costly and can only be used to diagnose CHDs after an infant was born. The present study employed a machine learning technique to analyse genome-wide methylation data in children with and without CHDs with the aim to find methylation biomarkers for CHDs. Methods: The Illumina Human Methylation EPIC BeadChip was used to screen the genome‐wide DNA methylation profiles of 24 infants diagnosed with congenital heart defects and 24 healthy infants without congenital heart defects. Primary pre-processing was conducted by using RnBeads and limma packages. The methylation levels of top 600 genes with the lowest p-value were selected and further investigated by using a random forest approach. ROC curves were used to analyse the sensitivity and specificity of each biomarker in both training and test sample sets. The functionalities of selected genes with high sensitivity and specificity were then assessed in molecular processes. Major Findings of the Study: Three genes (MIR663, FGF3, and FAM64A) were identified from both training and validating data by random forests with an average sensitivity and specificity of 85% and 95%. GO analyses for the top 600 genes showed that these putative differentially methylated genes were primarily associated with regulation of lipid metabolic process, protein-containing complex localization, and Notch signalling pathway. The present findings highlight that aberrant DNA methylation may play a significant role in the pathogenesis of congenital heart defects.

Keywords: biomarker, congenital heart defects, DNA methylation, random forest

Procedia PDF Downloads 128