Search results for: nucleotide sequencing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 727

637 Analysis of the Lung Microbiome in Cystic Fibrosis Patients Using 16S Sequencing

Authors: Manasvi Pinnaka, Brianna Chrisman

Abstract:

Cystic fibrosis patients often develop lung infections that range anywhere in severity from mild to life-threatening due to the presence of thick and sticky mucus that fills their airways. Since many of these infections are chronic, they not only affect a patient’s ability to breathe but also increase the chances of mortality by respiratory failure. With a publicly available dataset of DNA sequences from bacterial species in the lung microbiome of cystic fibrosis patients, the correlations between different microbial species in the lung and the extent of deterioration of lung function were investigated. 16S sequencing technologies were used to determine the microbiome composition of the samples in the dataset. For the statistical analyses, referencing helped distinguish between taxonomies, and the proportions of certain taxa relative to another were determined. It was found that the Fusobacterium, Actinomyces, and Leptotrichia microbial types all had a positive correlation with the FEV1 score, indicating the potential displacement of these species by pathogens as the disease progresses. However, the dominant pathogens themselves, including Pseudomonas aeruginosa and Staphylococcus aureus, did not have statistically significant negative correlations with the FEV1 score as described by past literature. Examining the lung microbiology of cystic fibrosis patients can help with the prediction of the current condition of lung function, with the potential to guide doctors when designing personalized treatment plans for patients.
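The correlation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which correlation statistic was used, so Pearson's r is assumed here, and the read counts, FEV1 values, and function names are hypothetical rather than taken from the study.

```python
from math import sqrt


def relative_abundance(counts, taxon):
    """Fraction of a sample's reads assigned to one taxon."""
    total = sum(counts.values())
    return counts.get(taxon, 0) / total if total else 0.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-patient genus read counts paired with an FEV1 score.
samples = [
    ({"Fusobacterium": 10, "Pseudomonas": 90}, 40.0),
    ({"Fusobacterium": 30, "Pseudomonas": 70}, 60.0),
    ({"Fusobacterium": 50, "Pseudomonas": 50}, 80.0),
]
abundances = [relative_abundance(c, "Fusobacterium") for c, _ in samples]
fev1_scores = [fev1 for _, fev1 in samples]
r = pearson_r(abundances, fev1_scores)
print(f"r = {r:.3f}")
```

A positive r for a taxon corresponds to the pattern the abstract reports for Fusobacterium, Actinomyces, and Leptotrichia; a real analysis would also need a significance test and more than three samples.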

Keywords: bacterial infections, cystic fibrosis, lung microbiome, 16S sequencing

Procedia PDF Downloads 66
636 Microbial Dark Matter Analysis Using 16S rRNA Gene Metagenomics Sequences

Authors: Hana Barak, Alex Sivan, Ariel Kushmaro

Abstract:

Microorganisms are the most diverse and abundant life forms on Earth and account for a large portion of the Earth’s biomass and biodiversity. To date, though, our knowledge of microbial life is lacking, as it is based mainly on information from cultivated organisms. Indeed, microbiologists have borrowed from astrophysics and termed the ‘uncultured microbial majority’ ‘microbial dark matter’. The realization of how diverse and unexplored microorganisms are stems from recent advances in molecular biology, in particular from novel methods for sequencing microbial small subunit ribosomal RNA genes directly from environmental samples, termed next-generation sequencing (NGS). This has led us to use NGS, which generates several gigabases of sequencing data in a single experimental run, to identify and classify microorganisms in environmental samples. In metagenomics sequencing analysis (both 16S and shotgun), sequences are compared to reference databases that contain only a small part of the existing microorganisms, and therefore their taxonomy assignment may reveal groups of unknown microorganisms or origins. These unknowns, or the ‘microbial sequences dark matter’, are usually ignored in spite of their great importance. The goal of this work was to develop an improved bioinformatics method that enables more complete analyses of the microbial communities in numerous environments. Therefore, NGS was used to identify previously unknown microorganisms from three different environments (industrial wastewater, rocks of the Negev Desert, and water wells in the Arava valley). 16S rRNA gene metagenome analysis of the microorganisms from those three environments produced approximately 4 million reads for 75 samples. Between 0.1% and 12% of the sequences in each sample were tagged as ‘Unassigned’.
Employing relatively simple methodology for resequencing of original gDNA samples through Sanger or MiSeq Illumina with specific primers, this study demonstrates that the mysterious ‘Unassigned’ group apparently contains sequences of candidate phyla. Those unknown sequences can be located on a phylogenetic tree and thus provide a better understanding of the ‘sequences dark matter’ and its role in the research of microbial communities and diversity. Studying this ‘dark matter’ will extend the existing databases and could reveal the hidden potential of the ‘microbial dark matter’.
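The per-sample share of ‘Unassigned’ sequences mentioned above is a simple proportion. A minimal sketch, assuming each read has already been given a taxonomy label by some classifier; the labels and the function name are hypothetical and not part of the authors' pipeline.

```python
def unassigned_percent(assignments, label="Unassigned"):
    """Percentage of reads in one sample whose taxonomy label equals `label`."""
    if not assignments:
        raise ValueError("sample has no reads")
    n_unassigned = sum(1 for a in assignments if a == label)
    return 100.0 * n_unassigned / len(assignments)


# Hypothetical taxonomy labels for the reads of a single sample.
sample_reads = ["Proteobacteria", "Unassigned", "Firmicutes", "Unassigned"]
print(unassigned_percent(sample_reads))
```

Reads flagged this way are the ones the abstract proposes to resequence and place on a phylogenetic tree rather than discard.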

Keywords: bacteria, bioinformatics, dark matter, Next Generation Sequencing, unknown

Procedia PDF Downloads 217
635 Bioinformatics Approach to Support Genetic Research in Autism in Mali

Authors: M. Kouyate, M. Sangare, S. Samake, S. Keita, H. G. Kim, D. H. Geschwind

Abstract:

Background & Objectives: Human genetic studies can be expensive, even unaffordable, in developing countries, partly due to the sequencing costs. Our aim is to pilot the use of bioinformatics tools to guide scientifically valid, locally relevant, and economically sound autism genetic research in Mali. Methods: The NCBI, HGMD, and LSDB databases were used to identify hot point mutations. Phenotype, transmission pattern, theoretical protein expression in the brain, and the impact of the mutation on the 3D structure of the protein were used to prioritize selected autism genes. We used the protein database, Modeller, and Clustal W. Results: We found Mef2c (Gly27Ala/Leu38Gln), Pten (Thr131Ile), Prodh (Leu289Met), Nme1 (Ser120Gly), and Dhcr7 (Pro227Thr/Glu224Lys). These mutations were associated with endonucleases BseRI, NspI, PfrJS2IV, BspGI, BsaBI, and SpoDI, respectively. The Gly27Ala/Leu38Gln mutations impacted the 3D structure of the Mef2c protein. Mef2c protein sequences across species showed a high percentage of similarity, with a highly conserved MADS domain. Discussion: The frequencies of Mef2c, Pten, Prodh, Nme1, and Dhcr7 gene mutations in the Malian population will be very informative. PCR coupled with restriction enzyme digestion can be used to screen for the targeted gene mutations. Sanger sequencing will be used for confirmation only. This will considerably cut the sequencing cost of gene-to-gene mutation screening. Knowledge of the 3D structure and of the potential impact of the mutations on the Mef2c protein informed the protein family and altered function (e.g., Leu38Gln). Conclusion & Future Work: Bioinformatics will positively impact autism research in Mali. Our approach can be applied to other neuropsychiatric disorders.
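The idea behind screening by PCR plus restriction digestion is that a point mutation can create or destroy an enzyme's recognition site, so the digest pattern changes. The sketch below illustrates that check only. It uses the well-known EcoRI site GAATTC as a stand-in, because the recognition sequences of the enzymes named in the abstract are not given there, and the two short sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_site(seq, site):
    """True if the recognition site occurs on either strand of seq."""
    seq, site = seq.upper(), site.upper()
    return site in seq or reverse_complement(site) in seq


def rflp_effect(wild_type, mutant, site):
    """Describe how a mutation changes the presence of a restriction site."""
    before, after = has_site(wild_type, site), has_site(mutant, site)
    if before and not after:
        return "site lost"
    if after and not before:
        return "site gained"
    return "no change"


ECORI = "GAATTC"  # illustrative enzyme site, not one of the enzymes in the abstract
wild_type = "ACGGAGTTCTT"  # hypothetical amplicon fragment
mutant = "ACGGAATTCTT"     # same fragment with a single G>A substitution
print(rflp_effect(wild_type, mutant, ECORI))
```

A mutation reported as "site gained" or "site lost" is a candidate for digest-based screening, with sequencing reserved for confirmation as the abstract proposes.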

Keywords: bioinformatics, endonucleases, autism, Sanger sequencing, point mutations

Procedia PDF Downloads 49
634 Molecular Characterization of Ovine Herpesvirus 2 Strains Based on Selected Glycoprotein and Tegument Genes

Authors: Fulufhelo Amanda Doboro, Kgomotso Sebeko, Stephen Njiro, Moritz Van Vuuren

Abstract:

The ovine herpesvirus 2 (OvHV-2) genome obtained from the lymphoblastoid cell line of a BJ1035 cow was recently sequenced in the United States of America (USA). Information on the sequences of OvHV-2 genes obtained from bovine strains in South Africa or other African countries, and on the molecular characterization of OvHV-2, is not documented. The present investigation provides information on the nucleotide and derived amino acid sequences and the genetic diversity of the Ov 7, Ov 8 ex2, ORF 27 and ORF 73 genes from OvHV-2 strains circulating in South Africa. Gene-specific primers were designed and used for PCR of DNA extracted from 42 bovine blood samples that had previously tested positive for OvHV-2. The expected PCR products of 495 bp, 253 bp, 890 bp and 1632 bp for the Ov 7, Ov 8 ex2, ORF 27 and ORF 73 genes, respectively, were sequenced, and multiple sequence analysis was done on selected regions of the sequenced PCR products. Two genotypes for the ORF 27 and ORF 73 gene sequences and three genotypes for the Ov 7 and Ov 8 ex2 gene sequences were identified, and similar groupings were obtained for the derived amino acid sequences of each gene. Nucleotide and amino acid sequence variations that led to the identification of the different genotypes included SNPs, deletions and insertions. Sequence analysis of the Ov 7 and ORF 27 genes revealed variations that distinguished sequences from SA from reference OvHV-2 strains. The implication of geographic origin among SA sequences was difficult to evaluate because of the random distribution of genotypes across the provinces for each gene. However, socio-economic factors such as migration of people with animals, or transportation of animals for agricultural or business use from one province to another, are most likely responsible for this observation.
The sequence variations observed in this study have no impact on the antibody binding activities of glycoproteins encoded by Ov 7, Ov 8 ex2 and ORF 27 genes, as determined by prediction of the presence of B cell epitopes using BepiPred 1.0. The findings of this study will be used for selection of gene candidates for the development of diagnostic assays and vaccine development as well.

Keywords: amino acid, genetic diversity, genes, nucleotide

Procedia PDF Downloads 464
633 Development and Performance of Aerobic Granular Sludge at Elevated Temperature

Authors: Mustafa M. Bob, Siti Izaidah Azmi, Mohd Hakim Ab Halim, Nur Syahida Abdul Jamal, Aznah Nor-Anuar, Zaini Ujang

Abstract:

In this research, the formation and development of aerobic granular sludge (AGS) for domestic wastewater treatment in hot climate conditions was studied using a sequencing batch reactor (SBR). The performance of the developed AGS in removing organic matter and nutrients from wastewater was also investigated. The reactor was operated as a sequencing batch system with a complete cycle time of 3 hours that included feeding, aeration, settling, discharging and idling. The reactor was seeded with sludge collected from the municipal wastewater treatment plant in Madinah city, Saudi Arabia, and operated at a temperature of 40 °C using synthetic wastewater as influent. Results showed that granular sludge developed after an operation period of 30 days. The developed granular sludge had good settling ability, with the average size of the granules ranging from 1.03 to 2.42 mm. The removal efficiencies of chemical oxygen demand (COD), ammonia nitrogen (NH3-N) and total phosphorus (TP) were 87.31%, 91.93% and 61.25%, respectively. These results show that AGS can be developed at elevated temperatures and that it is a promising technique for treating domestic wastewater in hot, low-humidity climate conditions such as those encountered in Saudi Arabia.
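Removal efficiencies such as those quoted above are conventionally computed as the percentage drop from influent to effluent concentration. A minimal sketch of that formula follows; the abstract does not state the formula or the underlying concentrations, so the numbers below are hypothetical.

```python
def removal_efficiency(influent, effluent):
    """Percent removal: 100 * (C_in - C_out) / C_in, concentrations in the same unit."""
    if influent <= 0:
        raise ValueError("influent concentration must be positive")
    return 100.0 * (influent - effluent) / influent


# Hypothetical COD concentrations in mg/L, not values from the study.
cod_in, cod_out = 400.0, 50.0
print(f"COD removal: {removal_efficiency(cod_in, cod_out):.2f}%")
```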

Keywords: aerobic granular sludge, hot climate, sequencing batch reactor, domestic wastewater treatment

Procedia PDF Downloads 331
632 Pollutants Removal from Synthetic Wastewater by the Combined Electrochemical Sequencing Batch Reactor

Authors: Amin Mojiri, Akiyoshi Ohashi, Tomonori Kindaichi

Abstract:

Synthetic domestic wastewater was treated by combining treatment methods, including electrochemical oxidation, adsorption, and a sequencing batch reactor (SBR). In the upper part of the reactor, an anode and a cathode (Ti/RuO2-IrO2) were arranged in parallel for the electrochemical oxidation procedure. Sodium sulfate (Na2SO4) at a concentration of 2.5 g/L was applied as the electrolyte. The voltage and current were fixed at 7.50 V and 0.40 A, respectively. Then, 15% of the working volume of the reactor was filled with activated sludge, and the remaining 85% was filled with synthetic wastewater. Powdered cockleshell (1.5 g/L) was added to the reactor for ion exchange. Response surface methodology was employed for statistical analysis. Reaction time (h) and pH were considered as independent factors. A total of 97.0% of biochemical oxygen demand, 99.9% of phosphorus and 88.6% of cadmium were eliminated at the optimum reaction time (80.0 min) and pH (6.4).

Keywords: adsorption, electrochemical oxidation, metals, SBR

Procedia PDF Downloads 177
631 Liquid Biopsy Based Microbial Biomarker in Coronary Artery Disease Diagnosis

Authors: Eyup Ozkan, Ozkan U. Nalbantoglu, Aycan Gundogdu, Mehmet Hora, A. Emre Onuk

Abstract:

The human microbiome has been associated with cardiological conditions, and this relationship is beginning to be defined beyond the gastrointestinal tract. In this study, we investigate the alteration in circulatory microbiota in the context of Coronary Artery Disease (CAD). We received circulatory blood samples from suspected CAD patients and performed 16S ribosomal RNA sequencing to identify each patient’s microbiome. It was found that the Corynebacterium and Methanobacteria genera show statistically significant differences between healthy subjects and CAD patients. The overall biodiversity of the two groups was observed to differ, as revealed by machine learning classification models. We also demonstrate the performance of a diagnostic method using circulatory blood microbiome-based estimation.

Keywords: coronary artery disease, blood microbiome, machine learning, angiography, next-generation sequencing

Procedia PDF Downloads 124
630 To Study the Performance of FMS under Different Manufacturing Strategies

Authors: Mohammed Ali

Abstract:

A flexible manufacturing system has been studied under different manufacturing strategies. The aim of this paper is to test the impact of the number of pallets and routing flexibility (design strategy) on the performance of a system operating under different sequencing and dispatching rules (control strategies) in an unbalanced load condition (planning strategy). A computer simulation model is developed to evaluate the effects of the aforementioned strategies on the makespan, which is taken as the system performance measure. The impact of the number of pallets is shown for different levels of routing flexibility. In this paper, the same manufacturing system is modeled under different combinations of sequencing and dispatching rules. The simulation results show that, for each level of routing flexibility, there is a definite range of pallets over which the system performs satisfactorily.
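To make concrete how a sequencing rule can change the makespan, here is a deliberately simplified sketch. It models a two-machine permutation flow shop, not the paper's simulation model: pallets, routing flexibility, and dispatching rules are all left out, and the job names and processing times are hypothetical.

```python
def flow_shop_makespan(sequence, proc_times):
    """Makespan of a permutation flow shop.

    sequence   -- job names in processing order
    proc_times -- job name -> tuple of processing times, one per machine
    """
    n_machines = len(next(iter(proc_times.values())))
    finish = [0.0] * n_machines  # completion time of the latest job on each machine
    for job in sequence:
        for m in range(n_machines):
            # A job starts on machine m once the machine is free and the job
            # has left machine m - 1 (finish[m - 1] was just updated for it).
            ready = finish[m - 1] if m > 0 else 0.0
            finish[m] = max(finish[m], ready) + proc_times[job][m]
    return finish[-1]


# Hypothetical jobs with (machine 1, machine 2) processing times.
jobs = {"A": (3, 2), "B": (1, 4)}

as_listed = ["A", "B"]
# Shortest-processing-time rule applied to the first machine.
spt_order = sorted(jobs, key=lambda j: jobs[j][0])

print(flow_shop_makespan(as_listed, jobs), flow_shop_makespan(spt_order, jobs))
```

Even in this toy case the two orders finish at different times, which is the kind of effect a full simulation would measure across pallet counts and routing options.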

Keywords: flexible manufacturing system, manufacturing, strategy, makespan

Procedia PDF Downloads 644
629 Rapid Start-Up and Efficient Long-Term Nitritation of Low Strength Ammonium Wastewater with a Sequencing Batch Reactor Containing Immobilized Cells

Authors: Hammad Khan, Wookeun Bae

Abstract:

Major concerns regarding nitritation of low-strength ammonium wastewaters include low ammonium loading rates (usually below 0.2 kg/m3-d) and uncertainty about the long-term stability of the process. The purpose of this study was to test a sequencing batch reactor (SBR) filled with cell-immobilized polyethylene glycol (PEG) pellets to see if it could achieve efficient and stable nitritation under various environmental conditions. The SBR was fed with synthetic ammonium wastewater of 30±2 mg-N/L at pH 8±0.05, maintaining the dissolved oxygen (DO) concentration at 1.7±0.2 mg/L and the temperature at 30±1 °C. The reaction was easily converted to partial nitrification mode within a month by feeding a relatively high ammonium substrate (~100 mg-N/L) in the beginning. We observed stable nitritation over 300 days with high ammonium loading rates (as high as ~1.1 kg-N/m3-d), nitrite accumulation rates (mostly over 97%) and ammonium removal rates (mostly over 95%). DO was a major limiting substrate when the DO concentration was below ~4 mg/L and the NH4+-N concentration was above 5 mg/L, giving an almost linear increase in the ammonium oxidation rate with increasing bulk DO. Low temperatures mainly affected the reaction rate, which could be compensated for by increasing the pellet volume (i.e. biomass). Our results demonstrated that an SBR filled with small cell-immobilized PEG pellets could achieve very efficient and stable nitritation of a low-strength ammonium wastewater.

Keywords: ammonium loading rate (ALR), cell-immobilization, long-term nitritation, sequencing batch reactor (SBR), sewage treatment

Procedia PDF Downloads 246
628 Insights into Archaeological Human Sample Microbiome Using 16S rRNA Gene Sequencing

Authors: Alisa Kazarina, Guntis Gerhards, Elina Petersone-Gordina, Ilva Pole, Viktorija Igumnova, Janis Kimsis, Valentina Capligina, Renate Ranka

Abstract:

The human body is inhabited by a vast number of microorganisms, collectively known as the human microbiome, and there is tremendous interest in evolutionary changes in human microbial ecology, diversity and function. The field of paleomicrobiology, the study of the ancient human microbiome, is powered by modern Next Generation Sequencing (NGS) techniques, which allow microbial genomic data to be extracted directly from an archaeological sample of interest. One of the major techniques is 16S rRNA gene sequencing, in which certain 16S rRNA gene hypervariable regions are amplified and sequenced. However, this method has some limitations, including the taxonomic precision and efficacy of the different regions used. The aim of this study was to evaluate the phylogenetic sensitivity of different 16S rRNA gene hypervariable regions for microbiome studies in archaeological samples. Towards this aim, archaeological bone samples and corresponding soil samples from each burial environment were collected in Medieval cemeteries in Latvia. The Ion 16S™ Metagenomics Kit, which targets different 16S rRNA gene hypervariable regions, was used for library construction (Ion Torrent technologies). Sequenced data were analysed using appropriate bioinformatic techniques; alignment and taxonomic representation were done using the Mothur program. Sequences of the most abundant genus were further aligned to the E. coli 16S rRNA gene reference sequence using MEGA7 in order to identify the hypervariable region of the segment of interest. Our results showed that different hypervariable regions had different discriminatory power depending on the groups of microbes, as well as the nature of the samples. On the basis of our results, we suggest that using a wider range of primers can provide a more accurate recapitulation of microbial communities in archaeological samples. Acknowledgements. This work was supported by the ERAF grant Nr. 1.1.1.1/16/A/101.

Keywords: 16S rRNA gene, ancient human microbiome, archaeology, bioinformatics, genomics, microbiome, molecular biology, next-generation sequencing

Procedia PDF Downloads 163
627 Association of Non Synonymous SNP in DC-SIGN Receptor Gene with Tuberculosis (Tb)

Authors: Saima Suleman, Kalsoom Sughra, Naeem Mahmood Ashraf

Abstract:

Tuberculosis, caused by Mycobacterium tuberculosis, is a communicable chronic illness. The disease is a major focus of research, as it is present in approximately one third of the world population in either active or latent form. The genetic makeup of a person plays an important part in producing immunity against disease, and one important associated factor is single nucleotide polymorphism of relevant genes. In this study, we examined the association between single nucleotide polymorphisms of the CD-209 gene (which encodes the DC-SIGN receptor) and tuberculosis. Dry lab (in silico) and wet lab (RFLP) analyses were carried out. The GWAS catalogue and GEO database were searched for previous association data. No association study related to CD-209 nsSNPs was found, but the role of CD-209 in pulmonary tuberculosis has been addressed in the GEO database. Therefore, CD-209 was selected for this study. Databases such as Ensembl and the 1000 Genomes Project were used to retrieve SNP data in the form of VCF files, which were then submitted to different software to sort SNPs into benign and deleterious. Selected SNPs were further annotated with 3-D modeling techniques using the I-TASSER online software. Furthermore, selected nsSNPs were checked in the Gujrat and Faisalabad populations through RFLP analysis. In this study population, two SNPs were found to be associated with tuberculosis, while one nsSNP was not found to be associated with the disease.

Keywords: association, CD209, DC-SIGN, tuberculosis

Procedia PDF Downloads 282
626 Complete Genome Sequence Analysis of Pasteurella multocida Subspecies multocida Serotype A Strain PMTB2.1

Authors: Shagufta Jabeen, Faez J. Firdaus Abdullah, Zunita Zakaria, Nurulfiza M. Isa, Yung C. Tan, Wai Y. Yee, Abdul R. Omar

Abstract:

Pasteurella multocida (PM) is an important veterinary opportunistic pathogen particularly associated with septicemic pasteurellosis, pneumonic pasteurellosis and hemorrhagic septicemia in cattle and buffaloes. P. multocida serotype A has been reported to cause fatal pneumonia and septicemia. The Malaysian isolate PMTB2.1 of Pasteurella multocida subspecies multocida serotype A was first isolated from buffaloes that died of septicemia. In this study, the genome of P. multocida strain PMTB2.1 was sequenced using third-generation sequencing technology (the PacBio RS2 system) and analyzed bioinformatically via de novo analysis followed by in-depth analysis based on comparative genomics. De novo assembly of the PacBio raw reads generated 3 contigs; gap filling of the aligned contigs with PCR sequencing then generated a single contiguous circular chromosome with a genomic size of 2,315,138 bp and a GC content of approximately 40.32% (Accession number CP007205). The PMTB2.1 genome comprises 2,176 protein-coding sequences, 6 rRNA operons, 56 tRNA and 4 ncRNA sequences. The comparative genome sequence analysis of PMTB2.1 with nine complete genomes, which include Actinobacillus pleuropneumoniae, Haemophilus parasuis, Escherichia coli and five P. multocida complete genome sequences including PM70, PM36950, PMHN06, PM3480, PMHB01 and PMTB2.1, was carried out based on OrthoMCL analysis and a Venn diagram. The analysis showed that 282 CDSs (13%) are unique to PMTB2.1 and that 1,125 CDSs have orthologs in all genomes. This reflects the overall close relationship of these bacteria and supports their classification in the Gamma subdivision of the Proteobacteria. In addition, genomic distance analysis among all nine genomes indicated that PMTB2.1 is closely related to the other five Pasteurella species, with a genomic distance of less than 0.13.
Synteny analysis shows subtle differences in genetic structure among different P. multocida strains, indicating the dynamics of frequent gene transfer events among them. However, PM3480 and PM70 exhibited exceptionally large structural variation since they were swine and chicken isolates. Furthermore, the genomic structure of PMTB2.1 most resembles that of PM36950, with a genomic size difference of approximately 34,380 kb (smaller than PM36950), and the strain-specific Integrative and Conjugative Element (ICE), which was found only in PM36950, is absent in PMTB2.1. Meanwhile, two intact prophage sequences of approximately 62 kb were found to be present only in PMTB2.1. One of the phages is similar to the transposable phage SfMu. The phylogenomic tree was constructed and rooted with E. coli, A. pleuropneumoniae and H. parasuis based on OrthoMCL analysis. The genome of P. multocida strain PMTB2.1 clustered with the bovine isolates PM36950 and PMHB01, was separated from the avian isolate PM70 and the swine isolates PM3480 and PMHN06, and is distant from Actinobacillus and Haemophilus. Previous studies based on Single Nucleotide Polymorphisms (SNPs) and Multilocus Sequence Typing (MLST) were unable to show a clear phylogenetic relatedness between Pasteurella multocida and the different hosts. In conclusion, this study has provided insight into the genomic structure of PMTB2.1 in terms of potential genes that can function as virulence factors, for future study in elucidating the mechanisms behind the ability of the bacteria to cause disease in susceptible animals.
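The GC content quoted for the assembled chromosome is the share of G and C bases among all called bases. A minimal sketch of that calculation follows; the short sequence is a hypothetical stand-in, since the real input would be the full chromosome sequence, and ignoring ambiguous bases such as N is an assumption of this sketch.

```python
def gc_content(seq):
    """GC content in percent, ignoring characters other than A, C, G, T."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / called


# Hypothetical fragment; a real run would read the assembled chromosome from a FASTA file.
fragment = "ATGCNNAT"
print(f"{gc_content(fragment):.2f}%")
```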

Keywords: comparative genomics, DNA sequencing, phage, phylogenomics

Procedia PDF Downloads 154
625 Studies of Single Nucleotide Polymorphism of Proteosomal Gene Complex and Their Association with HBV Infection Risk in India

Authors: Jasbir Singh, Devender Kumar, Davender Redhu, Surender Kumar, Vandana Bhardwaj

Abstract:

Single nucleotide polymorphism (SNP) of the proteosomal gene complex is involved in the pathogenesis of hepatitis B virus (HBV) infection. Members of this proteosomal gene complex include large multifunctional proteins (LMP) and antigen-associated transporters that help in antigen presentation. Both are involved in the intracellular processing and presentation of viral antigens in association with Major Histocompatibility Complex (MHC) Class I molecules. One hundred HBV-infected samples and one hundred control samples from northern India were studied. Genomic DNA was extracted from all studied samples, and the PCR-RFLP method was used for genotyping at different positions of the LMP genes. Genotypes at a given position were inferred from the pattern of bands, and genotype and haplotype frequencies were calculated. A homozygous SNP {A>C} was observed at codon 145 of the LMP7 gene and had a protective role against HBV, as the frequency of this SNP was significantly higher among controls than among cases. A heterozygous SNP {A>C} was observed at codon 145 of the LMP7 gene and made individuals more susceptible to HBV infection, as the frequency of this SNP was significantly higher among cases than among controls. SNP {T>C} was observed at codon 60 of the LMP2 gene, but no statistically significant differences were observed between controls and cases. For codon 145 of LMP7 and codon 60 of LMP2, four haplotypes were constructed. Haplotype I (LMP2 ‘C’ and LMP7 ‘A’) made individuals carrying it more susceptible to HBV infection, as the frequency of this haplotype was significantly higher among cases than among controls. Haplotype II (LMP2 ‘C’ and LMP7 ‘C’) made individuals carrying it more immune to HBV infection, as the frequency of this haplotype was significantly higher among controls than among cases.
Thus it can be concluded that homozygous SNP {A>C} at codon 145 of LMP7 and Haplotype II (LMP2 ‘C’ and LMP7 ‘C’) has a protective role against HBV infection whereas heterozygous SNP {A>C} at codon 145 of LMP7 and Haplotype I (LMP2 ‘C’ and LMP7 ‘A’) made individuals more susceptible to HBV infection.
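Genotype and allele frequencies of the kind compared between cases and controls above follow directly from genotype counts at a biallelic site. A minimal sketch, assuming an A/C site as at codon 145 of LMP7; the counts and function names are hypothetical, not data from the study.

```python
def genotype_frequencies(n_aa, n_ac, n_cc):
    """Relative frequency of each genotype at a biallelic A/C site."""
    n = n_aa + n_ac + n_cc
    if n == 0:
        raise ValueError("no genotyped individuals")
    return {"AA": n_aa / n, "AC": n_ac / n, "CC": n_cc / n}


def allele_frequencies(n_aa, n_ac, n_cc):
    """Allele frequencies: each individual carries two alleles."""
    n = n_aa + n_ac + n_cc
    if n == 0:
        raise ValueError("no genotyped individuals")
    freq_a = (2 * n_aa + n_ac) / (2 * n)
    return {"A": freq_a, "C": 1.0 - freq_a}


# Hypothetical genotype counts for a group of 100 individuals.
counts = {"n_aa": 50, "n_ac": 40, "n_cc": 10}
print(genotype_frequencies(**counts))
print(allele_frequencies(**counts))
```

Computing these separately for cases and controls gives the quantities whose difference is then tested for significance.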

Keywords: Hepatitis B Virus, single nucleotide polymorphism, low molecular weight proteins, transporters associated with antigen presentation

Procedia PDF Downloads 283
624 A Pipeline for Detecting Copy Number Variation from Whole Exome Sequencing Using Comprehensive Tools

Authors: Cheng-Yang Lee, Petrus Tang, Tzu-Hao Chang

Abstract:

Copy number variations (CNVs) play an important role in many kinds of human diseases, such as autism, schizophrenia and a number of cancers. Many disease-associated variants are found in coding regions of the genome, and whole exome sequencing (WES) is a cost-effective and powerful technology for detecting variants that are enriched in exons, with potential applications in clinical settings. Although several algorithms have been developed to detect CNVs using WES, and each has been compared with other algorithms on its authors' own samples to find the most suitable method, there have been no consistent datasets across most of the algorithms for evaluating CNV detection ability. In addition, most of the algorithms use a command line interface, which may greatly limit the analysis capability of many laboratories. We create a series of simulated WES datasets from UCSC hg19 chromosome 22 and then evaluate the CNV detection ability of 19 algorithms from the OMICtools database using our simulated WES datasets. We compute the sensitivity, specificity and accuracy of each algorithm for validation of the exome-derived CNVs. After comparing the 19 algorithms from the OMICtools database, we construct a platform that installs all of the algorithms in a virtual machine such as VirtualBox, which can be set up conveniently on local computers, and then create a simple, easy-to-use script for detecting CNVs using the algorithms selected by users. We also build a table that describes, for all of the algorithms, properties such as input requirements and CNV detection ability, providing users with a specification for choosing the optimum algorithms.
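The three evaluation metrics named above have standard definitions in terms of true and false positives and negatives. The sketch below applies those definitions; how a predicted CNV is matched to a simulated one (for example, by exon or by overlap) is not specified in the abstract, so the counts here are hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),           # true CNVs that were detected
        "specificity": tn / (tn + fp),           # non-CNV regions correctly left uncalled
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical per-exon counts for one algorithm on one simulated dataset.
metrics = detection_metrics(tp=8, fp=5, tn=85, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function over every algorithm and dataset gives a comparison table of the kind the abstract describes.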

Keywords: whole exome sequencing, copy number variations, omictools, pipeline

Procedia PDF Downloads 283
623 Anaerobic Digestion Batch Study of Taxonomic Variations in Microbial Communities during Adaptation of Consortium to Different Lignocellulosic Substrates Using Targeted Sequencing

Authors: Priyanka Dargode, Suhas Gore, Manju Sharma, Arvind Lali

Abstract:

Anaerobic digestion has been widely used for the production of methane from different biowastes. However, the complexity of the microbial communities involved in the process is poorly understood. The productivity of the biogas production process is closely coupled to its microbial community structure and to syntrophic interactions amongst the community members. The present study aims at understanding the taxonomic variations that occur over the time of digestion in a starter inoculum acclimatised to different lignocellulosic biomass (LBM) feedstocks. The work underlines the use of high-throughput Next Generation Sequencing (NGS) for validating the changes in taxonomic patterns of microbial communities. Biomethane Potential (BMP) batches were set up with different pretreated and non-pretreated LBM residues using the same microbial consortium, and samples were withdrawn to study the changes in microbial community structure and predominance with respect to changes in the metabolic profile of the process. DNA was extracted from samples withdrawn at different time intervals, chosen with reference to performance changes of the digestion process, and subjected to 16S rRNA amplicon sequencing analysis using the Illumina platform. Biomethane potential and substrate consumption were monitored using Gas Chromatography (GC) and reduction in COD (Chemical Oxygen Demand), respectively. Taxonomic analysis using the QIIME server revealed that microbial community structure changes with different substrates as well as at different time intervals. It was observed that the biomethane potential of each substrate was relatively similar, but the time required for substrate utilization and its conversion to biomethane differed between substrates. This could be attributed to the nature of the substrate and, consequently, to differences in which microbial communities dominate for different substrates and at different phases of the anaerobic digestion process.
Knowledge of microbial communities involved would allow a rational substrate specific consortium design which will help to reduce consortium adaptation period and enhance the substrate utilisation resulting in improved efficacy of biogas process.

Keywords: amplicon sequencing, biomethane potential, community predominance, taxonomic analysis

Procedia PDF Downloads 499
622 Association of 105A/C IL-18 Gene Single Nucleotide Polymorphism with House Dust Mite Allergy in an Atopic Filipino Population

Authors: Eisha Vienna M. Fernandez, Cristan Q. Cabanilla, Hiyasmin Lim, John Donnie A. Ramos

Abstract:

Allergy is a multifactorial disease affecting a significant proportion of the population. It is developed through the interaction of allergens and the presence of certain polymorphisms in various susceptibility genes. In this study, the correlation of the 105A/C single nucleotide polymorphism (SNP) of the IL-18 gene and house dust mite-specific IgE among Filipino allergic and non-allergic population was investigated. Atopic status was defined by serum total IgE concentration of ≥100 IU/mL, while house dust mite allergy was defined by specific IgE value ≥ +1SD of IgE of nonatopic participants. Two hundred twenty match-paired Filipino cases and controls aged 6-60 were the subjects of this investigation. The level of total IgE and Specific IgE were measured using Enzyme-Linked Immunosorbent Assay (ELISA) while Polymerase Chain Reaction – Restriction Fragment Length Polymorphism (PCR-RFLP) analysis was used in the SNP detection. Sensitization profiles of the allergic patients revealed that 97.3% were sensitized to Blomia tropicalis, 40.0% to Dermatophagoides farinae, and 29.1% to Dermatophagoides pteronyssinus. Multiple sensitization to HDMs was also observed among the 47.27% of the atopic participants. Any of the allergy classes of the atopic triad were exhibited by the cases (allergic asthma: 48.18%; allergic rhinitis: 62.73%; atopic dermatitis: 19.09%), and two or all of these atopic states are concurrently occurring in 26.36% of the cases. A greater proportion of the atopic participants with allergic asthma and allergic rhinitis were sensitized to D. farinae, and D. pteronyssinus, while more of those with atopic dermatitis were sensitized to D. pteronyssinus than D. farinae. Results show that there is overrepresentation of the allele “A” of the 105A/C IL-18 gene SNP in both cases and control groups of the population. 
The predominant genotype in the population is the heterozygous “AC”, followed by the homozygous wild-type “AA”, with the homozygous variant “CC” the least frequent. The study confirmed a positive association between serum specific IgE against B. tropicalis and D. pteronyssinus and both the “C” allele (Bt P=0.021, Dp P=0.027) and the “AC” genotype (Bt P=0.003, Dp P=0.026). Findings also revealed that the genotypes “AA” (OR: 1.217; 95% CI: 0.701-2.113) and “CC” (OR: 3.5; 95% CI: 0.727-16.849) increase the risk of developing allergy. This indicates that the 105A/C IL-18 gene SNP is a candidate genetic marker for HDM allergy among Filipino patients.
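The odds ratios and 95% confidence intervals quoted above are the kind of figure that can be derived from a 2x2 case-control table. The following is a minimal sketch, assuming the common Woolf (log-odds) approximation; the abstract does not state which method the authors used, and the function name and the counts in the usage line are hypothetical, not study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate CI from a 2x2 table (Woolf method).

    a: cases with the genotype      b: cases without it
    c: controls with the genotype   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def spans_one(lower, upper):
    """True when the interval contains 1, i.e. no effect at that level."""
    return lower <= 1.0 <= upper


# Hypothetical counts for illustration only.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR={or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}, spans 1: {spans_one(lo, hi)}")
```

By convention, an interval that contains 1 is read as not statistically significant at the corresponding level, which is why `spans_one` is included alongside the estimate.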

Keywords: house dust mite allergy, interleukin-18 (IL-18), single nucleotide polymorphism

Procedia PDF Downloads 436
621 Analysis of Pathogen Populations Occurring in Oilseed Rape Using DNA Sequencing Techniques

Authors: Elizabeth Starzycka-Korbas, Michal Starzycki, Wojciech Rybinski, Mirosława Dabert

Abstract:

Populations of pathogenic fungi occurring in winter oilseed rape in Malyszyn were analyzed over several years. In Poland and worldwide, Brassica napus L. is a source of energy for humans (oil) and animals (post-extraction middlings), as well as a motor fuel (oil, biofuel); studies of this type are therefore very important. The species composition of pathogenic fungi can be an indicator of seed yield. The occurrence of oilseed rape pathogens over several years was analyzed by DNA ITS sequencing. The results were compared with gene bank records using NCBI BLAST. Under field conditions, the presence of pathogens infesting B. napus was assessed before harvest. For example, in 2015, 150 samples were isolated and plated on PDA medium for species identification. From the whole population, mycelium of 83 isolates was selected and sequenced. The others (67 isolates) were pathogenic fungi of the genus Alternaria, which are easy to recognize. The pathogenic species on oilseed rape identified by DNA ITS analysis were: Leptosphaeria sp. 38 (L. maculans 25, L. biglobosa 13), Alternaria sp. 29, Fusarium sp. 3, Sclerotinia sclerotiorum 7, and heterogeneous 6, for a total of 83 isolates. Fungi of the genus Alternaria accounted for the largest share of B. napus pathogens in particular years. Another dangerous genus for oilseed rape was Leptosphaeria. The pathogen populations differed from year to year. The number and composition of pathogens occurring in the field are very important for breeders and farmers because they allow the most resistant genotypes to be selected for sowing in the next growing season.

Keywords: B. napus, DNA ITS Sequencing, pathogenic fungi, population

Procedia PDF Downloads 261
620 Exploring the Correlation between Body Constitution of an Individual as Per Ayurveda and Gut Microbiome in Healthy, Multi Ethnic Urban Population in Bangalore, India

Authors: Shalini TV, Gangadharan GG, Sriranjini S Jaideep, ASN Seshasayee, Awadhesh Pandit

Abstract:

Introduction: Prakriti (the body-mind constitution of an individual) is a conventional, customized, and unique concept, an understanding of which is essential for the personalized medicine described in Ayurveda, the Indian system of medicine. Based on the Doshas (functional, bio-humoral units in the body), individuals are categorized into three major Prakriti: Vata, Pitta, and Kapha. The human gut microbiome hosts many highly diverse and metabolically active microorganisms, dominated mainly by bacteria, which are known to influence the physiology of an individual. A few studies have shown a correlation between Prakriti and biochemical parameters. In this study, an attempt was made to explore any correlation between Prakriti (the phenotype of an individual) and the genetic makeup of the gut microbiome in healthy individuals. Materials and methods: 270 multi-ethnic, healthy volunteers of both sexes, aged 18 to 40 years, with no history of antibiotic use in the last 6 months, were recruited into three groups: Vata, Pitta, and Kapha. The Prakriti of each individual was determined using Ayusoft, a software package designed by CDAC, Pune, India. The volunteers underwent initial screening of their height, weight, body mass index, vital signs, and blood investigations to ensure that they were healthy. Stool and saliva samples of the recruited volunteers were collected according to the standard operating procedure developed, and the bacterial DNA was isolated using Qiagen kits. The extracted DNA was subjected to 16S rRNA sequencing using Illumina kits. The sequencing libraries targeted the variable V3 and V4 regions of the 16S rRNA gene. Paired sequencing was done on the MiSeq system, and data were analyzed using CLC Genomics Workbench 11. Results: The 16S rRNA sequencing of the V3 and V4 regions showed a diverse pattern in both the oral and stool microbial DNA.
The study did not reveal any specific pattern of bacterial flora among the Prakriti groups. All the p-values were greater than the effective alpha values for all OTUs in both the buccal cavity and stool samples. Therefore, no significant enrichment of any OTU was observed in the samples from either the buccal cavity or the stool. Conclusion: In healthy multi-ethnic volunteers, no correlation between Prakriti and the gut microbiome was seen, owing to the influence of various factors.

Keywords: gut microbiome, ayurveda Prakriti, sequencing, multi-ethnic urban population

Procedia PDF Downloads 105
619 Molecular Diagnosis of Influenza Strains Was Carried Out on Patients of the Social Security Clinic in Karaj Using the RT-PCR Technique

Authors: A. Ferasat, S. Rostampour Yasouri

Abstract:

Seasonal flu is a highly contagious infection caused by influenza viruses. These viruses undergo genetic changes that result in new epidemics across the globe. Medical attention is crucial in severe cases, particularly for the elderly, the frail, and those with chronic illnesses, as their immune systems are often weaker. The purpose of this study was to rapidly detect new subtypes of the influenza A virus using a specific RT-PCR method based on the hemagglutinin (HA) gene. In the winter and spring of 2022–2023, 120 samples suspected of seasonal influenza were cultured in embryonated eggs. RNA synthesis, followed by cDNA synthesis, was performed. Finally, PCR was carried out using a pair of specific primers designed from the HA gene. The PCR product was identified after purification, and the nucleotide sequences of the purified PCR products were compared with the sequences in the gene bank. The results showed a high similarity between the sequences of the positive samples isolated from the patients and the sequences of new strains isolated in recent years. In this study, the RT-PCR technique was entirely specific, enabling the detection and amplification of influenza and its subtypes from clinical samples. The RT-PCR technique based on the HA gene, along with sequencing, is a fast, specific, and sensitive diagnostic method for those infected with influenza viruses and their new subtypes. Rapid molecular diagnosis of influenza in suspected cases is essential to control and prevent the spread of the disease to others. It also helps prevent secondary (sometimes fatal) pneumonia resulting from influenza and pathogenic bacteria. Rapid diagnosis of new influenza strains plays a critical role in preparing a drug or vaccine against the latest viruses, which did not exist in the community the previous year and are entirely new.

Keywords: influenza, molecular diagnosis, patients, RT-PCR technique

Procedia PDF Downloads 33
618 Frequency of Polymorphism of Mrp1/Abcc1 And Mrp2/Abcc2 in Healthy Volunteers of the Center Savannah (Colombia)

Authors: R. H. Bustos, L. Martinez, J. García, F. Suárez

Abstract:

MRP1 (multidrug resistance-associated protein 1) and MRP2 (multidrug resistance-associated protein 2) are two proteins belonging to the ABC (ATP-binding cassette) transporter family. These transporter proteins are involved in the efflux of several biological drugs and xenobiotics, as well as in multiple physiological, pathological, and pharmacological processes. There is evidence of a correlation between different polymorphisms and their clinical implications for resistance to antiepileptic, chemotherapeutic, and anti-infective drugs. In our study, exonic regions of MRP1/ABCC1 and MRP2/ABCC2 were studied in the Colombian population, specifically in the region of the central Savannah (Cundinamarca), to identify single nucleotide polymorphisms (SNPs) and determine their allele and genotype frequencies. Results showed that our population carries SNPs previously reported for MRP1/ABCC1 (rs200647436, rs200624910, rs150214567) as well as for MRP2/ABCC2 (rs2273697, rs3740066, rs142573385, rs17216212). In addition, 13 new SNPs were identified. Evidence shows an important clinical correlation for the polymorphisms rs3740066 and rs2273697. The study population displays genetic variability compared with that reported in other populations.
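Allele and genotype frequencies for a biallelic SNP follow directly from genotype counts by gene counting. The sketch below illustrates the arithmetic only; the function name is hypothetical and the counts in the usage line are invented for illustration, since the abstract reports no genotype counts.

```python
def snp_frequencies(n_ref_hom, n_het, n_alt_hom):
    """Genotype and allele frequencies for one biallelic SNP.

    n_ref_hom: individuals homozygous for the reference allele
    n_het:     heterozygous individuals
    n_alt_hom: individuals homozygous for the alternate allele
    """
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    genotype_freq = {
        "ref/ref": n_ref_hom / n,
        "ref/alt": n_het / n,
        "alt/alt": n_alt_hom / n,
    }
    # Each individual carries two alleles; heterozygotes contribute one of each.
    ref_allele = (2 * n_ref_hom + n_het) / (2 * n)
    allele_freq = {"ref": ref_allele, "alt": 1.0 - ref_allele}
    return genotype_freq, allele_freq


# Hypothetical counts for illustration only.
genotypes, alleles = snp_frequencies(50, 40, 10)
print(genotypes, alleles)
```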

Keywords: ATP-binding cassette (ABCC), Colombian population, multidrug-resistance protein (MRP), pharmacogenetic, single nucleotide polymorphism (SNP)

Procedia PDF Downloads 296
617 Genetic Characterization of Acanthamoeba Isolates from Amoebic Keratitis Patients

Authors: Sumeeta Khurana, Kirti Megha, Amit Gupta, Rakesh Sehgal

Abstract:

Background: Amoebic keratitis is a painful, vision-threatening infection caused by the free-living pathogenic amoeba Acanthamoeba. It can be misdiagnosed and is very difficult to treat if not suspected early. To the best of our knowledge, the epidemiology of Acanthamoeba genotypes causing infection in our geographical area is not yet known. Objective: To characterize Acanthamoeba isolates from amoebic keratitis patients. Methods: A total of 19 isolates were included, obtained over the last 10 years from patients with amoebic keratitis presenting to the Advanced Eye Centre at the Postgraduate Institute of Medical Education and Research, a tertiary care centre in North India. Corneal scrapings, lens solution, and lens cases (for lens wearers) were collected for microscopic examination, culture, and molecular diagnosis. All the isolates were maintained on non-nutrient agar culture medium overlaid with E. coli, and 13 strains were axenised and maintained in modified Peptone Yeast Dextrose Agar. Identification of Acanthamoeba genotypes was based on amplification of the diagnostic fragment 3 (DF3) region of the 18S rRNA gene followed by sequencing. A nucleotide similarity search was performed by BLAST search of the sequenced amplicons against the GenBank database (http://www.ncbi.nlm.nih.gov/blast). Multiple sequence alignments were generated using CLUSTAL X. Results: Nine of the 19 Acanthamoeba isolates belonged to genotype T4, followed by 6 isolates of genotype T11, 3 of T5, and 1 of T3. Conclusion: T4 is the predominant Acanthamoeba genotype in our geographical area. Further studies should focus on differences in the pathogenicity of these genotypes and their clinical significance.

Keywords: Acanthamoeba, free living amoeba, keratitis, genotype, ocular

Procedia PDF Downloads 214
616 Discrete Breeding Swarm for Cost Minimization of Parallel Job Shop Scheduling Problem

Authors: Tarek Aboueldahab, Hanan Farag

Abstract:

The Parallel Job Shop Scheduling Problem (JSP) is a multi-objective, multi-constraint NP optimization problem. Traditional Artificial Intelligence techniques have been widely used; however, they can become trapped in a local minimum without reaching the optimum solution, so we propose a hybrid Artificial Intelligence (AI) model in which Discrete Breeding Swarm (DBS) is added to traditional Artificial Intelligence to avoid this trapping. This model is applied to cost minimization in the Car Sequencing and Operator Allocation (CSOA) problem. The practical experiment shows that our model outperforms other techniques in cost minimization.

Keywords: parallel job shop scheduling problem, artificial intelligence, discrete breeding swarm, car sequencing and operator allocation, cost minimization

Procedia PDF Downloads 150
615 The Relationship between Operating Condition and Sludge Wasting of an Aerobic Suspension-Sequencing Batch Reactor (ASSBR) Treating Phenolic Wastewater

Authors: Ali Alattabi, Clare Harris, Rafid Alkhaddar, Ali Alzeyadi

Abstract:

Petroleum refinery wastewater (PRW) can be considered one of the most significant sources of aquatic environmental pollution. It consists of oil and grease along with many other toxic organic pollutants. In recent years, a new technique was implemented using different types of membranes and sequencing batch reactors (SBRs) to treat PRW. The SBR is a fill-and-draw type sludge system which operates in time instead of space. Many researchers have optimised SBRs’ operating conditions to obtain maximum removal of undesired wastewater pollutants. The SBR has gained importance mainly because of its flexibility in cycle time. It can handle shock loads, requires less area for operation, and is easy to operate. However, bulking sludge and the discharge of floating or settled sludge during the draw or decant phase with some SBR configurations are still among the problems of the SBR system. The main aim of this study is to develop an innovative design for the SBR, optimising the process variables to yield a more robust and efficient process. Several experimental tests will be developed to determine the removal percentages of chemical oxygen demand (COD), phenol, and nitrogen compounds from synthetic PRW. Furthermore, the dissolved oxygen (DO), pH, and oxidation-reduction potential (ORP) of the SBR system will be monitored online to ensure a good environment for the microorganisms to biodegrade the organic matter effectively.

Keywords: petroleum refinery wastewater, sequencing batch reactor, hydraulic retention time, Phenol, COD, mixed liquor suspended solids (MLSS)

Procedia PDF Downloads 228
614 Allele Mining for Rice Sheath Blight Resistance by Whole-Genome Association Mapping in a Tail-End Population

Authors: Naoki Yamamoto, Hidenobu Ozaki, Taiichiro Ookawa, Youming Liu, Kazunori Okada, Aiping Zheng

Abstract:

Rice sheath blight is one of the most destructive fungal diseases of rice. Resistance to rice sheath blight is thought to be a polygenic trait. Host-pathogen interactions and secondary metabolites such as lignin and phytoalexins are likely to be involved in defense against R. solani. However, to our knowledge, it is still unknown how sheath blight resistance can be enhanced in rice breeding. To seek alternative genetic factors that contribute to sheath blight resistance, we mined relevant allelic variations from rice core collections created in Japan. Based on disease lesion length on detached leaf sheaths, we selected 30 varieties each from the top and bottom tail-ends of the core collections to perform genome-wide association mapping. Re-sequencing reads for these varieties were used to call single nucleotide polymorphisms among the 60 varieties and create a SNP panel, which contained 1,137,131 homozygous variant sites after filtering. Association mapping highlighted a locus on the long arm of chromosome 11, which is co-localized with three sheath blight QTLs: qShB11-2-TX, qShB11, and qSBR-11-2. Based on the localization of the trait-associated alleles, we identified an ankyrin repeat-containing protein gene (ANK-M) as an uncharacterized candidate factor for rice sheath blight resistance. Allelic distributions for ANK-M in the whole rice population supported the reliability of the trait-allele associations. Gene expression characteristics were checked to evaluate the functionality of ANK-M. Since an ANK-M homolog (OsPIANK1) in rice appears to be a basal defense regulator against rice blast and bacterial leaf blight, ANK-M may also play a role in the rice immune system.

Keywords: allele mining, GWAS, QTL, rice sheath blight

Procedia PDF Downloads 47
613 Performance Evaluation of Flexible Manufacturing System: A Simulation Study

Authors: Mohammed Ali

Abstract:

In this paper, a flexible manufacturing system is evaluated under different manufacturing strategies. The objective is to test the impact of pallets and routing flexibility on the performance of a system operating under different sequencing rules, dispatching rules, and an unbalanced load condition. A computer simulation model is developed to evaluate the effects of the aforementioned manufacturing strategies on the make-span performance of the flexible manufacturing system. The impact of the number of pallets is shown at different levels of routing flexibility. The same manufacturing system is modeled under different combinations of sequencing and dispatching rules. A series of simulation experiments is conducted and the results are analyzed. The simulation results show that pallets and routing flexibility have an impact on the performance of the system.

Keywords: flexibility, flexible manufacturing system, pallets, make-span, simulation

Procedia PDF Downloads 392
612 Molecular Interactions Driving RNA Binding to hnRNPA1 Implicated in Neurodegeneration

Authors: Sakina Fatima, Joseph-Patrick W. E. Clarke, Patricia A. Thibault, Subha Kalyaanamoorthy, Michael Levin, Aravindhan Ganesan

Abstract:

Heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1 or A1) is associated with the pathology of different diseases, including neurological disorders and cancers. In particular, the aggregation and dysfunction of A1 have been identified as a critical driver of neurodegeneration (NDG) in multiple sclerosis (MS). Structurally, A1 includes a low-complexity domain (LCD) and two RNA-recognition motifs (RRMs), and their interdomain coordination may play a crucial role in A1 aggregation. Previous studies propose that RNA inhibitors or nucleoside analogs that bind to RRMs can potentially prevent A1 self-association. Therefore, a molecular-level understanding of the structures, dynamics, and nucleotide interactions of A1 RRMs can be useful for developing therapeutics for NDG in MS. In this work, a combination of computational modelling and biochemical experiments was employed to analyze a set of RNA-A1 RRM complexes. Initially, atomistic models of RNA-RRM complexes were constructed by modifying known crystal structures (e.g., PDBs: 4YOE and 5MPG) and through molecular docking calculations. The complexes were optimized using molecular dynamics simulations (200-400 ns), and their binding free energies were computed. The binding affinities of the selected complexes were validated using a thermal shift assay. Further, the most important molecular interactions contributing to the overall stability of the RNA-A1 RRM complexes were deduced. The results highlight that adenine and guanine are the most suitable nucleotides for high-affinity binding with A1. These insights will be useful in the rational design of nucleotide analogs for targeting A1 RRMs.

Keywords: hnRNPA1, molecular docking, molecular dynamics, RNA-binding proteins

Procedia PDF Downloads 84
611 Agarose Amplification Based Sequencing (AG-seq) Characterization of Cell-free RNA in Preimplantation Spent Embryo Medium

Authors: Huajuan Shi

Abstract:

Background: Biopsy of the preimplantation embryo may increase potential risk and raise concerns about embryo viability. Clinically discarded spent embryo medium (SEM) has attracted the attention of researchers, sparking an interest in noninvasive embryo screening. However, one of the major restrictions is the extremely low quantity of cell-free RNA (cf-RNA), which makes efficient and unbiased amplification with traditional methods difficult. Hence, there is an urgent need for an efficient, low-bias amplification method that can comprehensively and accurately capture cf-RNA information to truly reveal the state of SEM cf-RNA. Result: In the present study, we established an agarose PCR amplification system that significantly improved amplification sensitivity and efficiency by ~90-fold and 9.29%, respectively. We applied agarose to sequencing library preparation (named AG-seq) to quantify and characterize cf-RNA in SEM. The number of detected cf-RNAs (3533 vs 598) and the coverage of the 3' end were significantly increased, and the noise of low-abundance gene detection was reduced. An increasing percentage of 5' end adenine and alternative splicing (AS) events in short fragments (< 400 bp) were discovered by AG-seq. Further, the profiles and characterizations of cf-RNA in spent cleavage medium (SCM) and spent blastocyst medium (SBM) indicated that 4‐mer end motifs of cf-RNA fragments could clearly differentiate embryo development stages. Significance: This study established an efficient and low-cost SEM amplification and library preparation method. In addition, we successfully described the characteristics of SEM cf-RNA of the preimplantation embryo using AG-seq, including abundance features and fragment lengths. AG-seq facilitates the study of cf-RNA as a noninvasive embryo screening biomarker and opens up potential clinical utilities for trace samples.

Keywords: cell-free RNA, agarose, spent embryo medium, RNA sequencing, non-invasive detection

Procedia PDF Downloads 55
610 Impact of Totiviridae L-A dsRNA Virus on Saccharomyces Cerevisiae Host: Transcriptomic and Proteomic Approach

Authors: Juliana Lukša, Bazilė Ravoitytė, Elena Servienė, Saulius Serva

Abstract:

Totiviridae L-A virus is a persistent Saccharomyces cerevisiae dsRNA virus. It encodes the major structural capsid protein Gag and the Gag-Pol fusion protein, responsible for virus replication and encapsidation. These features also enable the copying of satellite dsRNAs (called M dsRNAs) encoding a secreted toxin (known as killer toxin) and immunity to it. The viral capsid pore presumably functions in nucleotide uptake and viral mRNA release. During cell division, sporogenesis, and cell fusion, the virions remain intracellular and are transferred to daughter cells. By employing high-throughput RNA sequencing data analysis, we describe the influence of the L-A virus alone on gene expression in three different S. cerevisiae hosts. We provide new insight into Totiviridae L-A virus-related transcriptional regulation, encompassing multiple bioinformatics analyses. Transcriptional responses to L-A infection were similar to those induced by stress or nutrient availability. The study also delves into the connection between cell metabolism and the demands the L-A virus places on the host transcriptome by uncovering host proteins that may be associated with intact virions. To better understand the virus-host interaction, we applied differential proteomic analysis to virus particle-enriched fractions of yeast strains that harbor either the complete killer system (L-A-lus and M-2 virus), are M-2 depleted, or are virus-free. Our analysis identified host proteins associated with the structural proteins of the virus (Gag and Gag-Pol). This research was funded by the European Social Fund under the No. 09.3.3-LMT-K-712-19-0157 “Development of Competences of Scientists, other Researchers, and Students through Practical Research Activities” measure.

Keywords: totiviridae, killer virus, proteomics, transcriptomics

Procedia PDF Downloads 107
609 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotation, expression studies, personalized treatment, and precision medicine. However, this rapid growth in sequence data poses a great challenge that calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on a k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum k-mer size and use it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate the computing challenges associated with analyzing whole genome sequence data to produce interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the k-mer size increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction.
The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay among accuracy, computing resources, and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are especially important in explaining complex biological mechanisms.
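The k-mer representation mentioned in the abstract can be illustrated with a short sketch: a sequence is decomposed into overlapping substrings of length k, and the counts become a fixed-length feature vector. This is an assumption-laden illustration, not the authors' pipeline; the function names, the choice to skip windows containing ambiguous bases, and the toy sequence are all hypothetical.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        # Assumption: windows containing ambiguous bases such as N are ignored.
        if set(kmer) <= DNA_ALPHABET:
            counts[kmer] += 1
    return counts


def to_feature_vector(counts, vocabulary):
    """Order counts by a shared vocabulary so every sample has the same columns."""
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Toy sequence for illustration only.
counts = kmer_counts("ACGTAC", 2)
vocabulary = sorted(counts)
print(vocabulary, to_feature_vector(counts, vocabulary))
```

The number of possible k-mers grows as 4^k, which is one reason a larger k raises the computing cost that the abstract refers to.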

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 133
608 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotation, expression studies, personalized treatment, and precision medicine. However, this rapid growth in sequence data poses a great challenge that calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on a k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum k-mer size and use it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate the computing challenges associated with analyzing whole genome sequence data to produce interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the k-mer size increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction.
The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay among accuracy, computing resources, and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are especially important in explaining complex biological mechanisms.
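The five figures reported for the best model (accuracy, recall, precision, specificity, Matthews correlation coefficient) are all functions of a binary confusion matrix. The sketch below shows the standard definitions; the function name is hypothetical and the counts in the usage line are invented, since the abstract does not give the underlying confusion matrix.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(numerator, denominator):
        # Return 0.0 when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, total),
        "recall": ratio(tp, tp + fn),        # sensitivity
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for illustration only.
print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the matrix, so it stays informative when the two phenotype classes are imbalanced.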

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 122