Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 1789

Search results for: 16S rRNA gene sequencing

1639 The Use of Medical Biotechnology to Treat Genetic Disease

Authors: Rachel Matar, Maxime Merheb

Abstract:

Chemical drugs were for many centuries the only way to cure diseases, until gene therapy was introduced in 1960. Gene therapy is based on the insertion, correction, or inactivation of genes to treat people with genetic illness (1). It has shown remarkable results in Parkinson's disease, Alzheimer's disease and multiple sclerosis, and holds great promise for the treatment of deadly diseases such as many types of cancer and autoimmune diseases (2). The method relies on recombinant DNA technology with the help of different viral and non-viral vectors (3). It is now used in somatic cells as well as in embryos and gametes. Despite all the benefits of gene therapy, some opponents deem the technique ethically unacceptable because it involves manipulating the genes of living organisms.

Keywords: gene therapy, genetic disease, cancer, multiple sclerosis

Procedia PDF Downloads 505

1638 PRKAG3 and RYR1 Gene in Latvian White Pigs

Authors: Daina Jonkus, Liga Paura, Tatjana Sjakste, Kristina Dokane

Abstract:

The aim of this study was to analyse PRKAG3 and RYR1 gene and genotype frequencies in the Latvian White pig breed. Genotypes at two RYR1 loci (rs196953058 and rs323041392) in exon 89 and at two PRKAG3 loci (rs196958025 and rs344045190) in the gene promoter were determined in 103 individuals of the Latvian White pig breed. At RYR1 locus rs196953058, all individuals were homozygous for the T allele (genotype TT), which means that position 2769 is Phenylalanine. At RYR1 locus rs323041392, all individuals were homozygous for the G allele (genotype GG), which means that position 4119 is Asparagine. Neither rs196953058 nor rs323041392 was polymorphic. Across the two loci rs196953058-rs323041392, all analysed individuals had the TT-GG genotype, or Phe-Asp amino acids. Both PRKAG3 loci, rs196958025 and rs344045190, were polymorphic. At both loci the frequency of the A allele was higher: 84.6% for rs196958025 and 73.0% for rs344045190. At PRKAG3 locus rs196958025, 74% of individuals were homozygous for the A allele (genotype AA). Only 4% of individuals were homozygous for the G allele (genotype GG), which is associated with pale meat colour and higher drip loss. At PRKAG3 locus rs344045190, 46% of individuals were homozygous (genotype AA) and 54% were heterozygous (genotype AG). No individuals had the GG genotype. According to the results, the Latvian White pig population contains no rs344435545 (RYR1 gene) CT heterozygous or TT recessive homozygous genotypes, which are related to meat quality and porcine stress syndrome, and 4% of animals carry the rs196958025 (PRKAG3 gene) GG recessive homozygous genotype, which is related to meat quality. Acknowledgment: the investigation is supported by VPP 2014-2017 AgroBioRes Project No. 3 LIVESTOCK.

Keywords: genotype frequencies, pig, PRKAG3, RYR1

Procedia PDF Downloads 189
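The allele frequencies quoted in the Latvian White pig abstract above can be checked against its genotype percentages. The following is a minimal sketch, not the authors' procedure: the function name `allele_frequencies` is illustrative, and the 22% AG share for rs196958025 is an assumption inferred as the remainder after the reported 74% AA and 4% GG.

```python
def allele_frequencies(f_aa: float, f_ag: float, f_gg: float) -> tuple[float, float]:
    """Return (freq_A, freq_G) for a biallelic locus from genotype frequencies.

    Each homozygote carries two copies of one allele; each heterozygote
    carries one copy of each, so it contributes half its frequency to both.
    """
    total = f_aa + f_ag + f_gg
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype frequencies must sum to 1")
    freq_a = f_aa + f_ag / 2
    freq_g = f_gg + f_ag / 2
    return freq_a, freq_g


# rs344045190: 46% AA, 54% AG, no GG (values stated in the abstract).
a, g = allele_frequencies(0.46, 0.54, 0.0)
print(f"rs344045190: A = {a:.3f}, G = {g:.3f}")

# rs196958025: 74% AA and 4% GG are stated; 22% AG is an inferred remainder.
a, g = allele_frequencies(0.74, 0.22, 0.04)
print(f"rs196958025: A = {a:.3f}, G = {g:.3f}")
```

With rounded percentages as input, small differences from the reported 84.6% are expected, since the abstract's figures were presumably computed from the underlying counts of 103 animals.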

1637 Genetic Polymorphism in the Vitamin D Receptor Gene and 25-Hydroxyvitamin D Serum Levels in East Indian Women with Polycystic Ovary Syndrome

Authors: Dipanshu Sur, Ratnabali Chakravorty

Abstract:

Background: Polycystic ovary syndrome (PCOS) is the most common metabolic abnormality occurring in young women of reproductive age and is associated with changes in lipid profile, diabetes, hypertension and metabolic syndrome. Low vitamin D levels have been found to be associated with the development of obesity and insulin resistance in women with PCOS. Variants of the vitamin D receptor (VDR) gene have also been related to metabolic comorbidities in the general population. Aim: The aim of this case-control study was to investigate whether VDR gene polymorphisms are associated with susceptibility to PCOS. Methods: Women with PCOS and a control group, all aged 16-40 years, were enrolled. The VDR Fok-I (rs2228570), VDR Apa-I (rs7975232), GC (rs2282679) and DHCR7 (rs12785878) SNPs were genotyped in both groups by direct sequencing. Serum 25-hydroxyvitamin D [25(OH)D] levels were measured by ELISA. Results: Mean serum 25(OH)D was significantly lower in PCOS patients than in controls (19.08±7 vs 23.27±6.03; p=0.048). The CC genotype of the VDR Apa-I SNP was equally frequent in PCOS (25.6%) and controls (25.6%) (OR: 0.9995; 95%CI: 0.528 to 1.8921; p=0.9987). The CC genotype was also significantly associated with both lower E2 (p=0.031) and Androstenedione levels (p=0.062). We observed a significant association of the GC polymorphism with 25(OH)D levels. PCOS women carrying the GG genotype (in the GC gene) had a significantly higher risk of vitamin D deficiency than women carrying the TT genotype. Conclusions: Data from this study indicate that vitamin D levels are lower, and vitamin D deficiency more frequent, in PCOS than in controls. The present findings suggest that the Apa-I and Fok-I polymorphisms of the VDR gene are associated with PCOS and seem to modulate ovarian steroid secretion. Further studies are needed to better clarify the biological mechanisms by which the polymorphisms influence PCOS risk.

Keywords: vitamin D receptor, polymorphism, vitamin D, polycystic ovary syndrome

Procedia PDF Downloads 279
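The PCOS abstract above reports an odds ratio with a 95% confidence interval for a genotype in a case-control design. A common way to obtain such figures is the Woolf (log) method, sketched below. This is an assumption about the calculation, not the authors' stated method, and the 2x2 counts in the usage lines are invented purely for illustration; they are not the study's data.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: cases with the genotype      b: cases without it
    c: controls with the genotype   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the log method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: the genotype is equally frequent in both groups.
or_value, lo, hi = odds_ratio_ci(a=20, b=58, c=20, d=58)
print(f"OR = {or_value:.3f}, 95% CI = {lo:.3f} to {hi:.3f}")
```

When the genotype is equally frequent in cases and controls, the odds ratio is close to 1 and the interval spans 1, which is the pattern the abstract reports for the Apa-I CC genotype.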

1636 Bioinformatic Study of Follicle Stimulating Hormone Receptor (FSHR) Gene in Different Buffalo Breeds

Authors: Hamid Mustafa, Adeela Ajmal, Kim EuiSoo, Noor-ul-Ain

Abstract:

Worldwide, buffalo production is considered one of the most important components of the food industry. Efficient buffalo production depends on the reproductive performance of this species. Lack of knowledge of reproductive efficiency and its related genes in buffalo is a major constraint for sustainable buffalo production. In this study, we performed bioinformatics analyses of the Follicle Stimulating Hormone Receptor (FSHR) gene and explored the possible relationship of this gene among different buffalo breeds and with other farm animals. We also determined the evolutionary pattern of this gene among these species. We investigated CDS lengths, stop codon variation, homology search, signal peptide, isoelectric point, tertiary structure, motifs and the phylogenetic tree. The results indicate 4 different motifs in this gene: Activin-recp, GS motif, STYKc protein kinase and transmembrane. The results also indicate that this gene is very closely related to that of cattle, bison, sheep and goat. Multiple alignment (MA) showed high conservation of the motifs, which indicates constancy of this gene during evolution. The results of this study can be applied to better understand and characterize the structure of the FSHR gene in different farm animals, which would be helpful for efficient breeding plans in animal production.

Keywords: buffalo, FSHR gene, bioinformatics, production

Procedia PDF Downloads 505

1635 Identification of Babesia ovis Through Polymerase Chain Reaction in Sheep and Goat in District Muzaffargarh, Pakistan

Authors: Muhammad SAFDAR, Mehmet Ozaslan, Musarrat Abbas Khan

Abstract:

Babesiosis is a haemoparasitic disease caused by the multiplication of the protozoan parasite Babesia ovis in the red blood cells of the host, and it causes numerous economic losses in small ruminants such as sheep and goats. The early identification and successful treatment of Babesia ovis are key steps in the control and health management of livestock resources. The objective of this study was to develop a polymerase chain reaction (PCR) based method for the detection of Babesia spp. in small ruminants and to determine the risk factors involved in the spread of babesiosis. A total of 100 blood samples were collected from 50 sheep and 50 goats in different areas of Muzaffargarh, Pakistan, from randomly selected herds. Data on the characteristics of the sheep and goats were collected through questionnaires. Of the 100 blood samples examined, 18 were positive for Babesia ovis by microscopy, whereas 11 were positive for Babesia spp. by PCR assay. For the detection of parasite DNA, a set of oligonucleotide primers was designed to amplify a 500 bp fragment of the B. ovis 18S rRNA gene by PCR. The prevalence of babesiosis detected by PCR in sheep and goats was significantly higher in female animals (28%) than in males (8%). PCR analysis of the reference samples showed that the detection limit of the PCR assay was 0.01%. Taken together, the data indicate that this PCR assay is a simple, fast and specific detection method for Babesia ovis in small ruminants compared to other available methods.

Keywords: Babesia ovis, PCR amplification, 18S rRNA, sheep and goat

Procedia PDF Downloads 104

1634 Agarose Amplification Based Sequencing (AG-seq) Characterization Cell-free RNA in Preimplantation Spent Embryo Medium

Authors: Huajuan Shi

Abstract:

Background: Biopsy of the preimplantation embryo may increase the potential risk to, and concern about, embryo viability. Clinically discarded spent embryo medium (SEM) has attracted the attention of researchers, sparking an interest in noninvasive embryo screening. However, one of the major restrictions is the extremely low quantity of cell-free RNA (cf-RNA), which is difficult to amplify efficiently and without bias using traditional methods. Hence, there is an urgent need for an efficient, low-bias amplification method that can comprehensively and accurately capture cf-RNA information and reveal the true state of SEM cf-RNA. Results: In the present study, we established an agarose PCR amplification system that significantly improved amplification sensitivity and efficiency, by ~90-fold and 9.29%, respectively. We applied agarose to sequencing library preparation (named AG-seq) to quantify and characterize cf-RNA in SEM. The number of detected cf-RNAs (3533 vs 598) and the coverage of the 3' end were significantly increased, and the noise in the detection of low-abundance genes was reduced. AG-seq also revealed an increasing percentage of 5' end adenine and of alternative splicing (AS) events in short fragments (< 400 bp). Further, the profiles and characteristics of cf-RNA in spent cleavage medium (SCM) and spent blastocyst medium (SBM) indicated that 4-mer end motifs of cf-RNA fragments could clearly differentiate embryo development stages. Significance: This study established an efficient and low-cost SEM amplification and library preparation method. In addition, we used AG-seq to describe the characteristics of SEM cf-RNA from preimplantation embryos, including abundance features and fragment lengths. AG-seq facilitates the study of cf-RNA as a noninvasive embryo screening biomarker and opens up potential clinical uses for trace samples.

Keywords: cell-free RNA, agarose, spent embryo medium, RNA sequencing, non-invasive detection

Procedia PDF Downloads 57

1633 Molecular Diagnosis of Influenza Strains Was Carried Out on Patients of the Social Security Clinic in Karaj Using the RT-PCR Technique

Authors: A. Ferasat, S. Rostampour Yasouri

Abstract:

Seasonal flu is a highly contagious infection caused by influenza viruses. These viruses undergo genetic changes that result in new epidemics across the globe. Medical attention is crucial in severe cases, particularly for the elderly, frail, and those with chronic illnesses, as their immune systems are often weaker. The purpose of this study was to detect new subtypes of the influenza A virus rapidly using a specific RT-PCR method based on the HA (hemagglutinin) gene. In the winter and spring of 2022-2023, 120 samples suspected of seasonal influenza were cultured in embryonated eggs. RNA synthesis, followed by cDNA synthesis, was performed. Finally, the PCR technique was applied using a pair of specific primers designed based on the HA gene. The PCR product was identified after purification, and the nucleotide sequence of the purified PCR products was compared with the sequences in the gene bank. The results showed a high similarity between the sequence of the positive samples isolated from the patients and the sequence of the new strains isolated in recent years. In this study, the RT-PCR technique was entirely specific, enabling the detection and amplification of influenza viruses and their subtypes from clinical samples. The RT-PCR technique based on the HA gene, along with sequencing, is a fast, specific, and sensitive diagnostic method for people infected with influenza viruses and their new subtypes. Rapid molecular diagnosis of influenza in suspected cases is essential to control and prevent the spread of the disease to others. It also helps prevent secondary (sometimes fatal) pneumonia that results from influenza combined with pathogenic bacteria. The critical role of rapid diagnosis of new influenza strains is to support the preparation of a drug or vaccine against the latest viruses, which did not exist in the community in the previous year and are entirely new.

Keywords: influenza, molecular diagnosis, patients, RT-PCR technique

Procedia PDF Downloads 34

1632 Polymorphism of Candidate Genes for Meat Production in Lori Sheep

Authors: Shahram Nanekarania, Majid Goodarzia

Abstract:

Calpastatin and callipyge are known candidate genes for meat quality and quantity. The calpastatin gene is located on ovine chromosome 5, and the callipyge gene has been localized to the telomeric region of ovine chromosome 18. The objective of this study was to identify polymorphism in the calpastatin and callipyge genes and to analyse the genotype structure in a population of Lori sheep kept in Iran. Blood samples were taken from 120 Lori sheep, and genomic DNA was extracted by the salting-out method. Polymorphism was identified using the PCR-RFLP technique. The PCR products were digested with the MspI and FaqI restriction enzymes for the calpastatin gene and the callipyge gene, respectively. In this population, three patterns were observed for the calpastatin gene, and the AA, AB and BB genotypes were identified with frequencies of 0.32, 0.63 and 0.05, respectively. For the callipyge gene, only the wild-type allele A was observed, indicating that only genotype AA was present in the population under consideration.

Keywords: polymorphism, calpastatin, callipyge, PCR-RFLP, Lori sheep

Procedia PDF Downloads 583
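The calpastatin genotype frequencies in the Lori sheep abstract above (AA 0.32, AB 0.63, BB 0.05 in 120 animals) can be used to derive allele frequencies and to compare the observed genotype structure with Hardy-Weinberg expectations. The abstract does not say that such a test was performed; the sketch below is an illustrative assumption, and because it works from rounded frequencies rather than raw counts, its statistic is only approximate.

```python
def hwe_chi_square(f_aa: float, f_ab: float, f_bb: float, n: int):
    """Allele frequencies and Hardy-Weinberg chi-square for a biallelic locus.

    Returns (p, q, chi2), where p is the frequency of allele A, q of allele B,
    and chi2 compares observed genotype counts with p^2, 2pq, q^2 expectations.
    """
    p = f_aa + f_ab / 2          # frequency of allele A
    q = 1.0 - p                  # frequency of allele B
    observed = [f_aa * n, f_ab * n, f_bb * n]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, q, chi2


# Genotype frequencies and sample size as stated in the abstract.
p, q, chi2 = hwe_chi_square(0.32, 0.63, 0.05, n=120)
print(f"A = {p:.3f}, B = {q:.3f}, chi-square = {chi2:.2f}")
# With 1 degree of freedom, values above 3.84 indicate departure from
# Hardy-Weinberg proportions at the 0.05 level.
```

An excess of heterozygotes, as the 0.63 AB frequency suggests, is the kind of pattern this comparison is meant to surface.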

1631 Identification of the Target Genes to Increase the Immunotherapy Response in Bladder Cancer Patients using Computational and Experimental Approach

Authors: Sahar Nasr, Lin Li, Edwin Wang

Abstract:

Bladder cancer (BLCA) is known as the 13th most common cause of death among cancer patients worldwide, and ~575,000 new BLCA cases are diagnosed each year. Urothelial carcinoma (UC) is the most prevalent subtype among BLCA patients and can be categorized into muscle-invasive bladder cancer (MIBC) and non-muscle-invasive bladder cancer (NMIBC). Currently, various therapeutic options are available for UC patients, including (1) transurethral resection followed by intravesical instillation of chemotherapeutics or Bacillus Calmette-Guérin for NMIBC patients, (2) neoadjuvant platinum-based chemotherapy (NAC) plus radical cystectomy, the standard of care for localized MIBC patients, and (3) systemic chemotherapy for metastatic UC. However, conventional treatments present several challenges. For example, some patients suffer from recurrence of the disease after the first line of treatment. Recently, immune checkpoint therapy (ICT) has been introduced as an alternative strategy for the first or second line of treatment in advanced or metastatic BLCA patients. Although ICT showed promising results for a fraction of BLCA patients, ~80% of patients did not respond to it. Therefore, novel treatment methods are required to increase the ICI response rate in BLCA patients. It has been shown that the infiltration of T-cells into the tumor microenvironment (TME) is positively correlated with the response to ICT in cancer patients. Therefore, the goal of this study is to enhance the infiltration of cytotoxic T-cells into the TME by identifying, and then inhibiting, target genes within the tumor that are responsible for the non-T-cell-inflamed TME. BLCA bulk RNA-sequencing data from The Cancer Genome Atlas (TCGA) and immune scores for the TCGA samples were used to determine the Pearson correlation between the expression of each gene and the immune score across samples. 
The genes with strong negative correlations were selected (r < -0.2). Thereafter, the correlation between the expression of each gene and survival in BLCA patients was calculated using the TCGA data and the Cox regression method. The genes common to both selected gene lists were chosen for further analysis. Afterward, BLCA bulk and single-cell RNA-sequencing samples were ranked by the expression of each selected gene, and the top and bottom 25% of samples were used for pathway enrichment analysis. If pathways related to T-cell infiltration (e.g., antigen presentation, interferon, or chemokine pathways) were enriched within the low-expression group, the gene was included for downstream analysis. Finally, the selected genes will be used to calculate the correlation between their expression and the infiltration rate of activated CD8+ T-cells, natural killer cells and activated dendritic cells. A list of potential target genes has been identified and ranked based on the above-mentioned analysis and criteria. SUN-1 received the highest score among the listed genes and among other genes identified in the literature that served as benchmarks. In conclusion, inhibition of SUN1 may increase tumor-infiltrating lymphocytes and the efficacy of ICI in BLCA patients. BLCA tumor cells with and without SUN-1 CRISPR/Cas9 knockout will be injected into a syngeneic mouse model to validate the predicted effect of SUN-1 on increasing tumor-infiltrating lymphocytes.

Keywords: data analysis, gene expression analysis, gene identification, immunoinformatic, functional genomics, transcriptomics

Procedia PDF Downloads 133
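The first filtering step described in the bladder cancer abstract above, computing the Pearson correlation between each gene's expression and the per-sample immune score and keeping genes with r < -0.2, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names, the gene labels and the toy expression values are all hypothetical, and only the r < -0.2 threshold comes from the abstract.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def negatively_correlated_genes(expression, immune_score, threshold=-0.2):
    """Return {gene: r} for genes whose expression has r < threshold."""
    selected = {}
    for gene, values in expression.items():
        r = pearson_r(values, immune_score)
        if r < threshold:  # NaN compares False, so constant genes are skipped
            selected[gene] = r
    return selected


# Toy data: four samples, three made-up genes (not TCGA values).
immune_score = [1.0, 2.0, 3.0, 4.0]
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],   # rises with the immune score
    "GENE_B": [4.0, 3.0, 2.0, 1.0],   # falls as the immune score rises
    "GENE_C": [1.0, 1.0, 2.0, 2.0],   # positively correlated
}
print(negatively_correlated_genes(expression, immune_score))
```

In the workflow the abstract describes, the genes passing this filter would then be intersected with the genes selected by Cox regression on survival.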

1630 Identification of Anaplasma Species in Cattle of Khouzestan Province from Iran by PCR

Authors: Ali Bagherpour

Abstract:

The aim of this study was to determine the variety of Anaplasma species among cattle of Khuzestan province, Iran. From April 2013 to June 2013, a total of 200 blood samples were collected randomly via the jugular vein from healthy cattle (100). DNA extracted from blood cells was amplified with Anaplasma-all primers, which amplify an approximately 1468 bp DNA fragment from a region of the 16S rRNA gene in various members of the genus Anaplasma. To raise the sensitivity of the test, the PCR products were re-amplified with primers designed from the region flanked by the first primers. The nested PCR yielded a product of the expected length of 345 nucleotides. 44 out of 100 cattle blood samples were positive for Anaplasma spp. by first PCR and nested PCR. All positive cattle samples were further analyzed for the presence of A. centrale, A. bovis and A. phagocytophilum by specific nested PCR. A. phagocytophilum was identified by specific nested PCR in 3% of cattle blood samples. DNA extracted from samples positive for Anaplasma spp. was amplified with Anaplasma marginale/ovis specific primers, which amplify an approximately 866 bp DNA fragment from a region of the msp4 gene. 41 out of 100 cattle blood samples (41%) were positive for Anaplasma marginale and Anaplasma ovis.

Keywords: Iran, Khuzestan, Anaplasma species, Cattle, A. marginale, A. ovis, A. phagocytophilum, PCR

Procedia PDF Downloads 470

1629 Association of AGT (M268T) Gene Polymorphism in Diabetes and Nephropathy in Pakistan

Authors: Syed M. Shahid, Rozeena Shaikh, Syeda N. Nawab, Abid Azhar

Abstract:

Diabetes mellitus (DM) is a prevalent non-communicable disease worldwide. DM may lead to many vascular complications such as hypertension, nephropathy, retinopathy, neuropathy and foot infections. Polymorphisms in genes encoding specific components of the renin angiotensin aldosterone system (RAAS), including the angiotensinogen (AGT), angiotensin-II receptor and angiotensin converting enzyme (ACE) genes, are implicated in the pathogenesis of diabetic nephropathy (DN). This study was designed to explore the possible association of the AGT (M268T) polymorphism with diabetes and nephropathy in patients in Pakistan. Study subjects included 100 controls, 260 diabetic patients without renal insufficiency and 190 diabetic nephropathy patients with persistent albuminuria. Fasting blood samples were collected from all subjects after institutional ethical approval and informed consent were obtained. Biochemical estimations, PCR amplification and direct sequencing of the specific region of the AGT gene were carried out. A significantly higher frequency of the TT genotype and T allele of AGT (M268T) was observed in diabetic patients with nephropathy than in controls and in diabetic patients without any known renal impairment. The TT genotype and T allele of the AGT (M268T) polymorphism may be considered a genetic risk factor for the development and progression of nephropathy in diabetes. Further cross-sectional population studies would help to establish and confirm the observed possible association of AGT gene variations with the development of nephropathy in diabetes.

Keywords: RAAS, AGT (M268T), diabetes, nephropathy

Procedia PDF Downloads 502

1628 Classification of Multiple Cancer Types with Deep Convolutional Neural Network

Authors: Nan Deng, Zhenqiu Liu

Abstract:

Thousands of patients with metastatic tumors are diagnosed with cancers of unknown primary site each year. The inability to identify the primary cancer site may lead to inappropriate treatment and unexpected prognosis. A large amount of genomic and transcriptomic cancer data has now been generated by next-generation sequencing (NGS) technologies, and The Cancer Genome Atlas (TCGA) database has accrued thousands of human cancer tumors and healthy controls, which provides an abundant resource for differentiating cancer types. Meanwhile, deep convolutional neural networks (CNNs) have shown high accuracy in classification among a large number of image object categories. Here, we use 25 primary tumor types and 3 normal tissues from TCGA, convert their RNA-Seq gene expression profiles to color images, and train, validate and test a CNN classifier directly on these images. The results show that our CNN classifier can achieve >80% test accuracy on most of the tumors and normal tissues. Since the gene expression pattern of distant metastases is similar to that of their primary tumors, the CNN classifier may provide a potential computational strategy for identifying the unknown primary origin of metastatic cancer in order to plan appropriate treatment for patients.

Keywords: bioinformatics, cancer, convolutional neural network, deep learning, gene expression pattern

Procedia PDF Downloads 266

1627 Cloning, Expression and Protein Purification of AV1 Gene of Okra Leaf Curl Virus Egyptian Isolate and Genetic Diversity between Whitefly and Different Plant Hosts

Authors: Dalia. G. Aseel

Abstract:

Begomoviruses are economically important plant viruses that infect dicotyledonous plants and are transmitted exclusively by the whitefly Bemisia tabaci. Here, the replicative form was isolated from okra, cotton and tomato plants and from whiteflies infected with begomoviruses. Using coat protein specific primers (AV1), the viral infection was verified by an amplicon of 450 bp. The sequence of the OLCuV-AV1 gene was recorded and received an accession number (FJ441605) from GenBank. In the phylogenetic tree, OLCuV was closely related to Okra leaf curl virus previously isolated from Cameroon and the USA, with a nucleotide sequence identity of 92%. Protein purification was carried out with the His-tag method using affinity chromatography. The purified protein was separated by SDS-PAGE, and an enriched band of the expected size, 30 kDa, was observed. Furthermore, RAPD and SDS-PAGE were used to detect genetic variability between different hosts of okra leaf curl virus (OLCuV), cotton leaf curl virus (CLCuV), tomato yellow leaf curl virus (TYLCuV) and the whitefly vector. Finally, the present study should help in understanding the relationship between the whitefly and different economically important crops in Egypt.

Keywords: okra leaf curl virus, AV1 gene, sequencing, phylogenetic, cloning, purified protein, genetic diversity and viral proteins

Procedia PDF Downloads 117

1626 C-eXpress: A Web-Based Analysis Platform for Comparative Functional Genomics and Proteomics in Human Cancer Cell Line, NCI-60 as an Example

Authors: Chi-Ching Lee, Po-Jung Huang, Kuo-Yang Huang, Petrus Tang

Abstract:

Background: Recent advances in high-throughput research technologies such as new-generation sequencing and multi-dimensional liquid chromatography make it possible, for the first time, to dissect the complete transcriptome and proteome in a single run. However, it is almost impossible for many laboratories to handle and analyse these "BIG" data without support from a bioinformatics team. We aimed to provide a web-based analysis platform that lets users with only limited knowledge of bio-computing study functional genomics and proteomics. Method: We use NCI-60 as an example dataset to demonstrate the power of the web-based analysis platform and data delivery system. C-eXpress takes as input a simple text file that contains standard NCBI gene or protein IDs and expression levels (rpkm or fold) and generates a distribution map of gene/protein expression levels as a heatmap organized by color gradients. The diagram is hyperlinked to a dynamic HTML table that allows users to filter the datasets based on various gene features. A dynamic summary chart is generated automatically after each filtering step. Results: We implemented an integrated database that contains pre-defined annotations such as gene/protein properties (ID, name, length, MW, pI); pathways based on KEGG and GO biological process; subcellular localization based on GO cellular component; and functional classification based on GO molecular function, kinase, peptidase and transporter. Multiple ways of sorting columns and rows are also provided for comparative analysis and visualization of multiple samples.

Keywords: cancer, visualization, database, functional annotation

Procedia PDF Downloads 588

1625 Unzipping the Stress Response Genes in Moringa oleifera Lam. through Transcriptomics

Authors: Vivian A. Panes, Raymond John S. Rebong, Miel Q. Diaz

Abstract:

Moringa oleifera Lam. is known mainly for its high nutritional value and medicinal properties contributing to its popular reputation as a 'miracle plant' in the tropical climates where it usually grows. The main objective of this study is to discover the genes and gene products involved in abiotic stress-induced activity that may impact the M. oleifera Lam. mature seeds as well as their corresponding functions. In this study, RNA-sequencing and de novo transcriptome assembly were performed using two assemblers, Trinity and Oases, which produced 177,417 and 120,818 contigs respectively. These transcripts were then subjected to various bioinformatics tools such as Blast2GO, UniProt, KEGG, and COG for gene annotation and the analysis of relevant metabolic pathways. Furthermore, FPKM analysis was performed to identify gene expression levels. The sequences were filtered according to the 'response to stress' GO term since this study dealt with stress response. Clustered Orthologous Groups (COG) showed that the highest frequencies of stress response gene functions were those of cytoskeleton which make up approximately 14% and 23% of stress-related sequences under Trinity and Oases respectively, recombination, repair and replication at 11% and 14% respectively, carbohydrate transport and metabolism at 23% and 9% respectively and defense mechanisms 16% and 12% respectively. KEGG pathway analysis determined the most abundant stress-response genes in the phenylpropanoid biosynthesis at counts of 187 and 166 pathways for Oases and Trinity respectively, purine metabolism at 123 and 230 pathways, and biosynthesis of antibiotics at 105 and 102. 
Unique and cumulative GO term counts revealed that the majority of the stress response genes belonged to the category of cellular response to stress, at cumulative counts of 1,487 and 2,187 for Oases and Trinity respectively, followed by defense response at 754 and 1,255, response to heat at 213 and 208, response to water deprivation at 229 and 228, and oxidative stress at 508 and 488. Lastly, FPKM was used to determine the expression level of each stress response gene. The most upregulated gene encodes a thiamine thiazole synthase chloroplastic-like enzyme, which plays a significant role in DNA damage tolerance. Data analysis implies that M. oleifera stress response genes are directed towards the effects of climate change more than other stresses, indicating the potential of M. oleifera for cultivation in harsh environments because it is resistant to climate change, pathogens, and foreign invaders.

Keywords: stress response, genes, Moringa oleifera, transcriptomics

Procedia PDF Downloads 117
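The Moringa oleifera abstract above uses FPKM (fragments per kilobase of transcript per million mapped fragments) to compare expression levels. The standard definition is sketched below; the function name and the numbers in the usage line are illustrative assumptions, not values from the study.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments:               fragments assigned to this transcript
    transcript_length_bp:    transcript length in base pairs
    total_mapped_fragments:  all mapped fragments in the library
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Divide by length in kilobases and by library size in millions;
    # the two scaling factors combine into 1e9.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical contig: 100 fragments on a 1,000 bp transcript
# in a library of 1,000,000 mapped fragments.
print(fpkm(100, 1_000, 1_000_000))
```

Normalizing by both length and library size is what makes values comparable between long and short contigs and between libraries of different depth.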

1624 Massively Parallel Sequencing Improved Resolution for Paternity Testing

Authors: Xueying Zhao, Ke Ma, Hui Li, Yu Cao, Fan Yang, Qingwen Xu, Wenbin Liu

Abstract:

Massively parallel sequencing (MPS) technologies allow high-throughput sequencing analyses at a relatively affordable price and have gradually been applied to forensic casework. MPS technology identifies short tandem repeat (STR) loci based on sequence, so that repeat motif variation within STRs can be detected, which may help to infer the origin of a mutation in some cases. Here, we report one case with a three-step mismatch (D18S51) in a family trio, based on both capillary electrophoresis (CE) and MPS typing. The alleles of the alleged father (AF) are [AGAA]₁₇AGAG[AGAA]₃ and [AGAA]₁₅. The mother’s alleles are [AGAA]₁₉ and [AGAA]₉AGGA[AGAA]₃. The questioned child’s (QC) alleles are [AGAA]₁₉ and [AGAA]₁₂. Given that the sequence variants in the repeat regions of the AF and the mother are not observed in the QC’s alleles, the QC’s allele [AGAA]₁₂ was likely inherited from the AF’s allele [AGAA]₁₅ through the loss of three [AGAA] repeats. In addition, two D18S51 alleles found in this study, [AGAA]₁₇AGAG[AGAA]₃ and [AGAA]₉AGGA[AGAA]₃, have not been reported before. All the results in this study were verified using Sanger-type sequencing. In summary, the MPS typing method can offer valuable information for forensic genetics research and play a promising role in paternity testing.

Keywords: family trios analysis, forensic casework, ion torrent personal genome machine (PGM), massively parallel sequencing (MPS)

Procedia PDF Downloads 276
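The bracketed allele notation in the paternity-testing abstract above, such as [AGAA]₁₇AGAG[AGAA]₃, compresses a repeat region into runs of identical 4-base units. The sketch below shows one way to derive that notation from a raw sequence and to count the step difference between two alleles. It is illustrative only: the function names are hypothetical, plain digits stand in for the subscripts, and real STR callers also handle flanking sequence and partial repeats, which this sketch does not.

```python
def bracket_notation(seq: str, unit_len: int = 4) -> str:
    """Compress a repeat region into bracketed run-length notation."""
    if len(seq) % unit_len != 0:
        raise ValueError("sequence length must be a multiple of the unit length")
    units = [seq[i:i + unit_len] for i in range(0, len(seq), unit_len)]
    runs = []  # list of [unit, count] pairs, in order of appearance
    for unit in units:
        if runs and runs[-1][0] == unit:
            runs[-1][1] += 1
        else:
            runs.append([unit, 1])
    # A unit that occurs once is written bare, as in the abstract.
    return "".join(f"[{u}]{n}" if n > 1 else u for u, n in runs)


def repeat_count(seq: str, unit_len: int = 4) -> int:
    """Total number of repeat units, the length-based allele size."""
    return len(seq) // unit_len


# Alleles rebuilt from the notation given in the abstract.
father_1 = "AGAA" * 17 + "AGAG" + "AGAA" * 3
father_2 = "AGAA" * 15
child_2 = "AGAA" * 12

print(bracket_notation(father_1))
print(repeat_count(father_2) - repeat_count(child_2), "repeat steps lost")
```

Length-based typing would report only the total unit count for each allele, whereas the sequence-based notation also exposes the internal AGAG variant, which is the extra resolution the abstract attributes to MPS.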

1623 Gene Names Identity Recognition Using Siamese Network for Biomedical Publications

Authors: Micheal Olaolu Arowolo, Muhammad Azam, Fei He, Mihail Popescu, Dong Xu

Abstract:

As the number of biological articles rises, so does the number of biological pathway figures. Each pathway figure shows gene names and relationships. Annotating pathway diagrams manually is time-consuming. Advanced image understanding models could speed up curation, but they must be more precise. Biological pathway figures contain rich information, and the first step in image understanding of these figures is to recognize gene names automatically. Classical optical character recognition methods have been employed for gene name recognition, but they are not optimized for literature mining data. This study devised a method that treats the image bounding box of a gene name as a photo and recognizes it using deep Siamese neural network models. Using ResNet, DenseNet and Inception architectures, the method outperformed existing methods and achieved about 84% accuracy.

Keywords: biological pathway, gene identification, object detection, Siamese network

Procedia PDF Downloads 244

1622 Screening and Evaluation of Plant Growth Promoting Rhizobacteria of Wheat/Faba Bean for Increasing Productivity and Yield

Authors: Yasir Arafat, Asma Shah, Hua Shao

Abstract:

Background and Aims: Legume/cereal intercropping is used worldwide to enhance the biomass and yield of cereal crops. However, knowledge of the belowground biological and chemical interactions under intercropping, and of their effect on the physiological parameters and yield of crops, is limited. Methods: Wheat/faba bean (WF) intercropping was designed to understand the underlying changes in the soil's chemical environment and soil microbial communities, and their effect on growth and yield parameters. Experimental plots were established with no root partition (NRP), semi-root partition (SRP), complete root partition (CRP), and sole cropping (CK). Low molecular weight organic acids (LMWOAs) were determined by GC-MS, and high-throughput sequencing of the 16S rRNA gene was carried out to screen microbial structure and composition in different root partitions of the WF intercropping system. Results: We show that intercropping induced a shift in the relative abundance of some genera of plant growth promoting rhizobacteria (PGPR), such as Allorhizobium, Neorhizobium, Pararhizobium, and Rhizobium species, and resulted in better growth and yield performance of wheat. Moreover, as the distance of wheat plants from faba beans decreased, the diversity of microbes increased, and a positive effect was observed on physiological traits and crop yield. Furthermore, an abundance of palmitic acid, arachidic acid, stearic acid, and 9-octadecenoic acid, and their positive correlations with PGPR, were recorded in the root zone of WF intercropping, which can play an important role in this facilitative mechanism of enhancing growth and yield of cereals. Conclusion: The two treatments clearly affected soil microbial and chemical composition, which can be reflected in growth and yield enhancement.

Keywords: intercropping, microbial community, LMWOAs, PGPR, soil chemical environment

Procedia PDF Downloads 42

1621 Single Cell and Spatial Transcriptomics: A Beginners Viewpoint from the Conceptual Pipeline

Authors: Leo Nnamdi Ozurumba-Dwight

Abstract:

Messenger ribonucleic acid (mRNA) molecules encode proteins. These protein-encoding mRNA molecules (which collectively constitute the transcriptome), when analyzed by RNA sequencing (RNAseq), reveal the nature of gene expression. The obtained gene expression provides clues to cellular traits and their dynamics. These can be studied in relation to function and responses. RNAseq is a practical concept in genomics, as it enables detection and quantitative analysis of mRNA molecules. Single cell and spatial transcriptomics offer different avenues for exploring the genomic characteristics of single cells and pooled cells in disease conditions such as cancer, auto-immune diseases, and hematopoietic diseases, among others, from investigated biological tissue samples. Single cell transcriptomics allows a direct assessment of each building unit of tissues (the cell) during diagnosis and molecular gene expression studies. A typical technique to achieve this is single-cell RNA sequencing (scRNAseq), which supports high-throughput gene expression studies. However, this technique generates gene expression data for many cells without information on the cells’ positional coordinates within the tissue. As science develops, the use of complementary pre-established tissue reference maps, built with molecular and bioinformatics techniques, has emerged and is now used to resolve this setback and produce both levels of data in one run of scRNAseq analysis. This is an emerging methodological approach for integrative and progressively dependable transcriptomics analysis. It can support in-situ analysis for a better understanding of tissue functional organization, unveil new biomarkers for early-stage detection of diseases and for therapeutic targets in drug development, and reveal the nature of cell-to-cell interactions.
These are also vital genomic signatures and characterizations for clinical applications. Over the past decades, RNAseq has generated a wide array of information that is igniting breakthroughs and innovations in biomedicine. Spatial transcriptomics, on the other hand, is tissue-level based and is used to study biological specimens with heterogeneous features. It reveals the gross identity of investigated mammalian tissues, which can then be used to study cell differentiation, track cell lineage trajectory patterns and behavior, and study regulatory homeostasis in disease states. It also requires referenced positional analysis to make up the genomic signatures that will be assessed from the single cells in the tissue sample. Given these two approaches to RNA transcriptomics study in varying quantities of cells, with avenues for appropriate resolution, both have made the study of gene expression from mRNA molecules interesting and progressive, and both are helping to tackle health challenges head-on.

Keywords: transcriptomics, RNA sequencing, single cell, spatial, gene expression

Procedia PDF Downloads 97

1620 Using Gene Expression Programming in Learning Process of Rough Neural Networks

Authors: Sanaa Rashed Abdallah, Yasser F. Hassan

Abstract:

The paper introduces an approach in which rough sets, gene expression programming and rough neural networks are used cooperatively for learning and classification support. The objective of the gene expression programming rough neural networks (GEP-RNN) approach is to obtain newly classified data with minimum error in the training and testing processes. The starting point of the GEP-RNN approach is an information system, and its output is a rough neural network structure, including the weights and thresholds, with minimum classification error.

Keywords: rough sets, gene expression programming, rough neural networks, classification

Procedia PDF Downloads 348

1619 Human Papillomavirus Type 16 E4 Gene Variation as Risk Factor for Cervical Cancer

Authors: Yudi Zhao, Ziyun Zhou, Yueting Yao, Shuying Dai, Zhiling Yan, Longyu Yang, Chuanyin Li, Li Shi, Yufeng Yao

Abstract:

The HPV16 E4 gene plays an important role in viral genome amplification and release. Therefore, variation in the E4 gene nucleic acid sequence may affect the carcinogenicity of HPV16. To understand the relationship between variation of the HPV16 E4 gene and cervical cancer, this study amplified and sequenced the E4 genes of 118 HPV16-positive cervical cancer patients and 151 HPV16-positive asymptomatic individuals. After the E4 gene sequences were obtained, phylogenetic trees were constructed by the neighbor-joining method for gene variation analysis. The results showed that: 1) the distribution of HPV16 variants differed significantly between the case group and the control group (P = 0.015), and the Asian-American (AA) variant was likely to be related to the occurrence of cervical cancer; 2) DNA sequence analysis showed significant differences in the distribution of 8 variants between the case group and the control group (P < 0.05); and 3) in the European (EUR) variant, two variations, C3384T (L18L) and A3449G (P39P), were associated with the initiation and development of cervical cancer. The results suggest that variation of the HPV16 E4 gene may contribute to the occurrence as well as the development of cervical cancer, and that different HPV16 variants may have different carcinogenic capability.

Keywords: cervical cancer, HPV16, E4 gene, variations

Procedia PDF Downloads 142

1618 Isolation and Characterization of Endophytic Bacteria Associated with Root-Nodules of Medicago sativa in Al-Ahasa Region

Authors: Ashraf Y. Z. Khalifa, Mohammed A. Almalki

Abstract:

Medicago sativa (alfalfa) is an important forage legume crop worldwide, including in Saudi Arabia, due to its high nutritive value. Soil bacteria exist in the roots or root-nodules of Medicago sativa either in symbiotic relationships or in associations. The aim of the present study was to isolate and characterize endophytic bacteria that live in association with non-nodulated roots of Medicago sativa growing in the Al-Ahsaa region, Saudi Arabia. Several bacterial strains were isolated from sterilized roots of Medicago sativa. Strains were characterized using 16S rRNA gene sequences, phylogenetic relationship analysis, and morphological and biochemical characteristics. The strains utilized 50% (10 out of 20) of the different chemical substrates contained in the API20E strip. In general, many strains had the ability to ferment/oxidise all the carbohydrates tested except for rhamnose and the polyol carbohydrate inositol. Comparative sequence analysis of the 16S rDNA gene indicated that the strains were closely related to the genus Bacillus. Furthermore, the growth parameters of Vigna sinensis were enhanced upon single inoculation with the isolated strains, compared to the uninoculated control plants. The results highlighted that the root-nodules of Medicago sativa harbor non-nodulating bacterial strains that could have significant agricultural applications.

Keywords: Medicago sativa, endophytic bacteria, Pisum sativum, Vigna sinensis

Procedia PDF Downloads 348

1617 Alternative Splicing of an Arabidopsis Gene, At2g24600, Encoding Ankyrin-Repeat Protein

Authors: H. Sakamoto, S. Kurosawa, M. Suzuki, S. Oguri

Abstract:

In Arabidopsis, several genes encoding proteins with ankyrin repeats and trans-membrane domains (AtANKTM) have been identified as mediators of biotic and abiotic stress responses. It is known that the expression of an AtANKTM gene, At2g24600, is induced in response to abiotic stress and that there are four splicing variants derived from this locus. In this study, a previously unknown splicing variant of the At2g24600 transcript was identified by RT-PCR and sequencing analysis. Based on differences in the predicted amino acid sequences, the five splicing variants are divided into three groups. The three predicted proteins are highly homologous, yet have different numbers of ankyrin repeats and trans-membrane domains. It is generally considered that ankyrin repeats mediate protein-protein interaction and that the number of trans-membrane domains affects the membrane topology of proteins. The protein variants derived from the At2g24600 locus may therefore have different molecular functions from one another.

Keywords: alternative splicing, ankyrin repeats, trans-membrane domains, arabidopsis

Procedia PDF Downloads 347

1616 Removal of Nitrogen Compounds from Industrial Wastewater Using Sequencing Batch Reactor: The Effects of React Time

Authors: Ali W. Alattabi, Khalid S. Hashim, Hassnen M. Jafer, Ali Alzeyadi

Abstract:

This study was performed to optimise the react time (RT) and study its effects on the removal rates of nitrogen compounds in a sequencing batch reactor (SBR) treating synthetic industrial wastewater. The results showed that increasing the RT from 4 h to 10, 16 and 22 h significantly improved the removal efficiency of nitrogen compounds: it increased from 69.5% to 95% for NH₃-N, from 75.7% to 97% for NO₃-N, and from 54.2% to 80.1% for NO₂-N. The results showed that an RT of 22 h was optimal for the removal of nitrogen compounds.

Keywords: ammonia-nitrogen, retention time, nitrate, nitrite, sequencing batch reactor, sludge characteristics

Procedia PDF Downloads 335

1615 Automatic Reporting System for Transcriptome Indel Identification and Annotation Based on Snapshot of Next-Generation Sequencing Reads Alignment

Authors: Shuo Mu, Guangzhi Jiang, Jinsa Chen

Abstract:

The analysis of Indels in RNA sequencing of clinical samples is easily affected by sequencing experiment errors and software selection. In order to improve the efficiency and accuracy of analysis, we developed an automatic reporting system for Indel recognition and annotation based on image snapshots of transcriptome read alignments. This system includes sequence local assembly and realignment, target point snapshot, and image-based recognition processes. We integrated a high-confidence Indel dataset from several known databases as a training set to improve the accuracy of image processing and added a bioinformatics processing module to annotate and filter Indel artifacts. Subsequently, the system automatically generates a report that includes data quality levels and image results. Sanger sequencing verification of the reference Indel mutations of cell line NA12878 showed that the process can achieve 83% sensitivity and 96% specificity. Analysis of the collected clinical samples showed that the interpretation accuracy of the process was equivalent to that of manual inspection, and the processing efficiency showed a significant improvement. This work shows the feasibility of accurate Indel analysis of clinical next-generation sequencing (NGS) transcriptomes. This result may be useful in the future for RNA studies of clinical samples with microsatellite instability in immunotherapy.

Keywords: automatic reporting, indel, next-generation sequencing, NGS, transcriptome

Procedia PDF Downloads 153

1614 Language Shapes Thought: An Experimental Study on English and Mandarin Native Speakers' Sequencing of Size

Authors: Hsi Wei

Abstract:

Does the language we speak affect the way we think? This question has long been discussed from different perspectives. In this article, the issue is examined with an experiment on how speakers of different languages sequence general objects by size. An essential difference between English and Mandarin usage is the way the sizes of places or objects are sequenced. In English, when describing the location of something we may say, for example, ‘The pen is inside the trashcan next to the tree at the park.’ In Mandarin, however, we would say, ‘The pen is at the park next to the tree inside the trashcan.’ Generally, English uses the sequence small to big, while Mandarin uses the opposite. The experiment was therefore conducted to test whether this difference between the languages affects the speakers’ ability to perform the two kinds of sequencing. There were two groups of subjects: one consisted of native English speakers, the other of native Mandarin speakers. In the experiment, three nouns were shown as a group to the subjects in their native languages. Before they saw the nouns, they were first given an instruction of ‘big to small’, ‘small to big’, or ‘repeat’. The subjects then had to sequence the group of nouns according to the instruction or simply repeat the nouns. After completing each sequencing or repetition in their minds, they pushed a button as a reaction. The repetition condition was designed to capture each person's reading time alone. The results showed that native English speakers reacted more quickly to the ‘small to big’ sequencing, whereas native Mandarin speakers reacted more quickly to the ‘big to small’ sequencing. To conclude, this study may be of importance as support for linguistic relativism, the view that the language we speak shapes the way we think.

Keywords: language, linguistic relativism, size, sequencing

Procedia PDF Downloads 251

1613 Xylanase Impact beyond Performance: A Prebiotic Approach in Laying Hens

Authors: Veerle Van Hoeck, Ingrid Somers, Dany Morisset

Abstract:

Anti-nutritional factors such as non-starch polysaccharides (NSP) are present in viscous cereals used to feed poultry. Therefore, exogenous carbohydrases are commonly added to monogastric feed to degrade these NSP. Our hypothesis is that xylanase not only improves laying hen performance and digestibility but also induces a significant shift in microbial composition within the intestinal tract and, thereby, can cause a prebiotic effect. In this context, a better understanding of whether and how the chicken gut flora can be modulated by xylanase is needed. To this end, the present laying hen study evaluated the effects of dietary supplementation of xylanase on performance, digestibility, and the cecal microbiome. A total of 96 HiSex laying hens were used in this experiment (3 diets and 16 replicates of 2 hens). Xylanase was added to the diets at concentrations of 0, 45,000 (15 g/t Xygest™ HT) and 90,000 U/kg (30 g/t Xygest™ HT). The diets were based on wheat (~55%), soybean, and sunflower meal. The lowest dosage, 45,000 U/kg, significantly increased average egg weight and improved feed efficiency compared to the control treatment (p < 0.05). Egg quality parameters were significantly improved in response to the xylanase addition. For example, during the last 28 days of the trial, the 45,000 U/kg and the 90,000 U/kg treatments exhibited an increase in Haugh units and albumen heights (p < 0.05). Compared with the control, organic matter digestibility and N retention were markedly improved in the 45,000 U/kg treatment group, which implies better nutrient digestibility at this lowest recommended dosage (p < 0.05). Furthermore, gross energy and crude fat digestibility were improved significantly for birds in the 90,000 U/kg group compared to the control. Importantly, 16S rRNA gene analysis revealed that xylanase at the 45,000 U/kg dosage can exert a prebiotic effect.
This conclusion was drawn by studying the sequence variation in the 16S rRNA gene in order to characterize the diverse microbial communities of the cecal content. A significant increase in beneficial bacteria (Lactobacilli spp. and Enterococcus casseliflavus) was documented when 45,000 U/kg xylanase was added to the diet of laying hens. In conclusion, dietary supplementation of xylanase, even at the lowest dose (45,000 U/kg), significantly improved laying hen performance and digestibility. Furthermore, it is generally accepted that a proper balance between the numbers of beneficial and pathogenic bacteria in the intestine is vital for the host. It seems that the xylanase enzyme is able to modulate the laying hen microbiome beneficially and thus exerts a prebiotic effect. This microbiome plasticity in response to xylanase provides an attractive target for stimulating intestinal health.

Keywords: laying hen, prebiotic, Xygest™ HT, xylanase

Procedia PDF Downloads 103

1612 Ectoine: A Compatible Solute in Radio-Halophilic Stenotrophomonas sp. WMA-LM19 Strain to Prevent Ultraviolet-Induced Protein Damage

Authors: Wasim Sajjad, Manzoor Ahmad, Sundas Qadir, Muhammad Rafiq, Fariha Hasan, Richard Tehan, Kerry L. McPhail, Aamer Ali Shah

Abstract:

Aim: This study aims to investigate the possible radiation-protective role of a compatible solute in the tolerance of a radio-halophilic bacterium to stresses such as desiccation and exposure to ionizing radiation. Methods and Results: Nine different radio-resistant bacteria were isolated from desert soil, and strain WMA-LM19 was chosen for detailed studies on the basis of its high tolerance for ultraviolet radiation relative to the other isolates. 16S rRNA gene sequencing indicated that the bacterium was closely related to Stenotrophomonas sp. (KT008383). A bacterial milking strategy was applied for extraction of intracellular compatible solutes in 70% (v/v) ethanol, which were purified by high-performance liquid chromatography (HPLC). The compound was characterized as ectoine by ¹H and ¹³C nuclear magnetic resonance (NMR) and mass spectrometry (MS). Ectoine demonstrated more efficient preventive activity (54.80%) for erythrocyte membranes and also inhibited oxidative damage to proteins and lipids in comparison to the standard, ascorbic acid. Furthermore, a high level of ectoine-mediated protection of bovine serum albumin against ionizing radiation (1500-2000 J m⁻²) was observed, as indicated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) analysis. Conclusion: The results indicated that ectoine can be used as a potential mitigator and radio-protective agent to overcome radiation- and salinity-mediated oxidative damage in extreme environments. Significance and Impact of the Study: This study shows that ectoine from radio-halophiles can be used as a potential ingredient in topical creams as a sunscreen. The investigation of ectoine as a UV protectant also changes the perspective that radiation resistance is specific only to molecular adaptation.

Keywords: ectoine, anti-oxidant, Stenotrophomonas sp., ultraviolet radiation

Procedia PDF Downloads 185

1611 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the size of the k-mers increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction.
The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. Overall, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are important especially in explaining complex biological mechanisms.
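The k-mer representation that this abstract builds on can be illustrated with a small Python sketch. This is only an assumed, minimal form of the idea (overlapping k-mers counted over the A/C/G/T alphabet). The function names `kmer_counts` and `kmer_vector` are hypothetical, and the abstract does not specify the study's actual pipeline.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count all overlapping substrings of length k in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(sequence: str, k: int, alphabet: str = "ACGT") -> list[int]:
    """Fixed-length feature vector: one count per possible k-mer.

    The vector has len(alphabet) ** k entries, so memory grows
    exponentially with k (4 ** 10 = 1,048,576 entries for k = 10).
    """
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocabulary]


toy = "ACGTAC"  # toy sequence for illustration only, not study data
print(kmer_counts(toy, 2))
print(len(kmer_vector(toy, 2)))
```

The exponential growth of the vocabulary with k is one concrete way to see the trade-off between k-mer size and computing resources that the abstract mentions.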

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 134

1610 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the size of the k-mers increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction.
The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. Overall, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 124