Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 19183

Search results for: genomic data

19183 Genomic Prediction Reliability Using Haplotypes Defined by Different Methods

Authors: Sohyoung Won, Heebal Kim, Dajeong Lim

Abstract:

Genomic prediction is an effective way to measure the breeding abilities of livestock based on genomic estimated breeding values, which are statistically predicted from genotype data using best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the reliability of genomic prediction, since the probability of a quantitative trait locus being in strong linkage disequilibrium (LD) with the markers is higher. To use haplotypes efficiently in genomic prediction, optimal ways to define haplotypes are needed. In this study, 770K SNP chip data were collected from a Hanwoo (Korean cattle) population consisting of 2506 cattle. Haplotypes were defined in three different ways using the 770K SNP chip data: based on 1) the length of haplotypes (bp), 2) the number of SNPs, and 3) k-medoids clustering by LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes; in each method, haplotypes defined to have an average of 5, 10, 20 or 50 SNPs were tested. A modified GBLUP method using haplotype alleles as predictor variables was implemented to test the prediction reliability of each haplotype set. The conventional genomic BLUP (GBLUP) method, which uses individual SNPs, was also tested as a baseline for evaluating the performance of the haplotype sets. Carcass weight was used as the phenotype for testing. Haplotypes defined by all three methods showed increased reliability compared to conventional GBLUP. There was little difference in reliability between the haplotype-defining methods. The reliability of genomic prediction was highest when the average number of SNPs per haplotype was 20 in all three methods, implying that haplotypes including around 20 SNPs can be optimal as markers for genomic prediction.
When the numbers of alleles generated by the haplotype-defining methods were compared, clustering by LD generated the fewest alleles. Using haplotype alleles for genomic prediction showed better performance, suggesting improved accuracy in genomic selection. The number of predictor variables decreased when the LD-based method was used, while all three haplotype-defining methods showed similar performance. This suggests that defining haplotypes based on LD can reduce computational costs and allow efficient prediction. Finding optimal ways to define haplotypes and using the haplotype alleles as markers can provide improved performance and efficiency in genomic prediction.
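As a rough illustration of method 2 above (haplotypes defined by a fixed number of SNPs), the following minimal Python sketch splits phased haplotype strings into consecutive blocks and counts the distinct haplotype alleles per block, which is the quantity that determines the number of predictor variables. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

def haplotype_blocks(n_snps, block_size):
    """Split SNP indices 0..n_snps-1 into consecutive blocks of block_size SNPs.

    The last block may be shorter if n_snps is not a multiple of block_size.
    """
    return [range(start, min(start + block_size, n_snps))
            for start in range(0, n_snps, block_size)]

def count_haplotype_alleles(haplotypes, block_size):
    """Count the distinct haplotype alleles observed in each block.

    haplotypes: list of equal-length strings of '0'/'1', one per phased chromosome.
    """
    n_snps = len(haplotypes[0])
    counts = []
    for block in haplotype_blocks(n_snps, block_size):
        alleles = Counter(h[block.start:block.stop] for h in haplotypes)
        counts.append(len(alleles))
    return counts

# Toy phased haplotypes (hypothetical data, 6 SNPs, 4 chromosomes).
toy = ["010011", "010111", "110011", "010011"]
allele_counts = count_haplotype_alleles(toy, block_size=3)
print(allele_counts)
```

Each distinct allele in a block would become one predictor variable, so a definition that yields fewer alleles per block gives a smaller design matrix.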

Keywords: best linear unbiased predictor, genomic prediction, haplotype, linkage disequilibrium

Procedia PDF Downloads 72
19182 Genodata: The Human Genome Variation Using BigData

Authors: Surabhi Maiti, Prajakta Tamhankar, Prachi Uttam Mehta

Abstract:

Since the completion of the Human Genome Project, there has been an unparalleled escalation in the sequencing of genomic data. This project was the first major leap in the field of medical research, especially in genomics. It won accolades by using a concept called Big Data, which had earlier been used extensively to gain value for business. Big Data makes use of data sets that are generally in the form of files of terabyte, petabyte, or exabyte size; such data sets were traditionally used and managed using Excel sheets and RDBMS. The volume of data made the process tedious and time-consuming, and hence a stronger framework called Hadoop was introduced in the field of genetic sciences to make data processing faster and more efficient. This paper focuses on using SPARK, which is gaining momentum with the advancement of Big Data technologies. Cloud storage is an effective medium for storing the large data sets generated by genetic research and the resultant sets produced by SPARK analysis.

Keywords: human genome project, Bigdata, genomic data, SPARK, cloud storage, Hadoop

Procedia PDF Downloads 151
19181 Genomic Evidence for Ancient Human Migrations Along South America's East Coast

Authors: Andre Luiz Campelo dos Santos, Amanda Owings, Henry Socrates Lavalle Sullasi, Omer Gokcumen, Michael DeGiorgio, John Lindo

Abstract:

An increasing body of archaeological and genomic evidence has indicated a complex settlement process of the Americas. Here, four newly sequenced ancient genomes from Northeast Brazil and Uruguay are reported to share strong relationships with previously published samples from Panama and Southeast Brazil. Moreover, an unexpectedly high genomic affinity with present-day Onge is found in ancient individuals unearthed along the northern portion of South America’s Atlantic coast. These results provide genomic evidence for ancient migrations along South America’s Atlantic coast.

Keywords: archaeogenomics, Atlantic coast, paleomigrations, South America

Procedia PDF Downloads 31
19180 Evaluation of Four Different DNA Targets in Polymerase Chain Reaction for Detection and Genotyping of Helicobacter pylori

Authors: Abu Salim Mustafa

Abstract:

Polymerase chain reaction (PCR) assays targeting genomic DNA segments have been established for the detection of Helicobacter pylori in clinical specimens. However, data on comparative evaluations of various targets in the detection of H. pylori are limited. Furthermore, the frequencies of vacA (s1 and s2) and cagA genotypes, which are suggested to be involved in the pathogenesis of H. pylori in other parts of the world, are not well studied in Kuwait. The aim of this study was to evaluate PCR assays for the detection and genotyping of H. pylori by amplifying DNA targets from four genomic segments. Genomic DNA was isolated from 72 clinical isolates of H. pylori and tested in PCR with four pairs of oligonucleotide primers, i.e., ECH-U/ECH-L, ET-5U/ET-5L, CagAF/CagAR and Vac1F/Vac1XR, which were expected to amplify targets of various sizes (471 bp, 230 bp, 183 bp and 176/203 bp, respectively) from the genomic DNA of H. pylori. The PCR-amplified DNA was analyzed by agarose gel electrophoresis. PCR products of the expected sizes were obtained with all primer pairs using genomic DNA isolated from H. pylori. DNA dilution experiments showed that the most sensitive PCR target was the 471 bp DNA amplified by the primers ECH-U/ECH-L, followed by the targets of Vac1F/Vac1XR (176/203 bp DNA), CagAF/CagAR (183 bp DNA) and ET-5U/ET-5L (230 bp DNA). However, when tested with undiluted genomic DNA isolated from single colonies of all isolates, the Vac1F/Vac1XR target provided the maximum positive results (71/72 (99% positives)), followed by ECH-U/ECH-L (69/72 (93% positives)), ET-5U/ET-5L (51/72 (71% positives)) and CagAF/CagAR (26/72 (46% positives)). The results of genotyping experiments showed that vacA s1 (46% positive) and vacA s2 (54% positive) genotypes were almost equally associated with VacA+/CagA- isolates (P > 0.05), but with VacA+/CagA+ isolates, the s1 genotype (92% positive) was more frequently detected than the s2 genotype (8% positive) (P < 0.0001).
In conclusion, among the primer pairs tested, Vac1F/Vac1XR provided the best results for detection of H. pylori. The genotyping experiments showed that vacA s1 and vacA s2 genotypes were almost equally associated with vacA+/cagA- isolates, but the vacA s1 genotype had a significantly increased association with vacA+/cagA+ isolates.

Keywords: H. pylori, PCR, detection, genotyping

Procedia PDF Downloads 62
19179 TARF: Web Toolkit for Annotating RNA-Related Genomic Features

Authors: Jialin Ma, Jia Meng

Abstract:

Genomic features, expressed as genome-based coordinates, are commonly used to represent biological features such as genes, RNA transcripts and transcription factor binding sites. For the analysis of RNA-related genomic features, such as RNA modification sites, a common task is to correlate these features with transcript components (5'UTR, CDS, 3'UTR) to explore their distribution characteristics in terms of transcriptomic coordinates, e.g., to examine whether a specific type of biological feature is enriched near transcription start sites. Existing approaches for performing these tasks involve the manipulation of a gene database, conversion from genome-based coordinates to transcript-based coordinates, and visualization methods capable of showing RNA transcript components and the distribution of the features. These steps are complicated and time-consuming, especially for researchers who are not familiar with the relevant tools. To overcome this obstacle, we developed a dedicated web app, TARF, which stands for web Toolkit for Annotating RNA-related genomic Features. TARF is intended to provide a web-based way to easily annotate and visualize RNA-related genomic features. Once a user has uploaded the features in BED format and either specified a built-in transcript database or uploaded a customized gene database in GTF format, the tool can perform its three main functions. First, it adds annotation on gene and RNA transcript components. For every feature provided by the user, overlaps with RNA transcript components are identified, and the information is combined in one table that is available for copy and download. Summary statistics on ambiguous assignments are also computed. Second, the tool provides a convenient visualization of the features at the single gene/transcript level.
For the selected gene, the tool shows the features with the gene model in a genome-based view, and also maps the features to transcript-based coordinates and shows their distribution along a single spliced RNA transcript. Third, a global transcriptomic view of the genomic features is generated using the Guitar R/Bioconductor package. The distribution of features on RNA transcripts is normalized with respect to RNA transcript landmarks, and the enrichment of the features on different RNA transcript components is shown. We tested the newly developed TARF toolkit with three different types of genomic features related to chromatin H3K4me3, RNA N6-methyladenosine (m6A) and RNA 5-methylcytosine (m5C), obtained from ChIP-Seq, MeRIP-Seq and RNA BS-Seq data, respectively. TARF successfully revealed their respective distribution characteristics, i.e., H3K4me3, m6A and m5C are enriched near transcription start sites, stop codons and 5’UTRs, respectively. Overall, TARF is a useful web toolkit for annotation and visualization of RNA-related genomic features, and should help simplify the analysis of various RNA-related genomic features, especially those related to RNA modifications.
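The conversion from genome-based to transcript-based coordinates mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions (0-based, half-open exon intervals as in BED; a single transcript); the function name `genome_to_transcript` is hypothetical and this is not TARF's actual implementation.

```python
def genome_to_transcript(pos, exons, strand="+"):
    """Map a 0-based genomic position to a 0-based position on the spliced transcript.

    exons: list of (start, end) half-open genomic intervals of one transcript.
    Returns None if pos falls in an intron or outside the transcript.
    """
    # Walk exons in transcription order: ascending for '+', descending for '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # number of transcript bases contributed by exons already passed
    for start, end in ordered:
        if start <= pos < end:
            within = pos - start if strand == "+" else end - 1 - pos
            return offset + within
        offset += end - start
    return None

# Hypothetical transcript with two exons of 100 and 50 bases.
exons = [(100, 200), (300, 350)]
print(genome_to_transcript(310, exons))        # second exon, plus strand
print(genome_to_transcript(250, exons))        # intronic position
print(genome_to_transcript(349, exons, "-"))   # first transcribed base on minus strand
```

Once features are expressed in transcript coordinates, their positions can be compared against landmarks such as the start of the CDS or the stop codon.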

Keywords: RNA-related genomic features, annotation, visualization, web server

Procedia PDF Downloads 122
19178 Evolutionary Genomic Analysis of Adaptation Genomics

Authors: Agostinho Antunes

Abstract:

The completion of human genome sequencing in 2003 opened a new perspective on the importance of whole genome sequencing projects, and multiple species are currently having their genomes completely sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. The voluminous sequencing data generated across multiple organisms also provide a framework for better understanding the genetic makeup of these and related species, making it possible to explore the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, obtained from comparative evolutionary genomic analyses of varied species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: adaptation, animals, evolution, genomics

Procedia PDF Downloads 342
19177 Genomics of Adaptation in the Sea

Authors: Agostinho Antunes

Abstract:

The completion of human genome sequencing in 2003 opened a new perspective on the importance of whole genome sequencing projects, and multiple species are currently having their genomes completely sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. The voluminous sequencing data generated across multiple organisms also provide a framework for better understanding the genetic makeup of these and related species, making it possible to explore the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, obtained from comparative evolutionary genomic analyses of selected marine animal species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: marine genomics, evolutionary bioinformatics, human genome sequencing, genomic analyses

Procedia PDF Downloads 522
19176 Genomic Diversity and Relationship among Arabian Peninsula Dromedary Camels Using Full Genome Sequencing Approach

Authors: H. Bahbahani, H. Musa, F. Al Mathen

Abstract:

Dromedary camels (Camelus dromedarius) are single-humped even-toed ungulates populating the African Sahara, the Arabian Peninsula, and Southwest Asia. The genome of this desert-adapted species has been minimally investigated using autosomal microsatellite and mitochondrial DNA markers. In this study, the genomes of 33 dromedary camel samples from different parts of the Arabian Peninsula were sequenced using the Illumina Next Generation Sequencing (NGS) platform. These data were combined with Genotyping-by-Sequencing (GBS) data from African (Sudanese) dromedaries to investigate the genomic relationship between African and Arabian Peninsula dromedary camels. Principal Component Analysis (PCA) and average genome-wide admixture analysis were conducted on these data to address the objectives of this study. Both analyses revealed a phylogeographic distinction between the two camel populations. However, no breed-wise genetic classification was revealed among the African (Sudanese) camel breeds. The Arabian Peninsula camel populations also show higher heterozygosity than the Sudanese camels. The results of this study explain the evolutionary history and migration of African dromedary camels from their center of domestication in the southern Arabian Peninsula. These outputs help scientists to further understand the evolutionary history of dromedary camels, which may aid in conserving the favorable genetic traits of this species.
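The heterozygosity comparison reported above can be illustrated with a minimal Python sketch that computes observed heterozygosity as the fraction of called genotypes that are heterozygous. The genotype coding (0, 1, 2 alternate-allele copies, `None` for missing calls), the function name, and the toy populations are assumptions for illustration only; the abstract does not state how heterozygosity was computed.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous.

    genotypes: list of per-individual lists, one entry per SNP,
    coded 0/1/2 (alternate-allele count) or None for a missing call.
    """
    called = [g for row in genotypes for g in row if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)

# Hypothetical toy populations: 2 individuals x 4 SNPs each.
pop_a = [[0, 1, 2, 1], [1, 1, 0, None]]
pop_b = [[0, 0, 2, 2], [0, 1, 2, 0]]
het_a = observed_heterozygosity(pop_a)
het_b = observed_heterozygosity(pop_b)
print(het_a > het_b)
```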

Keywords: dromedary, genotyping-by-sequencing, Arabian Peninsula, Sudan

Procedia PDF Downloads 85
19175 SPARK: An Open-Source Knowledge Discovery Platform That Leverages Non-Relational Databases and Massively Parallel Computational Power for Heterogeneous Genomic Datasets

Authors: Thilina Ranaweera, Enes Makalic, John L. Hopper, Adrian Bickerstaffe

Abstract:

Data are the primary asset of biomedical researchers, and the engine for both discovery and research translation. As the volume and complexity of research datasets increase, especially with new technologies such as large single nucleotide polymorphism (SNP) chips, so too does the requirement for software to manage, process and analyze the data. Researchers often need to execute complicated queries and conduct complex analyses of large-scale datasets. Existing tools to analyze such data, and other types of high-dimensional data, unfortunately suffer from one or more major problems. They typically require a high level of computing expertise, are too simplistic (i.e., do not fit realistic models that allow for complex interactions), are limited by computing power, do not exploit the computing power of large-scale parallel architectures (e.g., supercomputers, GPU clusters), or are limited in the types of analysis available, compounded by the fact that integrating new analysis methods is not straightforward. Solutions to these problems, such as those developed and implemented on parallel architectures, are currently available to only a relatively small portion of medical researchers with access and know-how. The past decade has seen a rapid expansion of data management systems for the medical domain. Much attention has been given to systems that manage phenotype datasets generated by medical studies. The introduction of heterogeneous genomic data for research subjects that reside in these systems has highlighted the need for substantial improvements in software architecture. To address this problem, we have developed SPARK, an enabling and translational system for medical research, leveraging existing high performance computing resources, and analysis techniques currently available or being developed. It builds these into The Ark, an open-source web-based system designed to manage medical data.
SPARK provides a next-generation biomedical data management solution that is based upon a novel Micro-Service architecture and Big Data technologies. The system serves to demonstrate the applicability of Micro-Service architectures for the development of high performance computing applications. When applied to high-dimensional medical datasets such as genomic data, relational data management approaches with normalized data structures suffer from unfeasibly high execution times for basic operations such as insert (i.e. importing a GWAS dataset) and the queries that are typical of the genomics research domain. SPARK resolves these problems by incorporating non-relational NoSQL databases that have been driven by the emergence of Big Data. SPARK provides researchers across the world with user-friendly access to state-of-the-art data management and analysis tools while eliminating the need for high-level informatics and programming skills. The system will benefit health and medical research by eliminating the burden of large-scale data management, querying, cleaning, and analysis. SPARK represents a major advancement in genome research technologies, vastly reducing the burden of working with genomic datasets, and enabling cutting edge analysis approaches that have previously been out of reach for many medical researchers.

Keywords: biomedical research, genomics, information systems, software

Procedia PDF Downloads 188
19174 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge that calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on a k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum k-mer size and use it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the k-mer size increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction.
The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay among accuracy, computing resources and explainability of classification results. Moreover, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are especially important in explaining complex biological mechanisms.
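The k-mer representation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and skipping k-mers that contain ambiguous bases is an assumption made here for simplicity.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count overlapping k-mers in a DNA sequence.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped;
    this is an assumption of the sketch, not a detail from the abstract.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    return Counter(seq[i:i + k]
                   for i in range(len(seq) - k + 1)
                   if set(seq[i:i + k]) <= valid)

def to_feature_vector(counts, vocabulary):
    """Turn k-mer counts into a fixed-order numeric vector for a classifier."""
    return [counts.get(kmer, 0) for kmer in vocabulary]

counts = kmer_counts("ACGTACGT", k=3)
vector = to_feature_vector(counts, ["ACG", "CGT", "GTA", "TAC", "AAA"])
print(vector)
```

The number of possible k-mers grows as 4^k, which is one reason larger k-mer sizes raise the computing cost noted in the abstract.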

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 58
19173 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge that calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on a k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum k-mer size and use it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the k-mer size increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction.
The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay among accuracy, computing resources and explainability of classification results. Moreover, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are especially important in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 63
19172 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to better identify subtypes of tumors. Identifying subtypes of cancer helps improve the efficacy and reduce the toxicity of treatments by providing clues for finding targeted therapeutics. The process of gene expression data analysis is described in three steps: preprocessing, clustering, and cluster validation. Feature selection is important since genomic data are high dimensional, with a large number of features compared to samples. Hierarchical clustering and K-Means are often used in the analysis of gene expression data. Several cluster validation techniques are used to validate the clusters. Heatmaps are an effective external validation method that allows the identified classes to be compared with clinical variables and analyzed visually.
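The clustering step named above can be illustrated with a minimal, dependency-free K-Means sketch in Python. It covers only the clustering step (not preprocessing or validation), takes caller-supplied initial centroids so the result is deterministic, and uses hypothetical two-gene expression values for four samples; it is not drawn from the reviewed study.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, n_iter=10):
    """Plain K-Means: alternate assignment and centroid update for n_iter rounds."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

# Hypothetical expression values (2 genes) for 4 samples.
samples = [(1.0, 1.2), (0.8, 1.0), (5.0, 5.1), (5.2, 4.9)]
labels, centroids = kmeans(samples, centroids=[samples[0], samples[2]])
print(labels)
```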

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 56
19171 Bioinformatics High Performance Computation and Big Data

Authors: Javed Mohammed

Abstract:

Right now, biomedical infrastructure lags well behind the curve. Our healthcare system is dispersed and disjointed; medical records are a bit of a mess; and we do not yet have the capacity to store and process the vast amounts of data coming our way from widespread whole-genome sequencing. And then there are privacy issues. Despite these infrastructure challenges, some researchers are plunging into biomedical Big Data now, in hopes of extracting new and actionable knowledge. They are delving into molecular-level data to discover biomarkers that help classify patients based on their response to existing treatments, and pushing their results out to physicians in novel and creative ways. Computer scientists and biomedical researchers are able to transform data into models and simulations that will enable scientists, for the first time, to gain a profound understanding of the deepest biological functions. Solving biological problems may require High-Performance Computing (HPC), due either to the massive parallel computation required to solve a particular problem or to algorithmic complexity that may range from difficult to intractable. Many problems involve seemingly well-behaved polynomial-time algorithms (such as all-to-all comparisons) but have massive computational requirements due to the large data sets that must be analyzed. High-throughput techniques for DNA sequencing and analysis of gene expression have led to exponential growth in the amount of publicly available genomic data. With the increased availability of genomic data, traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types. Computing systems are now so powerful that it is possible for researchers to consider modeling the folding of a protein or even simulating an entire human body. This research paper emphasizes computational biology's growing need for high-performance computing and Big Data.
It illustrates this article’s indispensability in meeting the scientific and engineering challenges of the twenty-first century, and how Protein Folding (the structure and function of proteins) and Phylogeny Reconstruction (evolutionary history of a group of genes) can use HPC that provides sufficient capability for evaluating or solving more limited but meaningful instances. This article also indicates solutions to optimization problems, and benefits Big Data and Computational Biology. The article illustrates the Current State-of-the-Art and Future-Generation Biology of HPC Computing with Big Data.

Keywords: high performance, big data, parallel computation, molecular data, computational biology

Procedia PDF Downloads 299
19170 Genetic Instabilities in Marine Bivalve Following Benzo(α)pyrene Exposure: Utilization of Combined Random Amplified Polymorphic DNA and Comet Assay

Authors: Mengjie Qu, Yi Wang, Jiawei Ding, Siyu Chen, Yanan Di

Abstract:

The marine ecosystem is facing intensified multiple stresses caused by environmental contaminants from human activities. Xenobiotics, such as benzo(α)pyrene (BaP), have been discharged into the marine environment and have hazardous impacts on both marine organisms and human beings. As filter-feeders, marine mussels, Mytilus spp., have been extensively used to monitor the marine environment. However, their genomic alterations induced by such xenobiotics remain unknown. In the present study, gills, as the first defense barrier in mussels, were selected to evaluate the genetic instability induced by exposure to BaP both in vivo and in vitro. Both the random amplified polymorphic DNA (RAPD) assay and the comet assay were applied as rapid tools to assess environmental stress because of their low cost and time requirements. All mussels were identified as the single species Mytilus coruscus before being used in BaP exposure at a concentration of 56 μg/l for 1 and 3 days (in vivo) or 1 and 3 hours (in vitro). Both RAPD and comet assay results showed significantly increased genomic instability with a time-specific pattern of alteration. After the recovery period following in vivo exposure, the genomic status was the same as in the control condition. However, relatively higher genomic instability was still observed in gill cells after recovery from in vitro exposure. Different repair mechanisms or signaling pathways might be involved in the isolated gill cells compared with intact tissues. The study provides robust and rapid techniques to examine genomic stability in marine organisms in response to marine environmental changes, and provides basic information for further mechanistic research on stress responses in marine organisms.

Keywords: genotoxic impacts, in vivo/vitro exposure, marine mussels, RAPD and comet assay

Procedia PDF Downloads 204
19169 Mitigating Ruminal Methanogenesis Through Genomic and Transcriptomic Approaches

Authors: Muhammad Adeel Arshad, Faiz-Ul Hassan, Yanfen Cheng

Abstract:

According to the FAO, enteric methane (CH4) production accounts for about 44% of all greenhouse gas emissions from the livestock sector. Ruminants produce CH4 as a result of the fermentation of feed in the rumen, especially from roughages, which yield more CH4 per unit of biomass ingested than concentrates. Efficient ruminal fermentation is not possible without abating CO2 and CH4. Methane abatement strategies are required to curb the predicted rise in emissions associated with greater ruminant production in the future to meet ever-increasing animal protein requirements. The ecology of ruminal methanogenesis and avenues for its mitigation can be identified through various genomic and transcriptomic techniques. Programs such as Hungate1000 and the Global Rumen Census have been launched to enhance our understanding of global ruminal microbial communities. Through the Hungate1000 project, a comprehensive reference set of rumen microbial genome sequences has been developed from cultivated rumen bacteria and methanogenic archaea, along with representative rumen anaerobic fungi and ciliate protozoa cultures. However, many species of rumen microbes are still underrepresented, especially uncultivable microbes. The lack of sequence information specific to the rumen's microbial community has inhibited efforts to use genomic data to identify the specific set of species, and their target genes, involved in methanogenesis. Metagenomic and metatranscriptomic studies of entire rumen microbial populations offer new perspectives for understanding the interaction of methanogens with other rumen microbes and their potential association with total gas and methane production. A deep understanding of the methanogenic pathway will help to devise potentially effective strategies to abate methane production while increasing feed efficiency in ruminants.

Keywords: Genome sequences, Hungate1000, methanogens, ruminal fermentation

Procedia PDF Downloads 55
19168 Analysis of Saudi Breast Cancer Patients’ Primary Tumors using Array Comparative Genomic Hybridization

Authors: L. M. Al-Harbi, A. M. Shokry, J. S. M. Sabir, A. Chaudhary, J. Manikandan, K. S. Saini

Abstract:

Breast cancer is the second most common cause of cancer death worldwide and is the most common malignancy among Saudi females. During breast carcinogenesis, a wide array of cytogenetic changes involving deletions, amplifications, or translocations of part or the whole of chromosome regions has been observed. Because of the limitations of earlier technologies, newer tools have been developed to scan for changes at the genomic level. Recently, the array comparative genomic hybridization (aCGH) technique has been applied to detect segmental genomic alterations at the molecular level. In this study, aCGH was performed on twenty breast cancer tumors and their matching non-tumor (normal) counterparts using the Agilent 2x400K array. Several regions were identified as either amplified or deleted in a tumor-specific manner. The most frequent alterations were amplifications of chromosomes 1q, 8q, and 20q; deletions at 16q were also detected. The amplifications at 1q and 8q were further validated by FISH analysis using probes targeting 1q25 and 8q (MYC gene). The copy number changes at these loci can potentially cause a significant change in tumor behavior, as deletions in the E-cadherin (CDH1) tumor suppressor gene, as well as amplification of the oncogenes Aurora kinase A (AURKA) and MYC, could make these tumors highly metastatic. This study validates the use of aCGH in Saudi breast cancer patients and sets the foundations necessary for performing larger cohort studies searching for ethnicity-specific biomarkers and gene copy number variations.
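
The gain and loss calls described above can be illustrated with a minimal sketch of how aCGH data are commonly interpreted: each region's tumor signal is compared with the matched normal signal as a log2 ratio, and the ratio is thresholded. The cutoff values (±0.25), the function names, and the signal intensities below are illustrative assumptions, not values or methods reported in the abstract.

```python
import math

def log2_ratio(tumor_signal, normal_signal):
    """Log2 ratio of tumor to matched-normal signal for one region."""
    return math.log2(tumor_signal / normal_signal)

def call_segment(ratio, gain_cutoff=0.25, loss_cutoff=-0.25):
    """Classify a region as gain, loss, or neutral.
    The cutoffs are hypothetical defaults chosen for illustration."""
    if ratio >= gain_cutoff:
        return "gain"
    if ratio <= loss_cutoff:
        return "loss"
    return "neutral"

# Hypothetical (tumor, normal) signal intensities per chromosome arm.
segments = {
    "1q": (1500.0, 1000.0),
    "16q": (600.0, 1000.0),
    "2p": (1000.0, 1000.0),
}

calls = {arm: call_segment(log2_ratio(t, n)) for arm, (t, n) in segments.items()}
print(calls)
```

In practice, cutoffs depend on the platform, tumor purity, and the segmentation method used, so the thresholds here should be read only as placeholders.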

Keywords: breast cancer, molecular biology, ecology, environment

19167 Isolation and Identification of Diacylglycerol Acyltransferase Type-2 (DGAT2) Genes from Three Egyptian Olive Cultivars

Authors: Yahia I. Mohamed, Ahmed I. Marzouk, Mohamed A. Yacout

Abstract:

The aim of this work was to study the genetic basis of oil accumulation in olive fruit by tracking the DGAT2 (diacylglycerol acyltransferase type-2) gene in three olive cultivars of Egyptian origin, namely Toffahi, Hamed and Maraki, using molecular marker techniques and bioinformatics tools. The results show, firstly, that a specific genomic band from the Maraki cultivar was identified as DGAT2 and was identical to this gene in Olea europaea, with 100% similarity. Secondly, a differential genomic band from the Maraki cultivar, produced by the RAPD fingerprinting technique, yielded a distinct predicted sequence that was identified as DGAT2 in Fragaria vesca subsp. vesca, with 76% sequence similarity. Thirdly, a specific genomic band from the Hamed cultivar was identified as two fragments: 1) Olea europaea cultivar Koroneiki diacylglycerol acyltransferase type 2 mRNA, complete cds, with two matching regions at 99% similarity, or 2) PREDICTED: Fragaria vesca subsp. vesca diacylglycerol O-acyltransferase 2-like (LOC101313050), mRNA, with 86% similarity.

Keywords: Olea europaea, fingerprinting, diacylglycerol acyltransferase type-2 (DGAT2), Egypt

19166 Suppression Subtractive Hybridization Technique for Identification of the Differentially Expressed Genes

Authors: Tuhina-khatun, Mohamed Hanafi Musa, Mohd Rafii Yosup, Wong Mui Yun, Aktar-uz-Zaman, Mahbod Sahebi

Abstract:

The suppression subtractive hybridization (SSH) method is a valuable tool for identifying differentially regulated genes, such as disease-specific or tissue-specific genes important for cellular growth and differentiation. It is a widely used method for separating DNA molecules that distinguish two closely related DNA samples. SSH is one of the most powerful and popular methods for generating subtracted cDNA or genomic DNA libraries. It is based primarily on a suppression polymerase chain reaction (PCR) technique and combines normalization and subtraction in a single procedure. The normalization step equalizes the abundance of DNA fragments within the target population, and the subtraction step excludes sequences that are common to the populations being compared. This dramatically increases the probability of obtaining low-abundance differentially expressed cDNAs or genomic DNA fragments and simplifies analysis of the subtracted library. The SSH technique is applicable to many comparative and functional genetic studies for the identification of disease, developmental, tissue-specific, or other differentially expressed genes, as well as for the recovery of genomic DNA fragments distinguishing the samples under comparison.

Keywords: suppression subtractive hybridization, differentially expressed genes, disease specific genes, tissue specific genes

19165 Cytogenetic Characterization of the VERO Cell Line Based on Comparisons with the Subline; Implication for Authorization and Quality Control of Animal Cell Lines

Authors: Fumio Kasai, Noriko Hirayama, Jorge Pereira, Azusa Ohtani, Masashi Iemura, Malcolm A. Ferguson Smith, Arihiro Kohara

Abstract:

The VERO cell line was established in 1962 from normal tissue of an African green monkey, Chlorocebus aethiops (2n=60), and has been commonly used worldwide for screening for toxins or as a cell substrate for the production of viral vaccines. The VERO genome was sequenced in 2014; however, its cytogenetic features have not been fully characterized as it contains several chromosome abnormalities and different karyotypes coexist in the cell line. In this study, the VERO cell line (JCRB0111) was compared with one of the sublines. In contrast to 59 chromosomes as the modal chromosome number in the VERO cell line, the subline had two peaks of 56 and 58 chromosomes. M-FISH analysis using human probes revealed that the VERO cell line was characterized by a translocation t(2;25) found in all metaphases, which was absent in the subline. Different abnormalities detected only in the subline show that the cell line is heterogeneous, indicating that the subline has the potential to change its genomic characteristics during cell culture. The various alterations in the two independent lineages suggest that genomic changes in both VERO cells can be accounted for by progressive rearrangements during their evolution in culture. Both t(5;X) and t(8;14) observed in all metaphases of the two cell lines might have a key role in VERO cells and could be used as genetic markers to identify VERO cells. The flow karyotype shows distinct differences from normal. Further analysis of sorted abnormal chromosomes may uncover other characteristics of VERO cells. Because of the absence of STR data, cytogenetic data are important in characterizing animal cell lines and can be an indicator of their quality control.

Keywords: VERO, cell culture passage, chromosome rearrangement, heterogeneous cells

19164 Antibody Reactivity of Synthetic Peptides Belonging to Proteins Encoded by Genes Located in Mycobacterium tuberculosis-Specific Genomic Regions of Differences

Authors: Abu Salim Mustafa

Abstract:

Comparisons of mycobacterial genomes have identified several Mycobacterium tuberculosis-specific genomic regions that are absent in other mycobacteria and are known as regions of differences. Because of their M. tuberculosis specificity, the peptides encoded by these regions could be useful in the specific diagnosis of tuberculosis. To explore this possibility, overlapping synthetic peptides corresponding to 39 proteins predicted to be encoded by genes present in regions of differences were tested for antibody reactivity with sera from tuberculosis patients and healthy subjects. The results identified four immunodominant peptides corresponding to four different proteins, with three of the peptides showing significantly stronger antibody reactivity and a higher rate of positivity with sera from tuberculosis patients than with sera from healthy subjects. The fourth peptide was recognized equally well by the sera of tuberculosis patients and healthy subjects. Prediction of antibody epitopes by bioinformatics analyses using the ABCpred server identified multiple linear epitopes in each peptide. Furthermore, peptide sequence analysis for sequence identity using BLAST suggested M. tuberculosis specificity for the three peptides that had preferential reactivity with sera from tuberculosis patients, whereas the peptide with equal reactivity with sera of TB patients and healthy subjects showed significant identity with sequences present in non-tuberculous mycobacteria. The three identified M. tuberculosis-specific immunodominant peptides may be useful in the serological diagnosis of tuberculosis.

Keywords: genomic regions of differences, Mycobacterium tuberculosis, peptides, serodiagnosis

19163 Single Cell and Spatial Transcriptomics: A Beginner's Viewpoint from the Conceptual Pipeline

Authors: Leo Nnamdi Ozurumba-Dwight

Abstract:

Messenger ribonucleic acid (mRNA) molecules carry the coding information for proteins. These protein-encoding mRNA molecules (which collectively constitute the transcriptome), when analyzed by RNA sequencing (RNAseq), reveal the nature of gene expression. The gene expression data obtained provide clues to cellular traits and their dynamics, which can be studied in relation to function and responses. RNAseq is a practical tool in genomics, as it enables the detection and quantitative analysis of mRNA molecules. Single cell and spatial transcriptomics offer different avenues for exploring the genomic characteristics of single cells and pooled cells in disease conditions such as cancer, autoimmune diseases, and hematopoietic diseases, among others, from investigated biological tissue samples. Single cell transcriptomics enables a direct assessment of each building unit of tissues (the cell) during diagnosis and molecular gene expression studies. A typical technique for this is single-cell RNA sequencing (scRNAseq), which supports high-throughput gene expression studies. However, this technique generates gene expression data for many cells without information on the cells’ positional coordinates within the tissue. Complementary pre-established tissue reference maps, built with molecular and bioinformatics techniques, have since emerged and are now used to resolve this setback and produce both levels of data in a single scRNAseq analysis. This is an emerging conceptual approach for integrative and increasingly dependable transcriptomics analysis. It can support in situ analysis for a better understanding of tissue functional organization, unveil new biomarkers for the early-stage detection of diseases and for therapeutic targets in drug development, and reveal the nature of cell-to-cell interactions.
These are also vital genomic signatures and characterizations for clinical applications. Over the past decades, RNAseq has generated a wide array of information that is driving breakthroughs and innovations in biomedicine. Spatial transcriptomics, on the other hand, is tissue-level based and is used to study biological specimens with heterogeneous features. It reveals the gross identity of investigated mammalian tissues, which can then be used to study cell differentiation, track cell line trajectory patterns and behavior, and examine regulatory homeostasis in disease states. It also requires referenced positional analysis to make up the genomic signatures that will be assessed from the single cells in the tissue sample. Given these two approaches to transcriptomics studies across varying numbers of cells, with avenues for appropriate resolution, both have made the study of gene expression from mRNA molecules interesting and progressive, and both are helping to tackle health challenges head-on.

Keywords: transcriptomics, RNA sequencing, single cell, spatial, gene expression.

19162 Allelic Diversity of Productive, Reproductive and Fertility Traits Genes of Buffalo and Cattle

Authors: M. Moaeen-ud-Din, G. Bilal, M. Yaqoob

Abstract:

Identification of genes important for production traits in buffalo is impaired by a paucity of genomic resources. One option to fill this gap is to exploit the data available for cattle. The cross-species application of comparative genomics tools is a potential means to investigate the buffalo genome. However, this depends on nucleotide sequence similarity. In this study, gene diversity between buffalo and cattle was determined using 86 gene orthologues. There was about a 3% difference across all genes in terms of nucleotide diversity, and 0.267±0.134 in amino acids, indicating that cross-species strategies can be used successfully for genomic studies. There were significantly more non-synonymous substitutions in both cattle and buffalo; however, the difference in terms of dN – dS was similar (4.414 vs 4.745 in buffalo and cattle, respectively). A higher rate of non-synonymous substitutions at a similar level in buffalo and cattle indicated a similar positive selection pressure. Results of the relative rate test were assessed with the chi-squared test. There was no significant difference in unique mutations between cattle and buffalo lineages at synonymous sites. However, there was a significant difference in unique mutations at non-synonymous sites, indicating an ongoing mutagenic process that generates substitutional mutations at approximately the same rate at silent sites. Moreover, despite common ancestry, our results indicate different divergence times among genes of cattle and buffalo. This is the first demonstration that variable rates of molecular evolution may be present within the family Bovidae.
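
To make the relative rate test concrete, here is a minimal sketch of one common formulation, Tajima's relative rate test, in which the numbers of lineage-specific (unique) substitutions in two lineages are compared with a chi-squared statistic on one degree of freedom. The abstract does not say which formulation was used, so the choice of Tajima's test is an assumption, and the counts below are hypothetical.

```python
import math

def relative_rate_chi2(unique_a, unique_b):
    """Tajima-style relative rate test.

    unique_a, unique_b: counts of substitutions unique to lineage A and
    lineage B (relative to an outgroup). Returns (chi2, p_value) with
    one degree of freedom.
    """
    total = unique_a + unique_b
    if total == 0:
        raise ValueError("at least one unique substitution is required")
    chi2 = (unique_a - unique_b) ** 2 / total
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts of unique substitutions in two lineages.
chi2, p_value = relative_rate_chi2(30, 12)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Equal counts give a statistic of zero, which is consistent with both lineages evolving at the same rate; a large imbalance gives a small p-value and points to rate heterogeneity.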

Keywords: buffalo, cattle, gene diversity, molecular evolution

19161 Copy Number Variants in Children with Non-Syndromic Congenital Heart Diseases from Mexico

Authors: Maria Lopez-Ibarra, Ana Velazquez-Wong, Lucelli Yañez-Gutierrez, Maria Araujo-Solis, Fabio Salamanca-Gomez, Alfonso Mendez-Tenorio, Haydeé Rosas-Vargas

Abstract:

Congenital heart diseases (CHD) are the most common congenital abnormalities. These conditions can occur either as an element of distinct chromosomal malformation syndromes or as non-syndromic forms. Their etiology is not fully understood. Genetic variants such as copy number variants have been associated with CHD. The aim of our study was to analyze these genomic variants in peripheral blood from Mexican children diagnosed with non-syndromic CHD. We included 16 children with atrial and ventricular septal defects and 5 healthy subjects without heart malformations as controls. To exclude the most common heart disease-associated syndromic alteration, we performed a fluorescence in situ hybridization test for the 22q11.2 deletion, which is responsible for the congenital heart abnormalities associated with DiGeorge syndrome. Then, microarray-based comparative genomic hybridization was used to identify global copy number variants. Copy number variants were identified by comparing and analyzing our results against data from the main genetic variation databases. We identified copy number gains in three chromosomal regions in pediatric patients, 4q13.2 (31.25%), 9q34.3 (25%) and 20q13.33 (50%), where several genes associated with cellular, biosynthetic, and metabolic processes are located: UGT2B15, UGT2B17, SNAPC4, SDCCAG3, PMPCA, INPP5E, C9orf163, NOTCH1, C20orf166, and SLCO4A1. In addition, after a hierarchical cluster analysis based on the fluorescence intensity ratios from the comparative genomic hybridization, two congenital heart disease groups were generated, corresponding to children with atrial or ventricular septal defects. Further analysis with a larger sample size is needed to corroborate these copy number variants as possible biomarkers to differentiate between heart abnormalities. 
Interestingly, the 20q13.33 gain was present in 50% of children with these CHD which could suggest that alterations in both coding and non-coding elements within this chromosomal region may play an important role in distinct heart conditions.
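
As a quick arithmetic check, the reported frequencies can be converted back into carrier counts for the 16 patients. This is a sketch only: the counts are inferred from the percentages, on the assumption that each percentage refers to all 16 children, and they are not figures stated in the abstract.

```python
n_patients = 16

# Frequencies of copy number gains as reported in the abstract (percent).
reported_pct = {"4q13.2": 31.25, "9q34.3": 25.0, "20q13.33": 50.0}

# Inferred number of carriers per region (assumption: denominator is 16).
carriers = {
    region: round(pct * n_patients / 100)
    for region, pct in reported_pct.items()
}

for region, count in carriers.items():
    print(f"{region}: {count}/{n_patients} = {100 * count / n_patients:.2f}%")
```

Each percentage corresponds to a whole number of patients (5, 4 and 8 of 16), so the reported values are internally consistent with the cohort size.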

Keywords: aCGH, bioinformatics, congenital heart diseases, copy number variants, fluorescence in situ hybridization

19160 Integrative Omics-Portrayal Disentangles Molecular Heterogeneity and Progression Mechanisms of Cancer

Authors: Binder Hans

Abstract:

Cancer is no longer seen as solely a genetic disease in which genetic defects such as mutations and copy number variations affect gene regulation and eventually lead to aberrant cell functioning that can be monitored by transcriptome analysis. It has become obvious that epigenetic alterations represent a further important layer of (de-)regulation of gene activity. For example, aberrant DNA methylation is a hallmark of many cancer types, and methylation patterns have been used successfully to subtype cancer heterogeneity. Hence, unraveling the interplay between different omics levels such as the genome, transcriptome and epigenome is indispensable for a mechanistic understanding of the molecular deregulation causing complex diseases such as cancer. This objective requires powerful downstream integrative bioinformatics methods as an essential prerequisite to discover the whole-genome mutational, transcriptome and epigenome landscapes of cancer specimens and to uncover cancer genesis, progression and heterogeneity. Basic challenges and tasks arise ‘beyond sequencing’ because of the large size of the data, their complexity, the need to search for hidden structures in the data, the need for knowledge mining to discover biological function, and the need for systems biology conceptual models to deduce developmental interrelations between different cancer states. These tasks are tightly related to cancer biology as an (epi-)genetic disease giving rise to aberrant genomic regulation under micro-environmental control and to clonal evolution, which leads to heterogeneous cellular states. Machine learning algorithms such as self-organizing maps (SOM) represent one interesting option to tackle these bioinformatics tasks. The SOM method enables the recognition of complex patterns in large-scale data generated by high-throughput omics technologies. It portrays molecular phenotypes by generating individualized, easy-to-interpret images of the data landscape in combination with comprehensive analysis options.
Our image-based, reductionist machine learning methods provide one interesting perspective on how to deal with massive data in the investigation of complex diseases such as gliomas, melanomas and colon cancer at the molecular level. As an important new challenge, we address the combined portrayal of different omics data such as genome-wide genomic, transcriptomic and methylomic data. The integrative-omics portrayal approach is based on the joint training of the data, and it provides separate personalized data portraits for each patient and data type, which can be analyzed by visual inspection as one option. The new method enables an integrative genome-wide view of the omics data types and the underlying regulatory modes. It is applied to high- and low-grade gliomas and to melanomas, where it disentangles transversal and longitudinal molecular heterogeneity in terms of distinct molecular subtypes and progression paths with prognostic impact.

Keywords: integrative bioinformatics, machine learning, molecular mechanisms of cancer, gliomas and melanomas

19159 Genome Sequencing, Assembly and Annotation of Gelidium pristoides from Kenton-on-Sea, South Africa

Authors: Sandisiwe Mangali, Graeme Bradley

Abstract:

A genome is the complete set of an organism's hereditary information, encoded as deoxyribonucleic acid or, in most viruses, as ribonucleic acid. The three types of genome are the nuclear, mitochondrial and plastid genomes. Their sequences, which are uncovered by genome sequencing, serve as an archive of all genetic information and enable researchers to understand the composition of a genome and the regulation of gene expression, and also provide information on how the whole genome works. These sequences enable researchers to explore the population structure, genetic variations, and recent demographic events in threatened species. Genome sequencing refers to the process of determining the exact order of the nucleotide bases of a genome, and the process through which all the aforementioned genomes are sequenced is referred to as whole or complete genome sequencing. Gelidium pristoides is a South African endemic Rhodophyta species that has been harvested in the Eastern Cape since the 1950s for its high economic value, which is one motivation for its sequencing. Its endemism further motivates its sequencing for conservation biology, as endemic species are more vulnerable to anthropogenic activities. The aim of this study is to sequence, map and annotate the Gelidium pristoides genome. To accomplish this aim, genomic DNA was extracted and quantified using the NucleoSpin Plant Kit, Qubit 2.0 and Nanodrop. Thereafter, the Ion Plus Fragment Library Kit was used to prepare a 600 bp library, which was then sequenced on the Ion S5 sequencing platform in two runs. The reads produced were quality-controlled and assembled with the SPAdes assembler using default parameters, and the genome assembly was quality-assessed with the QUAST software.
From this assembly, the plastid and mitochondrial genomes were then extracted using Gelidiales organellar genomes as search queries and ordered according to them using the Geneious software. The Qubit and Nanodrop instruments gave A260/A280 and A230/A260 values of 1.81 and 1.52, respectively. A total of 30,792,074 reads were obtained, producing 94,140 contigs with a total sequence length of 217.06 Mbp, an N50 value of 3,072 bp and a GC content of 41.72%. Total lengths of 179,281 bp and 25,734 bp were obtained for the plastid and mitochondrial genomes, respectively. Genomic data allow a clear understanding of the genomic constitution of an organism and are valuable as foundational information for studies of individual genes and for resolving the evolutionary relationships between organisms, including Rhodophytes and other seaweeds.
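
The assembly statistics quoted above (N50 and GC content) can be illustrated with a minimal sketch of how such values are typically computed from a set of contigs. This is not the QUAST implementation; the function names and the toy contigs are assumptions made for illustration.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L together cover
    at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length

def gc_content(sequences):
    """Percentage of G and C bases over all supplied sequences."""
    total = sum(len(seq) for seq in sequences)
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in sequences)
    return 100.0 * gc / total

# Toy contigs (hypothetical), far shorter than real assembly output.
toy_contigs = ["ATGCGCATTA", "GGCCATAT", "ATATGC", "GCGC", "AT"]
print(n50([len(c) for c in toy_contigs]))      # N50 of the toy assembly
print(round(gc_content(toy_contigs), 2))       # GC percentage
```

An N50 of 3,072 bp for 94,140 contigs, as reported, indicates a fairly fragmented assembly, which is common for short-read data without scaffolding.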

Keywords: Gelidium pristoides, genome, genome sequencing and assembly, Ion S5 sequencing platform

19158 Mobile Genetic Elements in Trematode Himasthla elongata Clonal Polymorphism

Authors: Anna Solovyeva, Ivan Levakin, Nickolai Galaktionov, Olga Podgornaya

Abstract:

Animals that reproduce asexually were long thought to have the same genotypes across generations. However, some counterexamples have been found, and mobile genetic elements (MGEs), or transposons, are considered the most probable source of genetic instability. Their dispersed nature and ability to change their genomic localization enable MGEs to be efficient mutators. Hence, studying the genomic impact of MGEs requires an appropriate model that combines representative amounts of various MGEs with options to evaluate their genomic influence. Animals that reproduce asexually seem to be a suitable model for studying the impact of MGEs on genomic variability. We found the small marine trematode Himasthla elongata (Himasthlidae) to be a good model for such investigation, as it has a small genome size, diverse MGEs and parthenogenetic stages in its life cycle. In the current work, the clonal diversity of cercariae was traced with the AFLP (amplified fragment length polymorphism) method, diverse zones from the electrophoretic patterns were cloned, and the nature of the fragments was explored. Polymorphic patterns of AFLP-based fingerprints of individual cercariae are enriched with retrotransposons of different families. The bulk of those sequences are represented by open reading frames of non-long terminal repeat (non-LTR) elements and, to a lesser extent, long terminal repeat (LTR) elements in variable fragments of the AFLP array. The CR1 elements, found in both polymorphic and conserved patterns, are remarkably more frequent than the other non-LTR retrotransposons. These data were confirmed by shotgun sequencing on the Illumina HiSeq 2500 platform. Individual cercariae of the same clone (i.e., originating from a single miracidium and inhabiting one host) have varying distributions of the MGE families detected in sequenced AFLP patterns. The most numerous are CR1 and RTE-Bov retrotransposons, typical for trematode genomes.
Also, we identified LTR-retrotransposons of the Pao and Gypsy families among DNA transposons of the CMC-EnSpm, Tc1/Mariner, MuLE-MuDR and Merlin families. We detected many of them in the H. elongata transcriptome. Such uneven MGE distribution in the AFLP sequence sets reflects the different patterns of transposon spreading in cercarial genomes, as transposons affect the genome in many ways (ectopic recombination, gene structure interruption, epigenetic silencing). They are considered to play a key role in the origins of trematode clonal polymorphism. The authors greatly appreciate the help received at the Kartesh White Sea Biological Station of the Russian Academy of Sciences Zoological Institute. This work is funded by RSF grant 19-74-20102 and RFBR grant 17-04-02161 and by the research program of the Zoological Institute of the Russian Academy of Sciences (project number AAAA-A19-119020690109-2).

Keywords: AFLP, clonal polymorphism, Himasthla elongata, mobile genetic elements, NGS

19157 Non-Mammalian Pattern Recognition Receptor from Rock Bream (Oplegnathus fasciatus): Genomic Characterization and Transcriptional Profile upon Bacterial and Viral Inductions

Authors: Thanthrige Thiunuwan Priyathilaka, Don Anushka Sandaruwan Elvitigala, Bong-Soo Lim, Hyung-Bok Jeong, Jehee Lee

Abstract:

Toll-like receptors (TLRs) are a phylogenetically conserved family of pattern recognition receptors that participate in host immune responses against various pathogens and pathogen-derived mitogens. TLR21, a non-mammalian type, is almost restricted to fish species, although it can rarely be identified in birds and amphibians. This study was carried out to identify and characterize TLR21 from rock bream (Oplegnathus fasciatus), designated RbTLR21, at the transcriptional and genomic levels. The full-length cDNA and genomic sequences of RbTLR21 were identified using a previously constructed cDNA sequence database and BAC library, respectively. The identified RbTLR21 sequence was characterized using several bioinformatics tools. A quantitative real-time PCR (qPCR) experiment was conducted to determine the tissue-specific expression distribution of RbTLR21. Further, transcriptional modulation of RbTLR21 upon stimulation with Streptococcus iniae (S. iniae), rock bream iridovirus (RBIV) and Edwardsiella tarda (E. tarda) was analyzed in spleen tissues. The complete coding sequence of RbTLR21 was 2919 bp in length, encoding a protein of 973 amino acid residues with a molecular mass of 112 kDa and a theoretical isoelectric point of 8.6. The predicted protein sequence resembled a typical TLR domain architecture, including a C-terminal ectodomain with 16 leucine-rich repeats, a transmembrane domain, a cytoplasmic TIR domain and a signal peptide of 23 amino acid residues. Moreover, the predicted protein folding pattern of RbTLR21 exhibited a well-structured and folded ectodomain, transmembrane domain and cytoplasmic TIR domain. According to the pairwise sequence analysis, RbTLR21 showed the closest homology with orange-spotted grouper (Epinephelus coioides) TLR21, with 76.9% amino acid identity. Furthermore, our phylogenetic analysis revealed that RbTLR21 shows a close evolutionary relationship with its ortholog from Danio rerio.
The genomic structure of RbTLR21 consisted of a single exon, similar to its zebrafish ortholog. Several putative transcription factor binding sites were also identified in the 5ʹ flanking region of RbTLR21. RbTLR21 was ubiquitously expressed in all the tissues tested, with relatively high expression levels in spleen, liver and blood tissues. Upon induction with rock bream iridovirus, RbTLR21 expression was upregulated in the early phase of the post-induction period, although it fluctuated in the later phase. After Edwardsiella tarda injection, RbTLR21 transcripts were upregulated throughout the experiment. Similarly, Streptococcus iniae induction produced significant upregulation of RbTLR21 mRNA expression in spleen tissues. Collectively, our findings suggest that RbTLR21 is indeed a homolog of TLR21 family members and may be involved in host immune responses against bacterial and DNA viral infections.

Keywords: rock bream, toll like receptor 21 (TLR21), pattern recognition receptor, genomic characterization

19156 Breeding Cotton for Annual Growth Habit: Remobilizing End-of-season Perennial Reserves for Increased Yield

Authors: Salman Naveed, Nitant Gandhi, Grant Billings, Zachary Jones, B. Todd Campbell, Michael Jones, Sachin Rustgi

Abstract:

Cotton (Gossypium spp.) is the primary source of natural fiber in the U.S. and a major crop in the Southeastern U.S. Despite constant efforts to increase the cotton fiber yield, the yield gain has stagnated. Therefore, we undertook a novel approach to improve the cotton fiber yield by altering its growth habit from perennial to annual. In this effort, we identified genotypes with high-expression alleles of five floral induction and meristem identity genes (FT, SOC1, FUL, LFY, and AP1) from an upland cotton mini-core collection and crossed them in various combinations to develop cotton lines with annual growth habit, optimal flowering time and enhanced productivity. To facilitate the characterization of genotypes with the desired combinations of stacked alleles, we identified markers associated with the gene expression traits via genome-wide association analysis using a 63K SNP Array (Hulse-Kemp et al. 2015 G3 5:1187). Over 14,500 SNPs showed polymorphism and were used for association analysis. A total of 396 markers showed association with expression traits. Out of these 396 markers, 159 mapped to genes, 50 to untranslated regions, and 187 to random genomic regions. Biased genomic distribution of associated markers was observed where more trait-associated markers mapped to the cotton D sub-genome. Many quantitative trait loci coincided at specific genomic regions. This observation has implications as these traits could be bred together. The analysis also allowed the identification of candidate regulators of the expression patterns of these floral induction and meristem identity genes whose functions will be validated via virus-induced gene silencing.

Keywords: cotton, GWAS, QTL, expression traits

19155 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is an emerging technology that collects data from various sensors for use in many fields. Data retrieval is a major issue, since the exact data needed must be extracted on demand. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task. The key value is what helps to pinpoint the exact data available in the storage space. Here, the available data is streamed, and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

19154 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very large datasets is a significant problem for most current data mining and machine learning algorithms. MicroRNA (miRNA) data are among the important large genomic, non-coding datasets representing genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing miRNA datasets has been a challenging problem for researchers. The number of features is high relative to the number of samples, and the data are imbalanced. A feature selection method is used to select the features best able to distinguish classes and to eliminate obscure features. Afterward, a convolutional neural network (CNN) classifier is used to classify cancer types, with a genetic algorithm employed to optimize the CNN hyper-parameters. To speed up classification by the CNN, a graphics processing unit (GPU) is recommended for performing the calculations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
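
The feature selection step can be illustrated with a minimal filter-style sketch that ranks each feature by how well it separates the classes (variance of the class means divided by the mean within-class variance) and keeps the top k. The abstract does not name the selection method used, so this scoring rule, the function names, and the toy data are all assumptions for illustration, not the authors' algorithm.

```python
from statistics import mean, pvariance

def separation_score(values, labels):
    """Between-class variance of the class means divided by the
    mean within-class variance for a single feature."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    class_means = [mean(g) for g in groups.values()]
    within = mean([pvariance(g) for g in groups.values()])
    between = pvariance(class_means)
    return between / (within + 1e-12)  # small constant avoids division by zero

def select_top_k(samples, labels, k):
    """Return the column indices of the k highest-scoring features."""
    n_features = len(samples[0])
    scores = [
        separation_score([row[j] for row in samples], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Toy expression matrix (hypothetical): 4 samples x 3 features, 2 classes.
X = [
    [1.0, 5.0, 0.2],
    [1.1, 4.0, 0.9],
    [3.0, 5.5, 0.1],
    [3.2, 4.5, 0.8],
]
y = ["A", "A", "B", "B"]
print(select_top_k(X, y, 2))
```

The selected columns would then be passed to the classifier; with 1,046 miRNA features and imbalanced classes, as described, a filter of this kind reduces the input size before the more expensive CNN and genetic algorithm stages.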

Keywords: cancer classification, feature selection, deep learning, genetic algorithm
