Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 23

23 Hypoxia Tolerance, Longevity and Cancer-Resistance in the Mole Rat Spalax – a Liver Transcriptomics Approach

Authors: Hanno Schmidt, Assaf Malik, Anne Bicker, Gesa Poetzsch, Aaron Avivi, Imad Shams, Thomas Hankeln


The blind subterranean mole rat Spalax shows a remarkable tolerance to hypoxia, cancer-resistance and longevity. Unravelling the genomic basis of these adaptations will be important for biomedical applications. RNA-Seq gene expression data were obtained from normoxic and hypoxic Spalax and rat liver tissue. Hypoxic Spalax broadly downregulates genes from major liver function pathways. This energy-saving response is likely a crucial adaptation to low oxygen levels. In contrast, the hypoxia-sensitive rat shows massive upregulation of energy metabolism genes. Candidate genes with plausible connections to the mole rat’s phenotype, such as key genes related to hypoxia tolerance, DNA damage repair, tumourigenesis and ageing, are expressed at substantially higher levels in Spalax than in rat. Comparative liver transcriptomics highlights the importance of molecular adaptations at the gene regulatory level in Spalax and pinpoints a variety of starting points for subsequent functional studies.

Keywords: cancer, hypoxia, longevity, transcriptomics

22 Single Cell and Spatial Transcriptomics: A Beginners Viewpoint from the Conceptual Pipeline

Authors: Leo Nnamdi Ozurumba-Dwight


Messenger ribonucleic acid (mRNA) molecules carry the instructions for building proteins. These protein-encoding mRNA molecules (which collectively constitute the transcriptome), when analyzed by RNA sequencing (RNAseq), reveal the nature of gene expression in a sample. The resulting expression profiles provide clues about cellular traits and their dynamics, which can be studied in relation to function and response. RNAseq is a practical tool in genomics because it enables the detection and quantitative analysis of mRNA molecules. Single cell and spatial transcriptomics offer different avenues for exploring the genomic characteristics of single cells and pooled cells in disease conditions such as cancer, autoimmune diseases and hematopoietic diseases, among others, from investigated biological tissue samples. Single cell transcriptomics allows a direct assessment of each building unit of a tissue (the cell) during diagnosis and gene expression studies. A typical technique for this is single-cell RNA sequencing (scRNAseq), which supports high-throughput gene expression studies. However, this technique generates gene expression data for many cells without information on the cells’ positional coordinates within the tissue. Complementary, pre-established tissue reference maps built with molecular and bioinformatics techniques have since emerged and are now used to overcome this setback, producing both levels of data in a single scRNAseq analysis. This is an emerging methodological approach for integrative and increasingly dependable transcriptomics analysis. It can support in situ style analysis for a better understanding of the functional organization of tissues, unveil new biomarkers for early-stage disease detection and for therapeutic targets in drug development, and reveal the nature of cell-to-cell interactions.
These are also vital genomic signatures and characterizations for clinical applications. Over the past decades, RNAseq has generated a wide array of information that is driving breakthroughs and innovations in biomedicine. Spatial transcriptomics, on the other hand, works at the tissue level and is used to study biological specimens with heterogeneous features. It reveals the gross identity of investigated mammalian tissues, which can then be used to study cell differentiation, track cell line trajectory patterns and behavior, and examine regulatory homeostasis in disease states. It also requires referenced positional analysis to resolve the genomic signatures that will be assessed from the single cells in the tissue sample. Given these two approaches to transcriptomics in varying quantities of cells, each with avenues for appropriate resolution, the study of gene expression from mRNA molecules has become more engaging and progressive, and is helping to tackle health challenges head-on.

Keywords: transcriptomics, RNA sequencing, single cell, spatial, gene expression

21 Cellular Puzzles: Tissue Profiling with Sparse Machine Learning Models Using Single-Cell RNA Sequencing and Spatial Transcriptomics Datasets

Authors: Nuray S. Erdogan, Deniz Eroglu


With the increase in technological developments and innovations, the state-of-the-art tools of modern science have turned to collecting large and complex data sets. Single-cell RNA sequencing (scRNA-Seq), one of today's cutting-edge technologies in molecular biology, gives access to cell population heterogeneity at high resolution through cellular-level gene expression data. However, this technology cannot detect the spatial positions of cells. On the other hand, spatially resolved gene expression profiles obtained from spatial transcriptomics (ST) play a key role in understanding tissue organization and function, but ST technology lacks single-cell resolution. We developed a sparse regression model that leverages high-resolution cell type data from scRNA-Seq to deconvolute profiles from ST, which provides the spatial locations of cellular information at a lower resolution. This sparse regression is the first among spatial deconvolution models that aims to perform spatial data deconvolution by predicting the “best sparse solutions”. The superior performance of this sparse regression model was shown on simulated datasets. Tissue architecture from real datasets was mapped in coherence with the real cellular arrangements. Sparse deconvolution of the developing embryonic human heart correctly mapped ventricular, atrial, and vessel-specific cell profiles and defined overall heart architecture successfully. The principle of our model was driven by the sparsity of nature. This sparse deconvolution model will make essential contributions to personalized medicine as it enables the high-resolution molecular profiling of biological tissues based on natural and biological sparsity.

Keywords: molecular profiling, single-cell RNA sequencing, sparse regression, spatial transcriptomics
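
The abstract above does not spell out its regression model, so the following is only a minimal sketch of the general idea of sparse deconvolution: a non-negative, L1-penalised regression of one ST spot onto per-cell-type signature profiles, solved by coordinate descent. The function names, the penalty value and the toy numbers are all assumptions made for illustration and are not taken from the authors' work.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def sparse_deconvolve(spot, signatures, lam=0.1, n_iter=200):
    """Non-negative lasso by coordinate descent (illustrative only).

    spot        -- expression of one ST spot, one value per gene
    signatures  -- one expression profile per cell type (same gene order),
                   each assumed to contain at least one non-zero value
    lam         -- L1 penalty; larger values give sparser solutions
    Minimises 0.5 * ||spot - sum_j w[j] * signatures[j]||^2 + lam * sum(w)
    subject to w >= 0.
    """
    w = [0.0] * len(signatures)
    for _ in range(n_iter):
        for j, sig in enumerate(signatures):
            # Residual of the spot after removing every cell type except j.
            partial = [
                y - sum(w[k] * signatures[k][g] for k in range(len(w)) if k != j)
                for g, y in enumerate(spot)
            ]
            # Soft-threshold, then clip at zero to keep weights non-negative.
            w[j] = max(0.0, (dot(sig, partial) - lam) / dot(sig, sig))
    return w


# Hypothetical signatures for three cell types over four genes.
signatures = [
    [10.0, 0.0, 0.0, 1.0],  # cell type A
    [0.0, 8.0, 0.0, 1.0],   # cell type B
    [0.0, 0.0, 6.0, 1.0],   # cell type C
]
# Toy spot built as 0.7 * A + 0.3 * B, so type C should receive zero weight.
spot = [7.0, 2.4, 0.0, 1.0]

weights = sparse_deconvolve(spot, signatures)
total = sum(weights)
proportions = [w / total for w in weights]
print(proportions)
```

In this toy case the penalty removes the absent cell type entirely instead of assigning it a small non-zero share, which is the behaviour a sparse solution is meant to provide.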

20 Transcriptomics Analysis on Comparing Non-Small Cell Lung Cancer versus Normal Lung, and Early Stage Compared versus Late-Stages of Non-Small Cell Lung Cancer

Authors: Achitphol Chookaew, Paramee Thongsukhsai, Patamarerk Engsontia, Narongwit Nakwan, Pritsana Raugrut


Lung cancer is one of the most common malignancies and the primary cause of cancer death worldwide. Non-small cell lung cancer (NSCLC) is the main subtype, in which the majority of patients present with advanced-stage disease. Herein, we analyzed differentially expressed genes to find potential biomarkers for lung cancer diagnosis as well as prognostic markers. We used transcriptome data from our 2 NSCLC patients and public data (GSE81089) comprising 8 NSCLC and 10 normal lung tissues. Differentially expressed genes (DEGs) between NSCLC and normal tissue and between early-stage and late-stage NSCLC were analyzed with DESeq2. Pairwise correlation was used to find the DEGs with false discovery rate (FDR) adjusted p-value ≤ 0.05 and |log2 fold change| ≥ 4 for NSCLC versus normal, and FDR adjusted p-value ≤ 0.05 with |log2 fold change| ≥ 2 for early versus late-stage NSCLC. Bioinformatic tools were used for functional and pathway analysis. Moreover, the top ten genes in each comparison group were verified by expression and survival analysis via GEPIA. We found 150 up-regulated and 45 down-regulated genes in NSCLC compared to normal tissues. Many immunoglobulin-related genes, e.g., IGHV4-4, IGHV5-10-1, IGHV4-31, IGHV4-61, and IGHV1-69D, were significantly up-regulated. Twenty-two genes were up-regulated and five genes were down-regulated in late-stage compared to early-stage NSCLC. The top five DEGs were KRT6B, SPRR1A, KRT13, KRT6A and KRT5. Keratin 6B (KRT6B) was the most significantly increased gene in late-stage NSCLC. From GEPIA analysis, we concluded that IGHV4-31 and IGKV1-9 might be used as diagnostic biomarkers, while KRT6B and KRT6A might be used as prognostic biomarkers. However, further clinical validation is needed.

Keywords: differentially expressed genes, early and late-stages, gene ontology, non-small cell lung cancer, transcriptomics
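
The two cut-offs quoted in the abstract above (FDR-adjusted p-value ≤ 0.05 together with |log2 fold change| ≥ 4 or ≥ 2) can be sketched as a simple filter over a table of test results. The sketch below assumes the results are already available as one record per gene; the `GeneResult` type, the `filter_degs` function and the gene names and values are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float
    padj: float  # FDR-adjusted p-value


def filter_degs(results, padj_max=0.05, abs_lfc_min=4.0):
    """Return (up, down) gene lists that pass both cut-offs."""
    up, down = [], []
    for r in results:
        if r.padj <= padj_max and abs(r.log2_fold_change) >= abs_lfc_min:
            (up if r.log2_fold_change > 0 else down).append(r.gene)
    return up, down


# Made-up values for illustration only.
toy = [
    GeneResult("GENE_A", 5.2, 0.001),   # passes, up-regulated
    GeneResult("GENE_B", -4.5, 0.03),   # passes, down-regulated
    GeneResult("GENE_C", 3.1, 0.0005),  # fold change too small at the >= 4 cut-off
    GeneResult("GENE_D", 6.0, 0.20),    # adjusted p-value too large
]

up, down = filter_degs(toy)                          # NSCLC vs normal thresholds
up_2, down_2 = filter_degs(toy, abs_lfc_min=2.0)     # early vs late thresholds
print(up, down, up_2, down_2)
```

Lowering the fold-change threshold from 4 to 2 admits GENE_C, which shows how the stricter cut-off used for the tumour versus normal comparison yields a shorter list from the same results.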

19 Unzipping the Stress Response Genes in Moringa oleifera Lam. through Transcriptomics

Authors: Vivian A. Panes, Raymond John S. Rebong, Miel Q. Diaz


Moringa oleifera Lam. is known mainly for its high nutritional value and medicinal properties, contributing to its popular reputation as a 'miracle plant' in the tropical climates where it usually grows. The main objective of this study is to discover the genes and gene products involved in abiotic stress-induced activity that may impact M. oleifera Lam. mature seeds, as well as their corresponding functions. In this study, RNA-sequencing and de novo transcriptome assembly were performed using two assemblers, Trinity and Oases, which produced 177,417 and 120,818 contigs respectively. These transcripts were then subjected to various bioinformatics tools such as Blast2GO, UniProt, KEGG, and COG for gene annotation and the analysis of relevant metabolic pathways. Furthermore, FPKM analysis was performed to identify gene expression levels. The sequences were filtered according to the 'response to stress' GO term since this study dealt with stress response. Clustered Orthologous Groups (COG) showed that the highest frequencies of stress response gene functions were those of cytoskeleton, which makes up approximately 14% and 23% of stress-related sequences under Trinity and Oases respectively, recombination, repair and replication at 11% and 14% respectively, carbohydrate transport and metabolism at 23% and 9% respectively, and defense mechanisms at 16% and 12% respectively. KEGG pathway analysis determined the most abundant stress-response genes in the phenylpropanoid biosynthesis at counts of 187 and 166 pathways for Oases and Trinity respectively, purine metabolism at 123 and 230 pathways, and biosynthesis of antibiotics at 105 and 102. 
Unique and cumulative GO term counts revealed that the majority of the stress response genes belonged to the category of cellular response to stress at cumulative counts of 1,487 and 2,187 for Oases and Trinity respectively, defense response at 754 and 1,255, response to heat at 213 and 208, response to water deprivation at 229 and 228, and oxidative stress at 508 and 488. Lastly, FPKM was used to determine the levels of expression of each stress response gene. The most upregulated gene encodes a thiamine thiazole synthase chloroplastic-like enzyme, which plays a significant role in DNA damage tolerance. Data analysis implies that M. oleifera stress response genes are directed towards the effects of climate change more than other stresses, indicating the potential of M. oleifera for cultivation in harsh environments because it is resistant to climate change, pathogens, and foreign invaders.

Keywords: stress response, genes, Moringa oleifera, transcriptomics
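
The abstract above relies on FPKM (fragments per kilobase of transcript per million mapped fragments) to compare expression levels. As a reminder of what the unit means, here is a minimal sketch of the standard definition; the function name and the example numbers are illustrative assumptions, not values from the study.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments               -- fragments mapped to this transcript
    length_bp               -- transcript length in base pairs
    total_mapped_fragments  -- all mapped fragments in the library
    """
    # Equivalent to fragments / (length_bp / 1e3) / (total_mapped_fragments / 1e6).
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical transcript: 500 fragments, 2,000 bp long, 10 million mapped fragments.
example = fpkm(500, 2000, 10_000_000)
print(example)
```

Because the value is normalised by both transcript length and library size, a longer transcript with the same fragment count receives a proportionally lower FPKM.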

18 BingleSeq: A User-Friendly R Package for Single-Cell RNA-Seq Data Analysis

Authors: Quan Gu, Daniel Dimitrov


BingleSeq was developed as a Shiny-based, intuitive, and comprehensive application that enables the analysis of single-cell RNA-sequencing count data. This was achieved by incorporating three state-of-the-art software packages for each type of RNA sequencing analysis, alongside functional annotation analysis and a way to assess the overlap of differential expression method results. In its current state, the functionality implemented within BingleSeq is comparable to that of other applications that were also developed to lower the entry requirements for RNA sequencing analyses. BingleSeq is available on GitHub and will be submitted to R/Bioconductor.

Keywords: bioinformatics, functional annotation analysis, single-cell RNA-sequencing, transcriptomics

17 Effects of Epinephrine on Gene Expressions during the Metamorphosis of Pacific Oyster Crassostrea gigas

Authors: Fei Xu, Guofan Zhang, Xiao Liu


Many major marine invertebrate phyla are characterized by indirect development. These animals transition from planktonic larvae to benthic adults via settlement and metamorphosis, which offers many advantages for adapting to the marine environment. Studying the biological process of metamorphosis is thus key to understanding the origin and evolution of indirect development. Although the mechanism of metamorphosis has been studied extensively in relation to the marine environment, microorganisms and neurohormones, little is known about the gene regulation network (GRN) during metamorphosis. We treated competent oyster pediveligers with epinephrine, which is known to induce oyster metamorphosis effectively, and analyzed the dynamics of genes and proteins with transcriptomics and proteomics methods. The results indicated significant upregulation of the protein synthesis system, as well as of some transcription factors including Homeobox, basic helix-loop-helix, and nuclear receptors. The results suggest that the GRN of the transition stage during oyster metamorphosis is complex.

Keywords: indirect development, gene regulation network, protein synthesis, transcription factors

16 In Silico Analysis of Small Heat Shock Protein Gene Family by RNA-Seq during Tomato Fruit Ripening

Authors: Debora P. Arce, Flavia J. Krsticevic, Marco R. Bertolaccini, Joaquín Ezpeleta, Estela M. Valle, Sergio D. Ponce, Elizabeth Tapia


Small Heat Shock Proteins (sHSPs) are low molecular weight chaperones that play an important role during stress response and development in all living organisms. Fruit maturation and oxidative stress can induce sHSP synthesis in both Arabidopsis and tomato plants. RNA-Seq technology is becoming widely used in various transcriptomics studies; however, analyzing and interpreting RNA-Seq data faces serious challenges. In the present work, we de novo assembled the Solanum lycopersicum transcriptome for three different maturation stages (mature green, breaker and red ripe). Differential gene expression analysis was carried out during tomato fruit development. We identified 12 differentially expressed sHSPs that might be involved in breaker and red ripe fruit maturation. Interestingly, these sHSPs have different subcellular localizations, which suggests a complex regulation of the fruit maturation network process.

Keywords: sHSPs, maturation, tomato, RNA-Seq, assembly

15 Classification of Multiple Cancer Types with Deep Convolutional Neural Network

Authors: Nan Deng, Zhenqiu Liu


Thousands of patients with metastatic tumors are diagnosed with cancers of unknown primary site each year. The inability to identify the primary cancer site may lead to inappropriate treatment and unexpected prognosis. Nowadays, a large amount of genomics and transcriptomics cancer data has been generated by next-generation sequencing (NGS) technologies, and The Cancer Genome Atlas (TCGA) database has accrued thousands of human cancer tumors and healthy controls, which provides abundant resources for differentiating cancer types. Meanwhile, deep convolutional neural networks (CNNs) have shown high accuracy in classifying a large number of image object categories. Here, we use 25 cancer primary tumors and 3 normal tissues from TCGA, convert their RNA-Seq gene expression profiles to color images, and train, validate and test a CNN classifier directly on these images. The results show that our CNN classifier can achieve >80% test accuracy on most of the tumors and normal tissues. Since the gene expression pattern of distant metastases is similar to that of their primary tumors, the CNN classifier may provide a potential computational strategy for identifying the unknown primary origin of metastatic cancer in order to plan appropriate treatment for patients.

Keywords: bioinformatics, cancer, convolutional neural network, deep learning, gene expression pattern
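
The abstract above does not describe how expression profiles are turned into images, so the following is only one plausible sketch of the general idea: log-transform a gene expression vector, scale it to 8-bit intensities and fold it row by row into a square grid that an image classifier could consume. It produces a single grayscale channel where the abstract mentions color images, and the gene ordering, the padding with zeros and the function name are all assumptions made for illustration.

```python
import math


def expression_to_image(values):
    """Fold a non-empty expression vector into a square grayscale grid.

    values -- non-negative expression values for one sample, in a fixed gene order
    Returns a list of rows, each a list of integers in 0..255.
    """
    # Compress the dynamic range before scaling.
    logged = [math.log2(v + 1.0) for v in values]
    lo, hi = min(logged), max(logged)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    pixels = [round(255 * (v - lo) / span) for v in logged]

    # Smallest square that holds every gene; unused cells are padded with 0.
    side = math.isqrt(len(pixels) - 1) + 1
    pixels += [0] * (side * side - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Five made-up expression values give a 3 x 3 image with four padded cells.
image = expression_to_image([0, 1, 3, 7, 15])
for row in image:
    print(row)
```

A fixed gene order across samples matters here, since a convolutional network can only pick up spatial patterns if the same pixel always represents the same gene.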

14 Transcriptome Analysis of Protestia brevitarsis seulensis with Focus On Wing Development and Metamorphosis in Developmental Stages

Authors: Jihye Hwang, Eun Hwa Choi, Su Youn Baek, Bia Park, Gyeongmin Kim, Chorong Shin, Joon Ha Lee, Jae-Sam Hwang, Ui Wook Hwang


White-spotted flower chafers are widely distributed in Asian countries and are traditionally used in oriental medicine for the treatment of chronic fatigue, blood circulation problems, and paralysis. The evolution and development of insect wings and metamorphosis remain underexplored subjects in arthropod evolutionary research. Gene expression abundance analyses across developmental stages based on large-scale RNA-seq data are also still rarely done. Here we report the de novo assembly of a Protestia brevitarsis seulensis transcriptome across four different developmental stages (egg, larva, pupa, and adult) to explore the development and evolution of its wings and metamorphosis. The de novo transcriptome assembly consists of 23,551 high-quality transcripts and is approximately 96.7% complete. Out of 8,545 transcripts, 5,183 correspond to possible orthologs in Drosophila melanogaster. As a result, we found 265 genes related to wing development and 19 genes related to metamorphosis. The comparison of transcript expression abundance across developmental stages revealed stage-specific transcripts, especially those active during wing development and metamorphosis of P. b. seulensis. This transcriptome quantification across developmental stages may provide meaningful clues to elucidate the genetic modulation mechanisms of wing development and metamorphosis acquired during insect evolution.

Keywords: white-spotted flower chafers, transcriptomics, RNA-seq, network biology, wing development, metamorphosis

13 Role of ABC Transporters in Non-Target Site Herbicide Resistance in Black Grass (Alopecurus myosuroides)

Authors: Alina Goldberg Cavalleri, Sara Franco Ortega, Nawaporn Onkokesung, Richard Dale, Melissa Brazier-Hicks, Robert Edwards


Non-target site based resistance (NTSR) to herbicides in weeds is a polygenic trait associated with the upregulation of proteins involved in xenobiotic detoxification and translocation, which we have termed the xenome. Among the xenome proteins, ABC transporters play a key role in enhancing herbicide metabolism by effluxing conjugated xenobiotics from the cytoplasm into the vacuole. The importance of ABC transporters is emphasized by the fact that they often contribute to multidrug resistance in human cells and antibiotic resistance in bacteria. They also play a key role in insecticide resistance in major vectors of human diseases and crop pests. By surveying available databases, transcripts encoding ABCs have been identified as being enhanced in populations exhibiting NTSR in several weed species. Based on transcriptomics data in black grass (Alopecurus myosuroides, Am), we have identified three proteins from the ABC-C subfamily that are upregulated in NTSR populations. ABC-C transporters are poorly characterized proteins in plants, but in Arabidopsis they localize to the vacuolar membrane and have functional roles in transporting glutathionylated (GSH)-xenobiotic conjugates. We found that the up-regulation of AmABCs strongly correlates with the up-regulation of a glutathione transferase termed AmGSTU2, which can conjugate GSH to herbicides. The expression of the ABC transcripts was profiled in populations of black grass showing different degrees of resistance to herbicides. This, together with a phylogenetic analysis, revealed that AmABCs cluster in different groups, which might indicate different substrates and roles in the herbicide resistance phenotype in the different populations.

Keywords: black grass, herbicide, resistance, transporters

12 Combining Transcriptomics, Bioinformatics, Biosynthesis Networks and Chromatographic Analyses for Cotton Gossypium hirsutum L. Defense Volatiles Study

Authors: Ronald Villamar-Torres, Michael Staudt, Christopher Viot


Cotton Gossypium hirsutum L. is one of the most important industrial crops, producing the world's leading natural textile fiber, but is very prone to arthropod attacks that reduce crop yield and quality. Cotton cultivation therefore relies heavily on chemical pesticides. In reaction to herbivorous arthropods, cotton plants nevertheless show natural defense reactions, in particular through emissions of volatile organic compounds (VOCs). These natural defense mechanisms are currently underutilized but have a very high potential for cotton cultivation, and elucidating their genetic bases will help to improve their use. Simulating herbivory attacks by mechanical wounding of cotton plants in a greenhouse, we used qPCR to study the changes in expression of genes in the terpenoid biosynthesis pathway. Differentially expressed genes corresponded to higher levels of the terpenoid biosynthesis pathway and not to enzymes synthesizing particular terpenoids. The genes were mapped on the G. hirsutum L. reference genome; their global relationships inside the general metabolic pathways and the biosynthesis of secondary metabolites were visualized with iPath2. The chromatographic profiles of VOC emissions indicated first monoterpene and sesquiterpene emissions, predominantly four molecules known to be involved in plant reactions to arthropod attacks. As a result, the study made it possible to identify potential key genes for the emission of volatile terpenoids by cotton plants in reaction to an arthropod attack, opening possibilities for molecular-assisted cotton breeding for the benefit of smallholder cotton growers.

Keywords: biosynthesis pathways, cotton, mechanisms of plant defense, terpenoids, volatile organic compounds

11 Impact of Totiviridae L-A dsRNA Virus on Saccharomyces Cerevisiae Host: Transcriptomic and Proteomic Approach

Authors: Juliana Lukša, Bazilė Ravoitytė, Elena Servienė, Saulius Serva


Totiviridae L-A virus is a persistent Saccharomyces cerevisiae dsRNA virus. It encodes the major structural capsid protein Gag and the Gag-Pol fusion protein, responsible for virus replication and encapsulation. These features also enable the copying of satellite dsRNAs (called M dsRNAs) encoding a secreted toxin (known as killer toxin) and immunity to it. The viral capsid pore presumably functions in nucleotide uptake and viral mRNA release. During cell division, sporogenesis, and cell fusion, the virions remain intracellular and are transferred to daughter cells. By employing high-throughput RNA sequencing data analysis, we describe the influence of the L-A virus alone on the expression of genes in three different S. cerevisiae hosts. We provide new insight into Totiviridae L-A virus-related transcriptional regulation, encompassing multiple bioinformatics analyses. Transcriptional responses to L-A infection were similar to those induced upon stress or availability of nutrients. This work also delves into the connection between cell metabolism and L-A virus-conferred demands on the host transcriptome by uncovering host proteins that may be associated with intact virions. To better understand the virus-host interaction, we applied differential proteomic analysis of virus particle-enriched fractions of yeast strains that harbor either the complete killer system (L-A-lus and M-2 virus), are M-2 depleted, or are virus-free. Our analysis resulted in the identification of host proteins associated with the structural proteins of the virus (Gag and Gag-Pol). This research was funded by the European Social Fund under the No. 09.3.3-LMT-K-712-19-0157 “Development of Competences of Scientists, other Researchers, and Students through Practical Research Activities” measure.

Keywords: totiviridae, killer virus, proteomics, transcriptomics

10 Relative Entropy Used to Determine the Divergence of Cells in Single Cell RNA Sequence Data Analysis

Authors: An Chengrui, Yin Zi, Wu Bingbing, Ma Yuanzhu, Jin Kaixiu, Chen Xiao, Ouyang Hongwei


Single cell RNA sequencing (scRNA-seq) is one of the effective tools for studying the transcriptomics of biological processes. Currently, the similarity between cells is usually measured with Euclidean distance or its derivatives. However, the process of scRNA-seq is a multivariate Bernoulli event model, so we hypothesize that the divergence between cells is measured more efficiently with relative entropy than with Euclidean distance. In this study, we compared the performance of Euclidean distance, Spearman correlation distance and relative entropy using scRNA-seq data from the early, medial and late stages of limb development generated in our lab. Relative entropy performed better than the other methods according to the cluster potential test. Furthermore, we developed KL-SNE, an algorithm that modifies t-SNE by changing its definition of divergence between cells from Euclidean distance to Kullback–Leibler divergence. Results showed that KL-SNE was more effective at dissecting cell heterogeneity than t-SNE, indicating that relative entropy performs better than Euclidean distance. Specifically, chondrocytes expressing Comp were clustered together with KL-SNE but not with t-SNE. Surprisingly, cells in the early stage were surrounded by cells in the medial stage with KL-SNE, while medial cells neighbored late-stage cells with t-SNE. This result parallels the heatmap, which showed that cells in the medial stage were more heterogeneous than cells in other stages. In addition, we found that the results of KL-SNE tend to follow a Gaussian distribution compared with those of t-SNE, which could also be verified with the analysis of scRNA-seq data from another study on human embryo development. Therefore, it is also an effective way to convert a non-Gaussian distribution to a Gaussian distribution and facilitate subsequent statistical processes. Thus, relative entropy is potentially a better way to determine the divergence of cells in scRNA-seq data analysis.

Keywords: Single cell RNA sequence, Similarity measurement, Relative Entropy, KL-SNE, t-SNE
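
To make the comparison in the abstract above concrete, the sketch below computes both measures for two cells: counts are turned into probability distributions over genes, and the Kullback–Leibler divergence (relative entropy) is set beside the Euclidean distance. This is not the KL-SNE algorithm, whose details the abstract does not give; the pseudocount, the function names and the toy counts are assumptions made for illustration.

```python
import math


def to_distribution(counts, pseudocount=1.0):
    """Normalise raw counts to probabilities; the pseudocount avoids zeros."""
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]


def relative_entropy(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Made-up read counts for three genes in two cells.
cell_a = [10, 0, 5]
cell_b = [8, 2, 5]

p = to_distribution(cell_a)
q = to_distribution(cell_b)

kl_ab = relative_entropy(p, q)
kl_ba = relative_entropy(q, p)
euclid = math.dist(p, q)  # requires Python 3.8 or later
print(kl_ab, kl_ba, euclid)
```

Unlike Euclidean distance, relative entropy is not symmetric, so KL(a || b) and KL(b || a) generally differ; any method built on it has to decide which direction to use or how to symmetrise it.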

9 Transcriptomine: The Nuclear Receptor Signaling Transcriptome Database

Authors: Scott A. Ochsner, Christopher M. Watkins, Apollo McOwiti, David L. Steffen, Lauren B. Becnel, Neil J. McKenna


Understanding signaling by nuclear receptors (NRs) requires an appreciation of their cognate ligand- and tissue-specific transcriptomes. While target gene regulation data are abundant in this field, they reside in hundreds of discrete publications in formats refractory to routine query and analysis and, accordingly, their full value to the NR signaling community has not been realized. One of the mandates of the Nuclear Receptor Signaling Atlas (NURSA) is to facilitate access of the community to existing public datasets. Pursuant to this mandate, we are developing a freely accessible community web resource, Transcriptomine, to bring together the sum total of available expression array and RNA-Seq data points generated by the field in a single location. Transcriptomine currently contains over 25,000,000 gene fold change datapoints from over 1200 contrasts relevant to over 100 NRs, ligands and coregulators in over 200 tissues and cell lines. Transcriptomine is designed to accommodate a spectrum of end users ranging from the bench researcher to those with advanced bioinformatic training. Visualization tools allow users to build custom charts to compare and contrast patterns of gene regulation across different tissues and in response to different ligands. Our resource affords an entirely new paradigm for leveraging gene expression data in the NR signaling field, empowering users to query gene fold changes across diverse regulatory molecules, tissues and cell lines, target genes, biological functions and disease associations, a task that would otherwise be prohibitive in terms of time and effort. Transcriptomine will be regularly updated with gene lists from future genome-wide expression array and expression-sequencing datasets in the NR signaling field.

Keywords: target gene database, informatics, gene expression, transcriptomics

8 Elucidating the Genetic Determinism of Seed Protein Plasticity in Response to the Environment Using Medicago truncatula

Authors: K. Cartelier, D. Aime, V. Vernoud, J. Buitink, J. M. Prosperi, K. Gallardo, C. Le Signor


Legumes can produce protein-rich seeds without nitrogen fertilizer through root symbiosis with nitrogen-fixing rhizobia. Rich in lysine, these proteins are used for human nutrition and animal feed. However, the instability of seed protein yield and quality due to environmental fluctuations limits the wider use of legumes such as pea. Breeding efforts are needed to optimize and stabilize seed nutritional value, which requires identifying the genetic determinism of seed protein plasticity in response to the environment. Towards this goal, we have studied the plasticity of protein content and composition of seeds from a collection of 200 Medicago truncatula ecotypes grown under four controlled conditions (optimal, drought, and winter/spring sowing). A quantitative analysis of one-dimensional protein profiles of these mature seeds was performed, and plasticity indices were calculated from each abundant protein band. Genome-Wide Association Studies (GWAS) from these data identified major GWAS hotspots, from which a list of candidate genes was obtained. A Gene Ontology Enrichment Analysis revealed an over-representation of genes involved in several amino acid metabolic pathways. This led us to propose that environmental variations are likely to modulate amino acid balance, thus impacting seed protein composition. The selection of candidate genes for controlling the plasticity of seed protein composition was refined using transcriptomics data from developing Medicago truncatula seeds. The pea orthologs of key genes were identified for functional studies by means of TILLING (Targeting Induced Local Lesions in Genomes) lines in this crop. We will present how this study highlighted mechanisms that could govern seed protein plasticity, providing new cues towards the stabilization of legume seed quality.

Keywords: GWAS, Medicago truncatula, plasticity, seed, storage proteins

7 The Systems Biology Verification Endeavor: Harness the Power of the Crowd to Address Computational and Biological Challenges

Authors: Stephanie Boue, Nicolas Sierro, Julia Hoeng, Manuel C. Peitsch


Systems biology relies on large numbers of data points and sophisticated methods to extract biologically meaningful signal and mechanistic understanding. For example, analyses of transcriptomics and proteomics data make it possible to gain insights into the molecular differences in tissues exposed to diverse stimuli or test items. Whereas the interpretation of endpoints specifically measuring a mechanism is relatively straightforward, the interpretation of big data is more complex and would benefit from comparing results obtained with diverse analysis methods. The sbv IMPROVER project was created to implement solutions to verify systems biology data, methods, and conclusions. Computational challenges leveraging the wisdom of the crowd allow benchmarking methods for specific tasks, such as signature extraction and/or sample classification. Four challenges have already been successfully conducted and confirmed that the aggregation of predictions often leads to better results than individual predictions and that methods perform best in specific contexts. Whenever the scientific question of interest does not have a gold standard, but may greatly benefit from the scientific community coming together to discuss their approaches and results, datathons are set up. The inaugural sbv IMPROVER datathon was held in Singapore on 23-24 September 2016. It allowed bioinformaticians and data scientists to consolidate their ideas and work on the most promising methods as teams, after having initially reflected on the problem on their own. The outcome is a set of visualization and analysis methods that will be shared with the scientific community via the Garuda platform, an open connectivity platform that provides a framework to navigate through different applications, databases and services in biology and medicine. 
We will present the results we obtained when analyzing data with our network-based method, and introduce a datathon that will take place in Japan to encourage the analysis of the same datasets with other methods to allow for the consolidation of conclusions.
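The observation that aggregated predictions often outperform individual ones can be illustrated with a minimal sketch. The abstract does not state how sbv IMPROVER aggregates submissions, so the simple averaging below, the function names and the toy data are all assumptions chosen for illustration; the data were constructed so that each method errs on a different sample.

```python
from statistics import mean

def aggregate_predictions(predictions):
    """Average per-sample class probabilities across methods (assumed scheme)."""
    n_samples = len(predictions[0])
    return [mean(p[i] for p in predictions) for i in range(n_samples)]

def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probabilities, labels))
    return hits / len(labels)

# Hypothetical toy data: three methods, four samples, binary labels.
labels = [1, 0, 1, 0]
team_a = [0.9, 0.6, 0.7, 0.2]  # wrong on the second sample
team_b = [0.4, 0.3, 0.8, 0.1]  # wrong on the first sample
team_c = [0.8, 0.2, 0.3, 0.4]  # wrong on the third sample

crowd = aggregate_predictions([team_a, team_b, team_c])
individual_scores = [accuracy(t, labels) for t in (team_a, team_b, team_c)]
crowd_score = accuracy(crowd, labels)
```

In this constructed case each method scores 0.75 while the average scores 1.0, because the errors do not overlap. With correlated errors the gain would shrink or vanish, which is consistent with the abstract saying "often" rather than "always".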

Keywords: big data interpretation, datathon, systems toxicology, verification

6 Transcriptomic Analysis for Differential Expression of Genes Involved in Secondary Metabolite Production in Narcissus Bulb and in vitro Callus

Authors: Aleya Ferdausi, Meriel Jones, Anthony Halls


The Amaryllidaceae genus Narcissus contains secondary metabolites, which are important sources of bioactive compounds such as pharmaceuticals, indicating that their biological activity extends from the native plant to humans. Transcriptome analysis (RNA-seq) is an effective platform for the identification and functional characterization of candidate genes as well as for identifying genes encoding uncharacterized enzymes. The biotechnological production of secondary metabolites in plant cell or organ cultures has become a tempting alternative to extraction from whole plant material. The biochemical pathways for the production of secondary metabolites require primary metabolites to undergo a series of modifications catalyzed by enzymes such as cytochrome P450s, methyltransferases, glycosyltransferases, and acyltransferases. Differential gene expression in Narcissus was analyzed between two conditions, i.e. field bulb and in vitro callus. Callus was obtained on modified MS (Murashige and Skoog) media supplemented with growth regulators, using twin-scale explants from Narcissus cv. Carlton bulb. A total of 2153 differentially expressed transcripts were detected in Narcissus bulb and in vitro callus, and 78.95% of those were annotated. The analysis showed that genes involved in the biosynthesis of alkaloids were expressed in both conditions, i.e. cytochrome P450s, O-methyltransferases (OMTs), NADP/NADPH dehydrogenases or reductases, SAM-synthetases or decarboxylases, 3-ketoacyl-CoA, acyl-CoA, cinnamoyl-CoA, cinnamate 4-hydroxylase, alcohol dehydrogenase, caffeic acid, N-methyltransferase, and NADPH-cytochrome P450s. However, cytochrome P450s and OMTs involved in the later stages of Amaryllidaceae alkaloid biosynthesis were mainly up-regulated in field samples, whereas the enzymes involved in initial biosynthetic pathways, i.e. fructose bisphosphate aldolase, aminotransferases, dehydrogenases, hydroxymethylglutarate and glutamate synthase, which lead to the biosynthesis of the precursors tyrosine, phenylalanine and tryptophan for secondary metabolites, were up-regulated in callus. The knowledge of probable genes involved in secondary metabolism and their regulation in different tissues will provide insight into the Narcissus plant biology related to alkaloid production.

Keywords: narcissus, callus, transcriptomics, secondary metabolites

5 Comprehensive Longitudinal Multi-omic Profiling in Weight Gain and Insulin Resistance

Authors: Christine Y. Yeh, Brian D. Piening, Sarah M. Totten, Kimberly Kukurba, Wenyu Zhou, Kevin P. F. Contrepois, Gucci J. Gu, Sharon Pitteri, Michael Snyder


Three million deaths worldwide are attributed to obesity. However, the biomolecular mechanisms that describe the link between adiposity and subsequent disease states are poorly understood. Insulin resistance characterizes approximately half of obese individuals and is a major cause of obesity-mediated diseases such as Type II diabetes, hypertension and other cardiovascular diseases. This study makes use of longitudinal quantitative and high-throughput multi-omics (genomics, epigenomics, transcriptomics, glycoproteomics etc.) methodologies on blood samples to develop multigenic and multi-analyte signatures associated with weight gain and insulin resistance. Participants in this study underwent a 30-day period of weight gain via excessive caloric intake followed by a 60-day period of restricted dieting and return to baseline weight. Blood samples were taken at three different time points per participant: baseline, peak weight and post-weight loss. Participants were characterized as either insulin resistant (IR) or insulin sensitive (IS) before having their samples processed via longitudinal multi-omic technologies. Using machine learning, clustering, network analysis and related methods, this comparative study revealed a wealth of biomolecular changes associated with weight gain. Pathways of interest included those involved in lipid remodeling, acute inflammatory response and glucose metabolism. Some of these biomolecules returned to baseline levels as the participant returned to normal weight, whilst some remained elevated. IR participants exhibited key differences in inflammatory response regulation in comparison to IS participants at all time points. These signatures suggest differential metabolism and inflammatory pathways between IR and IS participants. Biomolecular differences associated with weight gain and insulin resistance were identified on various levels: in gene expression, epigenetic change, transcriptional regulation and glycosylation.
This study was not only able to contribute to new biology that could be of use in preventing or predicting obesity-mediated diseases, but also matured novel biomedical informatics technologies to produce and process data on many comprehensive omics levels.

Keywords: insulin resistance, multi-omics, next generation sequencing, proteogenomics, type ii diabetes

4 Luteolin Exhibits Anti-Diabetic Effects by Increasing Oxidative Capacity and Regulating Anti-Oxidant Metabolism

Authors: Eun-Young Kwon, Myung-Sook Choi, Su-Jung Cho, Ji-Young Choi, So Young Kim, Youngji Han


Overweight and obesity have been linked to a low-grade chronic inflammatory response and an increased risk of developing metabolic syndrome, including insulin resistance, type 2 diabetes mellitus and certain types of cancer. Luteolin is a dietary flavonoid with anti-inflammatory, anti-oxidant, anti-cancer and anti-diabetic properties. However, little is known about the detailed mechanism underlying the effect of luteolin on inflammation-related obesity and its complications. The aim of the present study was to reveal the anti-diabetic effect of luteolin in diet-induced obese mice using transcriptomics. Thirty-nine male C57BL/6J mice (4 weeks old) were randomly divided into 3 groups and fed a normal diet, a high-fat diet (HFD, 20% fat) or HFD+0.005% (w/w) luteolin for 16 weeks. Luteolin improved insulin resistance, as measured by HOMA-IR and glucose tolerance, and preserved pancreatic β-cells, compared to the HFD group. Luteolin significantly decreased the levels of leptin and ghrelin, which play a pivotal role in energy balance, and of the macrophage low-grade inflammation marker sCD163 (soluble CD antigen 163) in plasma. Activities of hepatic anti-oxidant enzymes (catalase and glutathione peroxidase) were increased, while the levels of plasma transaminases (GOT and GPT) and oxidative damage markers (hepatic mitochondrial H2O2 and TBARS) were markedly decreased by luteolin supplementation. In addition, luteolin increased oxidative capacity and fatty acid utilization, presenting a decrease in the enzyme activities of citrate synthase, cytochrome C oxidase and β-hydroxyacyl CoA dehydrogenase and in UCP3 gene expression compared to the high-fat diet. Moreover, our muscle microarray results revealed that the HFD-induced down-regulation of genes associated with the TCA cycle was reversed to normal levels by luteolin treatment.
Taken together, our results indicate that luteolin is a bioactive component that improves insulin resistance by increasing oxidative capacity, modulating anti-oxidant metabolism and suppressing inflammatory signaling cascades in diet-induced obese mice. These results provide possible therapeutic targets for the prevention and treatment of diet-induced obesity and its complications.
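For readers unfamiliar with the HOMA-IR index mentioned above, a minimal sketch of the commonly used calculation follows. The abstract does not state which formula or units the authors applied, so the standard homeostasis model assessment formula (fasting glucose in mmol/L times fasting insulin in µU/mL, divided by 22.5) is an assumption here, and the input values are hypothetical.

```python
def homa_ir(fasting_glucose_mmol_l, fasting_insulin_uU_ml):
    """Standard HOMA-IR index: glucose (mmol/L) * insulin (uU/mL) / 22.5.

    Assumed formula; the study may have used a different variant or units.
    """
    return fasting_glucose_mmol_l * fasting_insulin_uU_ml / 22.5

# Hypothetical fasting values, for illustration only.
example_index = homa_ir(5.0, 9.0)
```

A higher index indicates greater insulin resistance, so an improvement such as the one reported for luteolin would appear as a lower value relative to the HFD group.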

Keywords: anti-oxidant metabolism, diabetes, luteolin, oxidative capacity

3 Changing the Landscape of Fungal Genomics: New Trends

Authors: Igor V. Grigoriev


Understanding the biological processes encoded in fungi is instrumental in addressing the future food, feed, and energy demands of the growing human population. Genomics is a powerful and quickly evolving tool to understand these processes. The Fungal Genomics Program of the US Department of Energy Joint Genome Institute (JGI) partners with researchers around the world to explore fungi in several large-scale genomics projects, changing the fungal genomics landscape. The key trends of these changes include: (i) rapidly increasing scale of sequencing and analysis, (ii) developing approaches to go beyond culturable fungi and explore fungal ‘dark matter,’ or unculturables, and (iii) functional genomics and multi-omics data integration. The power of comparative genomics has recently been demonstrated in several JGI projects targeting mycorrhizae, plant pathogens, wood decay fungi, and sugar-fermenting yeasts. The largest JGI project, ‘1000 Fungal Genomes’, aims at exploring the diversity across the Fungal Tree of Life in order to better understand fungal evolution and to build a catalogue of genes, enzymes, and pathways for biotechnological applications. At this point, at least 65% of over 700 known families have one or more reference genomes sequenced, enabling metagenomics studies of microbial communities and their interactions with plants. For many of the remaining families, no representative species are available from culture collections. To sequence genomes of unculturable fungi, two approaches have been developed: (a) sequencing DNA from fruiting bodies of ‘macro’ and (b) single cell genomics using fungal spores. The latter has been tested using zoospores from the early diverging fungi and resulted in several near-complete genomes from underexplored branches of the Fungal Tree, including the first genomes of Zoopagomycotina. A genome sequence serves as a reference for transcriptomics studies, the first step towards functional genomics.
In the JGI fungal mini-ENCODE project, transcriptomes of the model fungus Neurospora crassa grown on a spectrum of carbon sources have been collected to build regulatory gene networks. Epigenomics is another tool to understand gene regulation, and recently introduced single molecule sequencing platforms not only provide better genome assemblies but can also detect DNA modifications. For example, the 6mC methylome was surveyed across many diverse fungi, and the highest levels of 6mC methylation among Eukaryota have been reported. Finally, data production at such a scale requires data integration to enable efficient data analysis. Over 700 fungal genomes and other -omes have been integrated in the JGI MycoCosm portal and equipped with comparative genomics tools to enable researchers to address a broad spectrum of biological questions and applications for bioenergy and biotechnology.

Keywords: fungal genomics, single cell genomics, DNA methylation, comparative genomics

2 Incorporating Spatial Transcriptome Data into Ligand-Receptor Analyses to Discover Regional Activation in Cells

Authors: Eric Bang


Interactions between receptors and ligands are crucial for many essential biological processes, including neurotransmission and metabolism. Ligand-receptor analyses that examine cell behavior and interactions often utilize cell type-specific RNA expression from single-cell RNA sequencing (scRNA-seq) data. Using CellPhoneDB, a public repository consisting of ligands, receptors, and ligand-receptor interactions, cell-cell interactions were explored in a specific scRNA-seq dataset from kidney tissue, and the results were portrayed with dot plots and heat maps. Depending on the type of cell, each ligand-receptor pair was aligned with the interacting cell type, and the posterior probabilities of these associations were calculated, with corresponding p-values reflecting average expression values between the triads and their significance. Using single-cell data (sample kidney cell references), genes in the dataset were cross-referenced with those in the existing CellPhoneDB dataset. For example, a gene such as Pleiotrophin (PTN) present in the single-cell data also needed to be present in the CellPhoneDB dataset. Using the single-cell transcriptomics data obtained via Slide-seq and reference data, the CellPhoneDB program defines cell types and plots them in different formats, the two main ones being dot plots and heat maps. The dot plot displays derived measures of the cell-to-cell interaction scores and p-values. In the dot plot, each row shows a ligand-receptor pair, and each column shows the two interacting cell types. CellPhoneDB defines interactions and interaction levels from the gene expression level; because the p-value is shown on a -log10 scale, larger dots represent more significant interactions. The interaction analysis revealed a significant interaction for myeloid and T-cell ligand-receptor pairs, including those between Secreted Phosphoprotein 1 (SPP1) and Fibronectin 1 (FN1), which is consistent with previous findings.
It was proposed that an effective protocol would involve a filtration step in which cell types are filtered out depending on which ligand-receptor pair is activated in that part of the tissue, as well as the incorporation of the CellPhoneDB data in a streamlined workflow pipeline. The filtration step would take the form of a Python script that expedites the manual process necessary for dataset filtration. Because it is written in Python, it can be integrated with the CellPhoneDB dataset for future workflow analysis. The manual process involves filtering cell types based on which ligand-receptor pair is activated in kidney cells. One limitation is that some pairings are activated in multiple cells at a time, so the manual manipulation of the data is reflected prior to analysis. Using the filtration script, accurate sorting is incorporated into the CellPhoneDB database rather than waiting until the output is produced and then subsequently applying spatial data. It was envisioned that this would reveal where in the cell various ligands and receptors are interacting with different cell types, allowing for easier identification of which cells are being impacted and why, for the purpose of disease treatment. The hope is that this new computational method utilizing spatially explicit ligand-receptor association data can be used to uncover previously unknown specific interactions within kidney tissue.
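The proposed filtration step and the -log10 scaling of p-values might look roughly like the sketch below. This is not the author's script and does not use the CellPhoneDB API or file formats: the record layout, the placeholder pair names (LIG_A, REC_A and so on), the p-values and the set of regionally active pairs are all hypothetical, chosen only to show the idea of keeping interactions whose ligand-receptor pair is flagged as active in a tissue region.

```python
import math

def neg_log10(p_value, floor=1e-300):
    """Convert a p-value to the -log10 scale used for dot sizes.

    The floor avoids log10(0) for p-values reported as exactly zero.
    """
    return -math.log10(max(p_value, floor))

def filter_interactions(interactions, active_pairs):
    """Keep interactions whose (ligand, receptor) pair is active in the region."""
    return [row for row in interactions if row["pair"] in active_pairs]

# Hypothetical interaction records; names and values are placeholders.
interactions = [
    {"pair": ("LIG_A", "REC_A"), "cells": ("myeloid", "T cell"), "p_value": 0.001},
    {"pair": ("LIG_B", "REC_B"), "cells": ("myeloid", "T cell"), "p_value": 0.05},
    {"pair": ("LIG_C", "REC_C"), "cells": ("T cell", "myeloid"), "p_value": 0.5},
]

# Hypothetical set of pairs flagged as active in one region by spatial data.
active_in_region = {("LIG_A", "REC_A"), ("LIG_C", "REC_C")}

kept = filter_interactions(interactions, active_in_region)
# Most significant first: larger -log10(p) means a larger dot.
kept.sort(key=lambda row: neg_log10(row["p_value"]), reverse=True)
dot_sizes = [neg_log10(row["p_value"]) for row in kept]
```

Filtering before plotting, rather than after the output is produced, mirrors the ordering the abstract argues for; a pair that is active in several cell types at once would still need a rule for which records to keep, which this sketch does not address.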

Keywords: bioinformatics, Ligands, kidney tissue, receptors, spatial transcriptome

1 Identification of the Target Genes to Increase the Immunotherapy Response in Bladder Cancer Patients using Computational and Experimental Approach

Authors: Sahar Nasr, Lin Li, Edwin Wang


Bladder cancer (BLCA) ranks as the 13th leading cause of death among cancer patients worldwide, and ~575,000 new BLCA cases are diagnosed each year. Urothelial carcinoma (UC) is the most prevalent subtype among BLCA patients and can be categorized into muscle-invasive bladder cancer (MIBC) and non-muscle-invasive bladder cancer (NMIBC). Currently, various therapeutic options are available for UC patients, including (1) transurethral resection followed by intravesical instillation of chemotherapeutics or Bacillus Calmette-Guérin for NMIBC patients, (2) neoadjuvant platinum-based chemotherapy (NAC) plus radical cystectomy, the standard of care for localized MIBC patients, and (3) systemic chemotherapy for metastatic UC. However, conventional treatments pose several challenges. For example, some patients suffer from recurrence of the disease after the first line of treatment. Recently, immune checkpoint therapy (ICT) has been introduced as an alternative treatment strategy for the first or second line of treatment in advanced or metastatic BLCA patients. Although ICT showed promising results for a fraction of BLCA patients, ~80% of patients were not responsive to it. Therefore, novel treatment methods are required to augment the immune checkpoint inhibitor (ICI) response rate among BLCA patients. It has been shown that the infiltration of T-cells into the tumor microenvironment (TME) is positively correlated with the response to ICT in cancer patients. Therefore, the goal of this study is to enhance the infiltration of cytotoxic T-cells into the TME by identifying and inhibiting target genes within the tumor that are responsible for the non-T-cell-inflamed TME. BLCA bulk RNA-sequencing data from The Cancer Genome Atlas (TCGA) and immune scores for TCGA samples were used to determine the Pearson correlation between the expression of each gene and the immune score across samples.
The genes with strong negative correlations were selected (r < -0.2). Thereafter, the correlation between the expression of each gene and survival in BLCA patients was calculated using the TCGA data and the Cox regression method. The genes common to both selected gene lists were chosen for further analysis. Afterward, samples in the BLCA bulk and single-cell RNA-sequencing data were ranked based on the expression of each selected gene, and the top and bottom 25% of samples were used for pathway enrichment analysis. If the pathways related to T-cell infiltration (e.g., antigen presentation, interferon, or chemokine pathways) were enriched within the low-expression group, the gene was included for downstream analysis. Finally, the selected genes will be used to calculate the correlation between their expression and the infiltration rate of activated CD8+ T-cells, natural killer cells and activated dendritic cells. A list of potential target genes has been identified and ranked based on the above-mentioned analysis and criteria. SUN1 received the highest score among the genes in the list and among other genes identified in the literature, which served as benchmarks. In conclusion, inhibition of SUN1 may increase the tumor-infiltrating lymphocytes and the efficacy of ICI in BLCA patients. BLCA tumor cells with and without SUN1 CRISPR/Cas9 knockout will be injected into a syngeneic mouse model to validate the predicted effect of SUN1 on increasing tumor-infiltrating lymphocytes.
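The first selection steps described above (Pearson correlation against the immune score with a cutoff of r < -0.2, then ranking samples by expression and taking the top and bottom 25%) can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the gene names, sample names and values are hypothetical, and the Cox regression and pathway enrichment steps are omitted.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def select_negative_genes(expression, immune_score, threshold=-0.2):
    """Return {gene: r} for genes whose correlation with the score is below threshold."""
    selected = {}
    for gene, values in expression.items():
        r = pearson_r(values, immune_score)
        if r < threshold:
            selected[gene] = r
    return selected

def quartile_groups(expression_by_sample):
    """Rank samples by expression; return (bottom 25%, top 25%) sample names."""
    ranked = sorted(expression_by_sample, key=expression_by_sample.get)
    k = max(1, len(ranked) // 4)
    return ranked[:k], ranked[-k:]

# Hypothetical toy data: five samples, three placeholder genes.
immune_score = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "GENE_A": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as immune score rises
    "GENE_B": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with immune score
    "GENE_C": [2.0, 1.0, 2.0, 1.0, 2.0],  # uncorrelated with immune score
}
selected = select_negative_genes(expression, immune_score)

# Hypothetical per-sample expression of one selected gene, eight samples.
by_sample = {f"s{i}": float(i) for i in range(1, 9)}
low_group, high_group = quartile_groups(by_sample)
```

Only the negatively correlated placeholder gene passes the cutoff here. In the workflow described above, the low and high groups would then be compared by pathway enrichment analysis.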

Keywords: data analysis, gene expression analysis, gene identification, immunoinformatics, functional genomics, transcriptomics

