Search results for: omics data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 42066

42066 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease

Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena

Abstract:

Advances in sequence data generation technologies are churning out voluminous omics data and posing a massive challenge for annotating biological functional features. With so much data available, the use of machine learning methods and tools to make novel inferences has become an obvious choice. Machine learning methods have been successfully applied to many disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested in developing novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, we plan to employ novel machine learning approaches to aid the understanding of how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome in which mutations can affect a phenotype of interest.

Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics

Procedia PDF Downloads 97
42065 Integrative Omics-Portrayal Disentangles Molecular Heterogeneity and Progression Mechanisms of Cancer

Authors: Binder Hans

Abstract:

Cancer is no longer seen as solely a genetic disease where genetic defects such as mutations and copy number variations affect gene regulation and eventually lead to aberrant cell functioning which can be monitored by transcriptome analysis. It has become obvious that epigenetic alterations represent a further important layer of (de-)regulation of gene activity. For example, aberrant DNA methylation is a hallmark of many cancer types, and methylation patterns were successfully used to subtype cancer heterogeneity. Hence, unraveling the interplay between different omics levels such as genome, transcriptome and epigenome is essential for a mechanistic understanding of molecular deregulation causing complex diseases such as cancer. This objective requires powerful downstream integrative bioinformatics methods as an essential prerequisite to discover the whole genome mutational, transcriptome and epigenome landscapes of cancer specimens and to discover cancer genesis, progression and heterogeneity. Basic challenges and tasks arise ‘beyond sequencing’ because of the sheer size of the data, their complexity, the need to search for hidden structures in the data, for knowledge mining to discover biological function and also systems biology conceptual models to deduce developmental interrelations between different cancer states. These tasks are tightly related to cancer biology as an (epi-)genetic disease giving rise to aberrant genomic regulation under micro-environmental control and clonal evolution which leads to heterogeneous cellular states. Machine learning algorithms such as self-organizing maps (SOM) represent one interesting option to tackle these bioinformatics tasks. The SOM method enables recognizing complex patterns in large-scale data generated by high-throughput omics technologies. It portrays molecular phenotypes by generating individualized, easy to interpret images of the data landscape in combination with comprehensive analysis options.
Our image-based, reductionist machine learning methods provide one interesting perspective on how to deal with massive data in studying complex diseases such as gliomas, melanomas and colon cancer at the molecular level. As an important new challenge, we address the combined portrayal of different omics data such as genome-wide genomic, transcriptomic and methylomic ones. The integrative-omics portrayal approach is based on the joint training of the data and it provides separate personalized data portraits for each patient and data type which can be analyzed by visual inspection as one option. The new method enables an integrative genome-wide view on the omics data types and the underlying regulatory modes. It is applied to high and low-grade gliomas and to melanomas where it disentangles transversal and longitudinal molecular heterogeneity in terms of distinct molecular subtypes and progression paths with prognostic impact.
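
As a rough illustration of the self-organizing map idea mentioned in this abstract, the sketch below performs one generic SOM training step on a tiny 2x2 grid. It is not the author's portrayal method; the function names `best_matching_unit` and `som_update`, the Gaussian neighbourhood, and the default learning rate and radius are assumptions chosen for illustration.

```python
import math


def best_matching_unit(weights, sample):
    """Return (row, col) of the grid node whose weight vector is closest to sample."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - si) ** 2 for wi, si in zip(w, sample))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def som_update(weights, sample, learning_rate=0.5, radius=1.0):
    """One SOM step: pull the best matching unit and its grid neighbours toward sample."""
    br, bc = best_matching_unit(weights, sample)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Gaussian neighbourhood on the grid (an assumed, commonly used choice).
            grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
            influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
            for i in range(len(w)):
                w[i] += learning_rate * influence * (sample[i] - w[i])
    return br, bc


# Hypothetical 2x2 map with two-dimensional weight vectors.
grid = [[[0.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [1.0, 1.0]]]
winner = som_update(grid, [0.9, 0.1])
```

Repeating such steps over many samples, while shrinking the learning rate and radius, is what arranges similar profiles on neighbouring nodes; the per-patient "portraits" described in the abstract would then be images of the trained grid.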

Keywords: integrative bioinformatics, machine learning, molecular mechanisms of cancer, gliomas and melanomas

Procedia PDF Downloads 148
42064 Proposing an Architecture for Drug Response Prediction by Integrating Multiomics Data and Utilizing Graph Transformers

Authors: Nishank Raisinghani

Abstract:

Efficiently predicting drug response remains a challenge in the realm of drug discovery. To address this issue, we propose four model architectures that combine graphical representation with varying positions of multiheaded self-attention mechanisms. By leveraging two types of multi-omics data, transcriptomics and genomics, we create a comprehensive representation of target cells and enable drug response prediction in precision medicine. A majority of our architectures utilize multiple transformer models, one with a graph attention mechanism and the other with a multiheaded self-attention mechanism, to generate latent representations of both drug and omics data, respectively. Our model architectures apply an attention mechanism to both drug and multiomics data, with the goal of procuring more comprehensive latent representations. The latent representations are then concatenated and input into a fully connected network to predict the IC-50 score, a measure of cell drug response. We experiment with all four of these architectures and extract results from all of them. Our study greatly contributes to the future of drug discovery and precision medicine by looking to optimize the time and accuracy of drug response prediction.
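
The data flow described in this abstract (attention over drug and omics inputs, concatenation of the latent representations, then a fully connected network that outputs an IC-50 score) might be sketched as follows. This is a minimal sketch under stated assumptions, not one of the four proposed architectures: the attention uses identity projections and a single head, neighbour-masked attention stands in for the graph attention mechanism, the weights are random and untrained, and every name (`attention`, `predict_ic50`, `init_params`) is hypothetical.

```python
import math
import random


def attention(tokens, adjacency=None):
    """Scaled dot-product self-attention with identity projections.

    If `adjacency` is given, token i attends only to itself and its neighbours,
    a simplified stand-in for graph attention.
    """
    dim = len(tokens[0])
    attended = []
    for i, query in enumerate(tokens):
        allowed = [j for j in range(len(tokens))
                   if adjacency is None or i == j or adjacency[i][j]]
        scores = [sum(q * k for q, k in zip(query, tokens[j])) / math.sqrt(dim)
                  for j in allowed]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attended.append([
            sum(e / total * tokens[j][d] for e, j in zip(exps, allowed))
            for d in range(dim)
        ])
    return attended


def mean_pool(tokens):
    """Average token vectors into one latent vector."""
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(len(tokens[0]))]


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(in_dim, hidden_dim, seed=0):
    """Random, untrained weights (illustration only)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]],
        "b2": [0.0],
    }


def predict_ic50(drug_atoms, bonds, omics_tokens, params):
    """Encode drug and omics inputs, concatenate the latents, and regress one score."""
    drug_latent = mean_pool(attention(drug_atoms, adjacency=bonds))
    omics_latent = mean_pool(attention(omics_tokens))
    joint = drug_latent + omics_latent  # list concatenation
    hidden = [max(0.0, h) for h in dense(joint, params["w1"], params["b1"])]  # ReLU
    return dense(hidden, params["w2"], params["b2"])[0]


# Made-up inputs: three atoms with 2 features each, two omics tokens with 3 features each.
drug_atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bonds = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
omics_tokens = [[0.2, 0.8, 0.1], [0.9, 0.3, 0.5]]
params = init_params(in_dim=5, hidden_dim=4)
score = predict_ic50(drug_atoms, bonds, omics_tokens, params)
```

Moving the attention step before or after other layers, as the abstract's "varying positions" suggests, would amount to reordering the calls inside `predict_ic50`.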

Keywords: drug discovery, transformers, graph neural networks, multiomics

Procedia PDF Downloads 153
42063 Multi-Omics Investigation of Ferroptosis-Related Gene Expression in Ovarian Aging and the Impact of Nutritional Intervention

Authors: Chia-Jung Li, Kuan-Hao Tsui

Abstract:

As women age, the quality of their oocytes deteriorates irreversibly, leading to reduced fertility. To better understand the role of Ferroptosis-related genes in ovarian aging, we employed a multi-omics analysis approach, including spatial transcriptomics, single-cell RNA sequencing, human ovarian pathology, and clinical biopsies. Our study identified excess lipid peroxide accumulation in aging germ cells, metal ion accumulation via oxidative reduction, and the interaction between ferroptosis and cellular energy metabolism. We used multi-histological prediction of ferroptosis key genes to evaluate 75 patients with ovarian aging insufficiency and then analyzed changes in hub genes after supplementing with DHEA, Ubiquinol CoQ10, and Cleo-20 T3 for two months. Our results demonstrated a significant increase in TFRC, GPX4, NCOA4, and SLC3A2, which were consistent with our multi-component prediction. We theorized that these supplements increase the mitochondrial tricarboxylic acid cycle (TCA) or electron transport chain (ETC), thereby increasing antioxidant enzyme GPX4 levels and reducing lipid peroxide accumulation and ferroptosis. Overall, our findings suggest that supplementation intervention significantly improves IVF outcomes in senescent cells by enhancing metal ion and energy metabolism and enhancing oocyte quality in aging women.

Keywords: multi-omics, nutrients, ferroptosis, ovarian aging

Procedia PDF Downloads 103
42062 Potential Impacts of Maternal Nutrition and Selection for Residual Feed Intake on Metabolism and Fertility Parameters in Angus Bulls

Authors: Aidin Foroutan, David S. Wishart, Leluo L. Guan, Carolyn Fitzsimmons

Abstract:

Maximizing efficiency and growth potential of beef cattle requires not only genetic selection (i.e. residual feed intake (RFI)) but also adequate nutrition throughout all stages of growth and development. Nutrient restriction during gestation has been shown to negatively affect post-natal growth and development as well as fertility of the offspring. This, when combined with RFI, may affect progeny traits. This study aims to investigate the impact of selection for divergent genetic potential for RFI and maternal nutrition during early- to mid-gestation on bull calf traits such as fertility and muscle development using multiple ‘omics’ approaches. Comparisons were made between High-diet vs. Low-diet and between High-RFI vs. Low-RFI animals. An epigenetics experiment on semen samples identified 891 biomarkers associated with growth and development. A gene expression study on Longissimus thoracis muscle, semimembranosus muscle, liver, and testis identified 4 genes associated with muscle development and immunity, of which Myocyte enhancer factor 2A [MEF2A; induces myogenesis and controls muscle differentiation] was the only differentially expressed gene identified in all four tissues. An initial metabolomics experiment on serum samples using nuclear magnetic resonance (NMR) identified 4 metabolite biomarkers related to energy and protein metabolism. Once all the biomarkers are identified, bioinformatics approaches will be used to create a database covering all the ‘omics’ data collected from this project. This database will be broadened by adding other information obtained from relevant literature reviews. Association analyses with these data sets will be performed to reveal key biological pathways affected by RFI and maternal nutrition. Through these association studies between the genome and metabolome, it is expected that candidate biomarker genes and metabolites for feed efficiency, fertility, and/or muscle development will be identified.
If these gene/metabolite biomarkers are validated in a larger animal population, they could potentially be used in breeding programs to select superior animals. It is also expected that this work will lead to the development of an online tool that could be used to predict future traits of interest in an animal given its measurable ‘omics’ traits.

Keywords: biomarker, maternal nutrition, omics, residual feed intake

Procedia PDF Downloads 191
42061 Comprehensive Longitudinal Multi-omic Profiling in Weight Gain and Insulin Resistance

Authors: Christine Y. Yeh, Brian D. Piening, Sarah M. Totten, Kimberly Kukurba, Wenyu Zhou, Kevin P. F. Contrepois, Gucci J. Gu, Sharon Pitteri, Michael Snyder

Abstract:

Three million deaths worldwide are attributed to obesity. However, the biomolecular mechanisms that describe the link between adiposity and subsequent disease states are poorly understood. Insulin resistance characterizes approximately half of obese individuals and is a major cause of obesity-mediated diseases such as Type II diabetes, hypertension and other cardiovascular diseases. This study makes use of longitudinal quantitative and high-throughput multi-omics (genomics, epigenomics, transcriptomics, glycoproteomics etc.) methodologies on blood samples to develop multigenic and multi-analyte signatures associated with weight gain and insulin resistance. Participants of this study underwent a 30-day period of weight gain via excessive caloric intake followed by a 60-day period of restricted dieting and return to baseline weight. Blood samples were taken at three different time points per patient: baseline, peak-weight and post weight loss. Patients were characterized as either insulin resistant (IR) or insulin sensitive (IS) before having their samples processed via longitudinal multi-omic technologies. This comparative study revealed a wealth of biomolecular changes associated with weight gain after using methods in machine learning, clustering, network analysis etc. Pathways of interest included those involved in lipid remodeling, acute inflammatory response and glucose metabolism. Some of these biomolecules returned to baseline levels as the patient returned to normal weight whilst some remained elevated. IR patients exhibited key differences in inflammatory response regulation in comparison to IS patients at all time points. These signatures suggest differential metabolism and inflammatory pathways between IR and IS patients. Biomolecular differences associated with weight gain and insulin resistance were identified on various levels: in gene expression, epigenetic change, transcriptional regulation and glycosylation. 
This study was not only able to contribute to new biology that could be of use in preventing or predicting obesity-mediated diseases, but also matured novel biomedical informatics technologies to produce and process data on many comprehensive omics levels.

Keywords: insulin resistance, multi-omics, next generation sequencing, proteogenomics, type ii diabetes

Procedia PDF Downloads 429
42060 Multi-omics Integrative Analysis with Genome-Scale Metabolic Model Simulation Reveals Reaction Essentiality data in Human Astrocytes Under the Lipotoxic Effect of Palmitic Acid

Authors: Janneth Gonzalez, Andres Pinzon Velasco, Maria Angarita, Nicolas Mendoza

Abstract:

Astrocytes play an important role in various processes in the brain, including pathological conditions such as neurodegenerative diseases. Recent studies have shown that the increase in saturated fatty acids such as palmitic acid (PA) triggers pro-inflammatory pathways in the brain. The use of synthetic neurosteroids such as tibolone has demonstrated neuro-protective mechanisms. However, there are few studies on the neuro-protective mechanisms of tibolone, especially at the systemic (omic) level. In this study, we performed the integration of multi-omic data (transcriptome and proteome) into a human astrocyte genomic scale metabolic model to study the astrocytic response during palmitate treatment. We evaluated metabolic fluxes in three scenarios (healthy, induced inflammation by PA, and tibolone treatment under PA inflammation). We also used control theory to identify those reactions that control the astrocytic system. Our results suggest that PA generates a modulation of central and secondary metabolism, showing a change in energy source use through inhibition of folate cycle and fatty acid β-oxidation and upregulation of ketone bodies formation. We found 25 metabolic switches under PA-mediated cellular regulation, 9 of which were critical only in the inflammatory scenario but not in the protective tibolone one. Within these reactions, inhibitory, total, and directional coupling profiles were key findings, playing a fundamental role in the (de)regulation of metabolic pathways that increase neurotoxicity and represent potential treatment targets. Finally, this study framework facilitates the understanding of metabolic regulation strategies, and it can be used for in silico exploration of the mechanisms of astrocytic cell regulation, directing a more complex future experimental work in neurodegenerative diseases.

Keywords: astrocytes, data integration, palmitic acid, computational model, multi-omics, control theory

Procedia PDF Downloads 121
42059 A Single Cell Omics Experiments as Tool for Benchmarking Bioinformatics Oncology Data Analysis Tools

Authors: Maddalena Arigoni, Maria Luisa Ratto, Raffaele A. Calogero, Luca Alessandri

Abstract:

The presence of tumor heterogeneity, where distinct cancer cells exhibit diverse morphological and phenotypic profiles, including gene expression, metabolism, and proliferation, poses challenges for molecular prognostic markers and patient classification for targeted therapies. Understanding the causes and progression of cancer requires research efforts aimed at characterizing heterogeneity, which can be facilitated by evolving single-cell sequencing technologies. However, analyzing single-cell data necessitates computational methods that often lack objective validation. Therefore, the establishment of benchmarking datasets is necessary to provide a controlled environment for validating bioinformatics tools in the field of single-cell oncology. Benchmarking bioinformatics tools for single-cell experiments can be costly due to the high expense involved. Therefore, datasets used for benchmarking are typically sourced from publicly available experiments, which often lack a comprehensive cell annotation. This limitation can affect the accuracy and effectiveness of such experiments as benchmarking tools. To address this issue, we introduce omics benchmark experiments designed to evaluate bioinformatics tools to depict the heterogeneity in single-cell tumor experiments. We conducted single-cell RNA sequencing on six lung cancer tumor cell lines that display resistant clones upon treatment of EGFR mutated tumors and are characterized by driver genes, namely ROS1, ALK, HER2, MET, KRAS, and BRAF. These driver genes are associated with downstream networks controlled by EGFR mutations, such as JAK-STAT, PI3K-AKT-mTOR, and MEK-ERK. The experiment also featured an EGFR-mutated cell line. Using 10XGenomics platform with cellplex technology, we analyzed the seven cell lines together with a pseudo-immunological microenvironment consisting of PBMC cells labeled with the Biolegend TotalSeq™-B Human Universal Cocktail (CITEseq). 
This technology allowed for independent labeling of each cell line and single-cell analysis of the pooled seven cell lines and the pseudo-microenvironment. The data generated from the aforementioned experiments are available as part of an online tool, which allows users to define cell heterogeneity and generates count tables as an output. The tool provides the cell line derivation for each cell and cell annotations for the pseudo-microenvironment based on CITEseq data by an experienced immunologist. Additionally, we created a range of pseudo-tumor tissues using different ratios of the aforementioned cells embedded in matrigel. These tissues were analyzed using 10XGenomics (FFPE samples) and Curio Bioscience (fresh frozen samples) platforms for spatial transcriptomics, further expanding the scope of our benchmark experiments. The benchmark experiments we conducted provide a unique opportunity to evaluate the performance of bioinformatics tools for detecting and characterizing tumor heterogeneity at the single-cell level. Overall, our experiments provide a controlled and standardized environment for assessing the accuracy and robustness of bioinformatics tools for studying tumor heterogeneity at the single-cell level, which can ultimately lead to more precise and effective cancer diagnosis and treatment.

Keywords: single cell omics, benchmark, spatial transcriptomics, CITEseq

Procedia PDF Downloads 117
42058 Combined Proteomic and Metabolomic Analysis Approaches to Investigate the Modification in the Proteome and Metabolome of in vitro Models Treated with Gold Nanoparticles (AuNPs)

Authors: H. Chassaigne, S. Gioria, J. Lobo Vicente, D. Carpi, P. Barboro, G. Tomasi, A. Kinsner-Ovaskainen, F. Rossi

Abstract:

Emerging approaches in the area of exposure to nanomaterials and assessment of human health effects combine the use of in vitro systems and analytical techniques to study the perturbation of the proteome and/or the metabolome. We investigated the modification in the cytoplasmic compartment of the Balb/3T3 cell line exposed to gold nanoparticles. On one hand, the proteomic approach is quite standardized even if it requires precautions when dealing with in vitro systems. On the other hand, metabolomic analysis is challenging due to the chemical diversity of cellular metabolites, which complicates data elaboration and interpretation. Differentially expressed proteins were found to cover a range of functions including stress response, cell metabolism, cell growth and cytoskeleton organization. In addition, de-regulated metabolites were annotated using the HMDB database. The "omics" fields hold great promise for studying the interaction of nanoparticles with biological systems. Combining proteomics and metabolomics data is possible but challenging.

Keywords: data processing, gold nanoparticles, in vitro systems, metabolomics, proteomics

Procedia PDF Downloads 503
42057 TAXAPRO, A Streamlined Pipeline to Analyze Shotgun Metagenomes

Authors: Sofia Sehli, Zainab El Ouafi, Casey Eddington, Soumaya Jbara, Kasambula Arthur Shem, Islam El Jaddaoui, Ayorinde Afolayan, Olaitan I. Awe, Allissa Dillman, Hassan Ghazal

Abstract:

The ability to promptly sequence whole genomes at a relatively low cost has revolutionized the way we study the microbiome. Microbiologists are no longer limited to studying what can be grown in a laboratory and instead are given the opportunity to rapidly identify the makeup of microbial communities in a wide variety of environments. Analyzing whole genome sequencing (WGS) data is a complex process that involves multiple moving parts and might be rather unintuitive for scientists that don’t typically work with this type of data. Thus, to help lower the barrier for less-computationally inclined individuals, TAXAPRO was developed at the first Omics Codeathon held virtually by the African Society for Bioinformatics and Computational Biology (ASBCB) in June 2021. TAXAPRO is an advanced metagenomics pipeline that accurately assembles organelle genomes from whole-genome sequencing data. TAXAPRO seamlessly combines WGS analysis tools to create a pipeline that automatically processes raw WGS data and presents organism abundance information in both a tabular and graphical format. TAXAPRO was evaluated using COVID-19 patient gut microbiome data. Analysis performed by TAXAPRO demonstrated a high abundance of Clostridia and Bacteroidia genera and a low abundance of Proteobacteria genera relative to others in the gut microbiome of patients hospitalized with COVID-19, consistent with the original findings derived using a different analysis methodology. This provides crucial evidence that the TAXAPRO workflow dispenses reliable organism abundance information overnight without the hassle of performing the analysis manually.

Keywords: metagenomics, shotgun metagenomic sequence analysis, COVID-19, pipeline, bioinformatics

Procedia PDF Downloads 221
42056 Multi-Omics Integrative Analysis Coupled to Control Theory and Computational Simulation of a Genome-Scale Metabolic Model Reveal Controlling Biological Switches in Human Astrocytes under Palmitic Acid-Induced Lipotoxicity

Authors: Janneth Gonzalez, Andrés Pinzon Velasco, Maria Angarita

Abstract:

Astrocytes play an important role in various processes in the brain, including pathological conditions such as neurodegenerative diseases. Recent studies have shown that the increase in saturated fatty acids such as palmitic acid (PA) triggers pro-inflammatory pathways in the brain. The use of synthetic neurosteroids such as tibolone has demonstrated neuro-protective mechanisms. However, broad studies with a systemic point of view on the neurodegenerative role of PA and the neuro-protective mechanisms of tibolone are lacking. In this study, we performed the integration of multi-omic data (transcriptome and proteome) into a human astrocyte genomic scale metabolic model to study the astrocytic response during palmitate treatment. We evaluated metabolic fluxes in three scenarios (healthy, induced inflammation by PA, and tibolone treatment under PA inflammation). We also applied a control theory approach to identify those reactions that exert more control in the astrocytic system. Our results suggest that PA generates a modulation of central and secondary metabolism, showing a switch in energy source use through inhibition of folate cycle and fatty acid β‐oxidation and upregulation of ketone bodies formation. We found 25 metabolic switches under PA‐mediated cellular regulation, 9 of which were critical only in the inflammatory scenario but not in the protective tibolone one. Within these reactions, inhibitory, total, and directional coupling profiles were key findings, playing a fundamental role in the (de)regulation of metabolic pathways that may increase neurotoxicity and represent potential treatment targets. Finally, the overall framework of our approach facilitates the understanding of complex metabolic regulation, and it can be used for in silico exploration of the mechanisms of astrocytic cell regulation, directing a more complex future experimental work in neurodegenerative diseases.

Keywords: astrocytes, data integration, palmitic acid, computational model, multi-omics

Procedia PDF Downloads 97
42055 Using OMICs Approaches to Investigate Venomic Insights into the Spider Web Silk

Authors: Franciele G. Esteves, Jose R. A. dos Santos-Pinto, Caroline L. de Souza, Mario S. Palma

Abstract:

Orb-weaving spiders use a very strong, sticky, and elastic web to catch their prey. These web properties would be enough for the entrapment of prey; however, these spiders may be hiding venomous secrets on the web, which are being revealed now. Here we provide strong proteome, peptidome, and transcriptomic evidence for the presence of toxic components on the web silk from Nephila clavipes. Our results revealed, both in the web silk and in the silk-producing glands, a wide diversity of toxins/neurotoxins, defensins, and proteolytic enzymes. These toxins/neurotoxins are similar to toxins isolated from animal venoms, such as Sphingomyelinase D, Latrotoxins, Zodatoxins, Ctenitoxin Pn and Pk, Agatoxins and Theraphotoxin. Moreover, the insect-toxicity results with the web silk crude extract demonstrated that these toxic components can be lethal and/or cause paralytic effects in the prey. Therefore, through OMICs approaches, the results presented so far may contribute to a better understanding of the chemical and ecological interaction of these compounds in insect-prey capture by the web of N. clavipes, demonstrating that the web is not only a simple mechanical tool but is also chemically active in prey capture. Moreover, the results can also contribute to future studies on the possible development of a selective insecticide or even on possible pharmacological applications.

Keywords: web silk toxins, silk-producing glands, de novo transcriptome assembly, LCMS-based proteomics

Procedia PDF Downloads 135
42054 Data Transformations in Data Envelopment Analysis

Authors: Mansour Mohammadpour

Abstract:

Data transformation refers to the modification of any point in a data set by a mathematical function. When applying transformations, the measurement scale of the data is modified. Data transformations are commonly employed to turn data into the appropriate form, which can serve various functions in the quantitative analysis of the data. This study addresses the investigation of the use of data transformations in Data Envelopment Analysis (DEA). Although data transformations are important options for analysis, they do fundamentally alter the nature of the variable, making the interpretation of the results somewhat more complex.
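
To make the idea concrete, the sketch below applies two common transformations to a small made-up output vector containing a negative value: a translation that makes all values positive, followed by a logarithm. The data and the helper names `shift_to_positive` and `log_transform` are assumptions for illustration only, and whether a particular DEA model tolerates such transformations depends on the model.

```python
import math


def shift_to_positive(values, margin=1.0):
    """Translate all values by one constant so that the smallest equals `margin`."""
    offset = margin - min(values)
    return [v + offset for v in values]


def log_transform(values):
    """Natural logarithm of each (strictly positive) value."""
    return [math.log(v) for v in values]


outputs = [-4.0, 0.0, 6.0, 16.0]        # hypothetical output data with a negative entry
shifted = shift_to_positive(outputs)    # [1.0, 5.0, 11.0, 21.0]
logged = log_transform(shifted)

# Rank order survives both transformations, but ratios between units do not:
ratio_before = outputs[3] / outputs[2]  # 16 / 6
ratio_after = shifted[3] / shifted[2]   # 21 / 11
```

The changed ratios are one way to see why, as the abstract notes, a transformation alters the nature of the variable and complicates the interpretation of the results.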

Keywords: data transformation, data envelopment analysis, undesirable data, negative data

Procedia PDF Downloads 20
42053 Sportomics Analysis of Metabolic Responses in Olympic Sprint Canoeists

Authors: A. Magno-França, A. M. Magalhães-Neto, F. Bachini, E. Cataldi, A. Bassini, L. C. Cameron

Abstract:

Sprint canoeing (SC) has been part of the Olympic Games since 1936. Athletes compete in solo or double races of 200 m and 1000 m (40 sec and 240 sec, respectively). Due to its high intensity and duration, SC is extremely useful for studying the blood kinetics of some metabolites under high energetic demand. Sportomics is a field of study combining “-omics” sciences with classical biochemical analyses in order to understand sports-induced systemic changes. Here, we compare Sportomics findings during SC training sessions to describe metabolic responses of five top-level canoeists. Five Olympic world-class male athletes were evaluated during two days of training.

Keywords: biochemistry of exercise, metabolomics, injury markers, sportomics

Procedia PDF Downloads 516
42052 Network Analysis to Reveal Microbial Community Dynamics in the Coral Reef Ocean

Authors: Keigo Ide, Toru Maruyama, Michihiro Ito, Hiroyuki Fujimura, Yoshikatu Nakano, Shoichiro Suda, Sachiyo Aburatani, Haruko Takeyama

Abstract:

Understanding environmental systems is an important task. In recent years, conservation of coral environments has received attention because of biodiversity concerns. Damage to coral reefs under environmental impacts has been observed worldwide. However, the causal relationship between coral damage and environmental impacts has not been clearly understood. On the other hand, the structure and diversity of the marine bacterial community may be relatively robust under a certain strength of environmental impact. To evaluate coral environment conditions, it is necessary to investigate the relationship between marine bacterial composition in the coral reef and environmental factors. In this study, Time Scale Network Analysis was developed and applied to marine environmental data to investigate the relationships among coral, bacterial community composition and environmental factors. Seawater samples were collected fifteen times from November 2014 to May 2016 at two locations, Ishikawabaru and South of Sesoko, on Sesoko Island, Okinawa. Physicochemical factors such as temperature, photosynthetically active radiation, dissolved oxygen, turbidity, pH, salinity, chlorophyll, dissolved organic matter and depth were measured in the coral reef area. The metagenome and metatranscriptome of coral reef seawater were analyzed as the biological factors. Metagenome data were used to clarify marine bacterial community composition. In addition, functional gene composition was estimated from the metatranscriptome. To infer the relationships between physicochemical and biological factors, cross-correlation analysis was applied to the time-scale data. Although cross-correlation coefficients carry time precedence information, they also reflect indirect interactions between the variables. To elucidate the direct regulations between both factors, partial correlation coefficients were combined with cross-correlation.
This analysis was performed on all parameters, including the bacterial composition, the functional gene composition and the physicochemical factors. As a result, time scale network analysis revealed the direct regulation of seawater temperature by photosynthetically active radiation. In addition, the concentration of dissolved oxygen regulated the value of chlorophyll. The plausible regulatory relationships found between environmental factors point to part of the mechanisms operating in the coral reef area.
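
The two ingredients named in this abstract, lagged cross-correlation (which captures time precedence) and partial correlation (which removes the effect of a third variable), can be sketched as below. How the authors combine them in their Time Scale Network Analysis is not specified here, so the sketch shows each quantity separately; the function names and the small synthetic series are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic example 1: `follower` repeats `leader` two time steps later.
leader = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
follower = [0.0, 0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
best_lag = max(range(4), key=lambda k: lagged_correlation(leader, follower, k))

# Synthetic example 2: x and y are related only through the common driver z.
z = [1.0, 2.0, 3.0, 4.0]
x = [2.0, 1.0, 2.0, 5.0]
y = [2.0, -1.0, 6.0, 3.0]
direct_link = partial_correlation(x, y, z)
```

In the second example `x` and `y` are clearly correlated, yet the partial correlation given `z` is essentially zero, which is the kind of indirect interaction the abstract says plain cross-correlation cannot rule out.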

Keywords: coral environment, marine microbiology, network analysis, omics data analysis

Procedia PDF Downloads 254
42051 Medicompills Architecture: A Mathematical Precise Tool to Reduce the Risk of Diagnosis Errors on Precise Medicine

Authors: Adriana Haulica

Abstract:

Powered by Machine Learning, Precise medicine is tailored by now to use genetic and molecular profiling, with the aim of optimizing the therapeutic benefits for cohorts of patients. As the majority of Machine Learning algorithms come from heuristics, the outputs have contextual validity. This is not very restrictive in the sense that medicine itself is not an exact science. Meanwhile, the progress made in Molecular Biology, Bioinformatics, Computational Biology, and Precise Medicine, correlated with the huge amount of human biology data and the increase in computational power, opens new healthcare challenges. A more accurate diagnosis is needed along with real-time treatments by processing as much as possible from the available information. The purpose of this paper is to present a deeper vision for the future of Artificial Intelligence in Precise medicine. In fact, current Machine Learning algorithms use standard mathematical knowledge, mostly Euclidean metrics and standard computation rules. The loss of information arising from the classical methods prevents obtaining 100% evidence on the diagnosis process. To overcome these problems, we introduce MEDICOMPILLS, a new architectural concept tool of information processing in Precise medicine that delivers diagnosis and therapy advice. This tool processes poly-field digital resources: global knowledge related to biomedicine in a direct or indirect manner but also technical databases, Natural Language Processing algorithms, and strong class optimization functions. As the name suggests, the heart of this tool is a compiler. The approach is completely new, tailored for omics and clinical data. Firstly, the intrinsic biological intuition is different from the well-known “a needle in a haystack” approach usually used when Machine Learning algorithms have to process differential genomic or molecular data to find biomarkers.
Also, even if the input is drawn from various types of data, the working engine inside MEDICOMPILLS does not search for patterns as an integrative tool. This approach deciphers the biological meaning of input data up to the metabolic and physiologic mechanisms, based on a compiler with grammars issued from bio-algebra-inspired mathematics. It translates input data into bio-semantic units with the help of contextual information iteratively until Bio-Logical operations can be performed on the basis of the “common denominator” rule. The rigorousness of MEDICOMPILLS comes from the structure of the contextual information on functions, built to be analogous to mathematical “proofs”. The major impact of this architecture is expressed by the high accuracy of the diagnosis. Delivered as a multiple-conditions diagnostic, constituted by some main diseases along with unhealthy biological states, this format is highly suitable for therapy proposal and disease prevention. The use of MEDICOMPILLS architecture is highly beneficial for the healthcare industry. The expectation is to generate a strategic trend in Precise medicine, making medicine more like an exact science and reducing the considerable risk of errors in diagnostics and therapies. The tool can be used by pharmaceutical laboratories for the discovery of new cures. It will also contribute to better design of clinical trials and speed them up.

Keywords: bio-semantic units, multiple conditions diagnosis, NLP, omics

Procedia PDF Downloads 70
42050 A Data Envelopment Analysis Model in a Multi-Objective Optimization with Fuzzy Environment

Authors: Michael Gidey Gebru

Abstract:

Most Data Envelopment Analysis models operate in a static environment with input and output parameters that are given by deterministic data. However, due to ambiguity brought on by shifting market conditions, input and output data are not always precisely gathered in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to extend crisp Data Envelopment Analysis into Data Envelopment Analysis with a fuzzy environment. In this study, the input and output data are regarded as triangular fuzzy numbers. Then, the Data Envelopment Analysis model with a fuzzy environment is solved using a multi-objective method to gauge the efficiency of the Decision Making Units. Finally, the developed Data Envelopment Analysis model is illustrated with an application to real data from 50 educational institutions.
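
The abstract treats inputs and outputs as triangular fuzzy numbers. As a rough illustration only, and not the multi-objective model of the paper, the sketch below shows how a triangular fuzzy number (l, m, u) can be reduced to an interval at a chosen membership level (an alpha-cut), and how an output/input ratio then becomes an interval as well. The function names `alpha_cut` and `ratio_interval` are hypothetical.

```python
def alpha_cut(triangular, alpha):
    """Interval of a triangular fuzzy number (l, m, u) at membership level alpha.

    alpha = 0 gives the full support [l, u]; alpha = 1 collapses to the peak m.
    """
    low, mode, high = triangular
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (low + alpha * (mode - low), high - alpha * (high - mode))


def ratio_interval(output_interval, input_interval):
    """Bounds of output/input when both are positive intervals.

    The smallest ratio pairs the lowest output with the highest input,
    the largest ratio pairs the highest output with the lowest input.
    """
    out_low, out_high = output_interval
    in_low, in_high = input_interval
    return (out_low / in_high, out_high / in_low)


if __name__ == "__main__":
    fuzzy_output = (2.0, 4.0, 8.0)   # illustrative values, not data from the study
    fuzzy_input = (1.0, 2.0, 3.0)
    out_cut = alpha_cut(fuzzy_output, 0.5)
    in_cut = alpha_cut(fuzzy_input, 0.5)
    print(out_cut, in_cut, ratio_interval(out_cut, in_cut))
```

A full fuzzy DEA model would optimize input and output weights across all Decision Making Units; the interval ratio here only conveys why fuzzy data lead to a range of efficiency values rather than a single number.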

Keywords: efficiency, Data Envelopment Analysis, fuzzy, higher education, input, output

Procedia PDF Downloads 57
42049 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper examines data mining technology to determine its application in the analysis of building energy consumption data, including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature is reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 852
42048 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflect different aspects of people, events and activities. Data generated by various systems differ in form, and their sources differ as well because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need to be fused together. Data from different sources, collected in different ways, raise several issues which need to be resolved. Problems of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, and scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data and space data. Metadata are needed so that they can be referred to and read when an application needs to access, manipulate and display the data. Uniform metadata management ensures the effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 393
42047 Metabolic Profiling in Breast Cancer Applying Micro-Sampling of Biological Fluids and Analysis by Gas Chromatography – Mass Spectrometry

Authors: Mónica P. Cala, Juan S. Carreño, Roland J.W. Meesters

Abstract:

Recently, collection of biological fluids on special filter papers has become a popular micro-sampling technique. In particular, the dried blood spot (DBS) micro-sampling technique has gained much attention and is currently applied in various life sciences research areas. As a result of this popularity, DBS are not only competing intensively with the venous blood sampling method but are now widely applied in numerous bioanalytical assays, in particular in the screening of inherited metabolic diseases, pharmacokinetic modeling and therapeutic drug monitoring. Recently, micro-sampling techniques were also introduced in “omics” areas, including metabolomics. For a metabolic profiling study, we applied micro-sampling of biological fluids (blood and plasma) from healthy controls and from women with breast cancer. From blood samples, dried blood and plasma samples were prepared by spotting 8 µL of sample onto pre-cut 5-mm paper disks, followed by drying of the disks for 100 minutes. Dried disks were then extracted with 100 µL of methanol. From liquid blood and plasma samples, 40 µL were deproteinized with methanol, followed by centrifugation and collection of supernatants. Supernatants and extracts were evaporated to dryness under nitrogen gas, and the residues were derivatized with O-methoxyamine and MSTFA. C17:0-methylester in heptane (10 ppm) was used as internal standard. Deconvolution and alignment of full-scan (m/z 50-500) MS data were done with AMDIS and SpectConnect (http://spectconnect.mit.edu) software, respectively. Statistical data analysis was done by Principal Component Analysis (PCA) using R software. The results obtained from our preliminary study indicate that the use of dried blood/plasma on paper disks could be a powerful new tool in metabolic profiling. Many of the metabolites observed in plasma (liquid/dried) were also positively identified in whole blood samples (liquid/dried). 
Whole blood could be a potential substitute matrix for plasma in metabolomic profiling studies, and micro-sampling techniques could likewise be used for the collection of samples in clinical studies. It was concluded that the separation of the different sample methodologies (liquid vs. dried) observed by PCA was due to the different sample treatment protocols applied. More experiments need to be done to confirm the obtained observations, and a more rigorous validation of these micro-sampling techniques is needed. The novelty of our approach can be found in the application of different biological fluid micro-sampling techniques for metabolic profiling.

Keywords: biofluids, breast cancer, metabolic profiling, micro-sampling

Procedia PDF Downloads 411
42046 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a wider variety of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, using a board to collect data involves many difficulties, especially when the user is not a computer programmer or is using the board for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 378
42045 Customer Data Analysis Model Using Business Intelligence Tools in Telecommunication Companies

Authors: Monica Lia

Abstract:

This article presents a customer data analysis model using business intelligence tools for data modelling, data transformation, data visualization and dynamic report building. The analysis of an organization's customers is based on the information from the transactional systems of the organization. The paper presents how to develop the data model starting from the data that companies have inside their own operational systems. The owned data can be transformed into useful information about customers using business intelligence tools. For a mature market, knowing the information inside the data and making forecasts for strategic decisions become more important. Business intelligence tools are used in business organizations as support for decision-making.

Keywords: customer analysis, business intelligence, data warehouse, data mining, decisions, self-service reports, interactive visual analysis, dynamic dashboards, use cases diagram, process modelling, logical data model, data mart, ETL, star schema, OLAP, data universes

Procedia PDF Downloads 430
42044 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis when the amount of data is large. In fact, because of the volume, nature (semi-structured or unstructured) and variety of big data, it is impossible to analyze them efficiently via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm Classification And Regression Trees (CART) in order to adapt it for big data analysis. The major changes to this algorithm are presented, and then a version of the extended algorithm is defined in order to make it applicable to a huge quantity of data.
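
The abstract does not spell out how the CART extension distributes its calculation, so the following is only a generic sketch of one way a split search can be parallelized. Each data partition counts class labels on either side of a candidate threshold (a map step), the counts are summed (a reduce step), and the weighted Gini impurity is computed from the merged counts. All function names are hypothetical and the data are made up.

```python
from collections import Counter


def partition_counts(rows, threshold):
    """Map step: class counts left (x <= threshold) and right of the split
    for one partition of (feature_value, label) rows."""
    left, right = Counter(), Counter()
    for value, label in rows:
        (left if value <= threshold else right)[label] += 1
    return left, right


def gini(counts):
    """Gini impurity of a class-count table; an empty node counts as pure."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def split_impurity(partitions, threshold):
    """Reduce step: merge per-partition counts, then return the
    size-weighted Gini impurity of the two child nodes."""
    left, right = Counter(), Counter()
    for rows in partitions:
        part_left, part_right = partition_counts(rows, threshold)
        left.update(part_left)
        right.update(part_right)
    n_left, n_right = sum(left.values()), sum(right.values())
    return (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)


def best_threshold(partitions, candidates):
    """Pick the candidate threshold with the lowest weighted impurity."""
    return min(candidates, key=lambda t: split_impurity(partitions, t))


if __name__ == "__main__":
    partitions = [
        [(1, "a"), (2, "a"), (6, "b")],
        [(3, "a"), (7, "b"), (8, "b")],
    ]
    print(best_threshold(partitions, [2, 3, 6]))
```

Only the small count tables travel between workers, never the rows themselves, which is what makes this kind of split evaluation suitable for distributed data.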

Keywords: predictive analysis, big data, predictive analysis algorithms, CART algorithm

Procedia PDF Downloads 142
42043 Exploring Emerging Viruses From a Protected Reserve

Authors: Nemat Sokhandan Bashir

Abstract:

Threats from viruses to agricultural crops could be even larger than the losses caused by other pathogens because, in many cases, the viral infection is latent but crucial from an epidemic point of view. Wild vegetation can be a source of many viruses that eventually find their way into crop plants. Although often asymptomatic in wild plants due to adaptation, they can potentially cause serious losses in crops. Therefore, exploring viruses in wild vegetation is very important. Recently, omics have been quite useful for exploring plant viruses from various plant sources, especially wild vegetation. For instance, we have discovered viruses such as Ambrosia asymptomatic virus 1 (AAV-1) through the application of metagenomics to plants from the Oklahoma Prairie Reserve. In this approach, extracts from randomly sampled plants are subjected to high-speed centrifugation and ultracentrifugation to separate virus-like particles (VLPs), then nucleic acids in the form of DNA or RNA are extracted from such VLPs by treatment with phenol-chloroform and subsequent precipitation with ethanol. The nucleic acid preparations are separately treated with RNase or DNase in order to determine the genome component of the VLPs. In the case of RNAs, the complementary cDNAs are synthesized before submitting to DNA sequencing. However, for VLPs with DNA contents, the procedure is relatively straightforward, without making cDNA. Because the length of the nucleic acid content of VLPs can differ, various strategies are employed to achieve sequencing. Techniques similar to so-called "chromosome walking" may be used to obtain sequences of long segments. Once the nucleotide sequence data were obtained, they were subjected to BLAST analysis to determine the most closely related previously reported virus sequences. In one case, we determined that the novel virus was AAV-1 because the sequence comparison and analysis revealed that the reads were the closest to the Indian citrus ringspot virus (ICRSV). 
AAV-1 had an RNA genome 7408 nucleotides in length that contained six open reading frames (ORFs). Based on phylogenies inferred from the replicase and coat protein ORFs of the virus, it was placed in the genus Mandarivirus.

Keywords: wild, plant, novel, metagenomics

Procedia PDF Downloads 80
42042 Quantified Metabolomics for the Determination of Phenotypes and Biomarkers across Species in Health and Disease

Authors: Miroslava Cuperlovic-Culf, Lipu Wang, Ketty Boyle, Nadine Makley, Ian Burton, Anissa Belkaid, Mohamed Touaibia, Marc E. Surrette

Abstract:

Metabolic changes are one of the major factors in the development of a variety of diseases in various species. Metabolism of agricultural plants is altered following infection with pathogens, sometimes contributing to resistance. At the same time, pathogens use metabolites for infection and progression. In humans, altered metabolism is, for example, a hallmark of cancer development. Quantified metabolomics data combined with other omics or clinical data and analyzed using various unsupervised and supervised methods can lead to better diagnosis and prognosis. It can also provide information about resistance as well as contribute knowledge of compounds significant for disease progression or prevention. In this work, different methods for metabolomics quantification and analysis from Nuclear Magnetic Resonance (NMR) measurements that are used for investigation of disease development in wheat and human cells will be presented. One-dimensional 1H NMR spectra are used extensively for metabolic profiling due to their high reliability, wide range of applicability, speed, trivial sample preparation and low cost. This presentation will describe a new method for metabolite quantification from NMR data that combines alignment of spectra of standards to sample spectra followed by multivariate linear regression optimization of spectra of assigned metabolites to samples’ spectra. Several different alignment methods were tested, and the multivariate linear regression result has been compared with other quantification methods. Quantified metabolomics data can be analyzed in a variety of ways, and we will present different clustering methods used for phenotype determination, network analysis providing knowledge about the relationships between metabolites through the metabolic network, and biomarker selection providing novel markers. 
These analysis methods have been utilized for the investigation of fusarium head blight resistance in wheat cultivars as well as analysis of the effect of estrogen receptor and carbonic anhydrase activation and inhibition on breast cancer cell metabolism. Metabolic changes in spikelets of wheat cultivars FL62R1, Stettler, MuchMore and Sumai3 following Fusarium graminearum infection were explored. Extensive 1D 1H and 2D NMR measurements provided information for detailed metabolite assignment and quantification, leading to possible metabolic markers discriminating resistance level in wheat subtypes. Quantification data is compared to results obtained using other published methods. Fusarium infection-induced metabolic changes in different wheat varieties are discussed in the context of metabolic network and resistance. Quantitative metabolomics has been used for the investigation of the effect of targeted enzyme inhibition in cancer. In this work, the effect of 17β-estradiol and ferulic acid on metabolism of ER+ breast cancer cells has been compared to their effect on ER- control cells. The effect of the inhibitors of carbonic anhydrase on the observed metabolic changes resulting from ER activation has also been determined. Metabolic profiles were studied using 1D and 2D metabolomic NMR experiments, combined with the identification and quantification of metabolites, and the annotation of the results is provided in the context of biochemical pathways.
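
The quantification idea described above, aligning reference spectra of standards to a sample spectrum and then regressing the sample on the aligned references, can be sketched on toy data. The abstract does not name the alignment methods that were tested, so the integer-shift alignment by maximum dot product used here is an assumption, as are the function names and the two-metabolite restriction. Real spectra would need finer alignment and many more reference compounds.

```python
def shift(spectrum, lag):
    """Shift a spectrum by `lag` points (positive = to the right), padding with zeros."""
    n = len(spectrum)
    return [spectrum[i - lag] if 0 <= i - lag < n else 0.0 for i in range(n)]


def best_lag(reference, sample, max_lag=2):
    """Integer lag that maximizes the dot product of the shifted reference with the sample."""
    def score(lag):
        return sum(r * s for r, s in zip(shift(reference, lag), sample))
    return max(range(-max_lag, max_lag + 1), key=score)


def quantify(ref_a, ref_b, sample):
    """Least-squares coefficients (c_a, c_b) such that sample ~ c_a*ref_a + c_b*ref_b.

    Solves the 2x2 normal equations directly; assumes the two references
    are not proportional to each other (non-zero determinant).
    """
    aa = sum(x * x for x in ref_a)
    bb = sum(x * x for x in ref_b)
    ab = sum(x * y for x, y in zip(ref_a, ref_b))
    ay = sum(x * y for x, y in zip(ref_a, sample))
    by = sum(x * y for x, y in zip(ref_b, sample))
    det = aa * bb - ab * ab
    return ((ay * bb - by * ab) / det, (by * aa - ay * ab) / det)


if __name__ == "__main__":
    # Toy reference spectra of two standards (made-up intensities).
    ref_a = [0, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ref_b = [0, 0, 0, 0, 0, 0, 0, 0, 2, 4, 2, 0]
    # Toy sample: 2.0 x metabolite A shifted right by one point,
    # plus 0.5 x metabolite B shifted left by one point.
    sample = [0, 0, 2, 6, 2, 0, 0, 1, 2, 1, 0, 0]
    aligned_a = shift(ref_a, best_lag(ref_a, sample))
    aligned_b = shift(ref_b, best_lag(ref_b, sample))
    print(quantify(aligned_a, aligned_b, sample))
```

In practice one would likely also constrain the coefficients to be non-negative, since concentrations cannot be negative; that constraint is omitted here to keep the sketch short.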

Keywords: metabolic biomarkers, metabolic network, metabolomics, multivariate linear regression, NMR quantification, quantified metabolomics, spectral alignment

Procedia PDF Downloads 338
42041 Changing the Landscape of Fungal Genomics: New Trends

Authors: Igor V. Grigoriev

Abstract:

Understanding of biological processes encoded in fungi is instrumental in addressing future food, feed, and energy demands of the growing human population. Genomics is a powerful and quickly evolving tool to understand these processes. The Fungal Genomics Program of the US Department of Energy Joint Genome Institute (JGI) partners with researchers around the world to explore fungi in several large scale genomics projects, changing the fungal genomics landscape. The key trends of these changes include: (i) rapidly increasing scale of sequencing and analysis, (ii) developing approaches to go beyond culturable fungi and explore fungal ‘dark matter,’ or unculturables, and (iii) functional genomics and multi-omics data integration. Power of comparative genomics has been recently demonstrated in several JGI projects targeting mycorrhizae, plant pathogens, wood decay fungi, and sugar fermenting yeasts. The largest JGI project ‘1000 Fungal Genomes’ aims at exploring the diversity across the Fungal Tree of Life in order to better understand fungal evolution and to build a catalogue of genes, enzymes, and pathways for biotechnological applications. At this point, at least 65% of over 700 known families have one or more reference genomes sequenced, enabling metagenomics studies of microbial communities and their interactions with plants. For many of the remaining families no representative species are available from culture collections. To sequence genomes of unculturable fungi two approaches have been developed: (a) sequencing DNA from fruiting bodies of ‘macro’ and (b) single cell genomics using fungal spores. The latter has been tested using zoospores from the early diverging fungi and resulted in several near-complete genomes from underexplored branches of the Fungal Tree, including the first genomes of Zoopagomycotina. Genome sequence serves as a reference for transcriptomics studies, the first step towards functional genomics. 
In the JGI fungal mini-ENCODE project, transcriptomes of the model fungus Neurospora crassa grown on a spectrum of carbon sources have been collected to build regulatory gene networks. Epigenomics is another tool to understand gene regulation, and recently introduced single molecule sequencing platforms not only provide better genome assemblies but can also detect DNA modifications. For example, the 6mC methylome was surveyed across many diverse fungi, and the highest levels of 6mC methylation among Eukaryota have been reported. Finally, data production at such scale requires data integration to enable efficient data analysis. Over 700 fungal genomes and other -omes have been integrated in the JGI MycoCosm portal and equipped with comparative genomics tools to enable researchers to address a broad spectrum of biological questions and applications for bioenergy and biotechnology.

Keywords: fungal genomics, single cell genomics, DNA methylation, comparative genomics

Procedia PDF Downloads 208
42040 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze exponentially growing big data with traditional technologies. Hadoop is a new technology that makes this possible. The R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop, which integrates the R and Hadoop environments, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed that our RHadoop system became much faster as the number of data nodes increased. We also compared the performance of our RHadoop system with the lm function and the biglm package available on bigmemory. The results showed that our RHadoop system was faster than the other packages owing to parallel processing, with the number of map tasks increasing as the size of the data increases.
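
A common way to parallelize multiple regression, sketched here in plain Python rather than with the RHadoop API (which the abstract does not show), is to let each map task compute the sufficient statistics X'X and X'y for its chunk of rows, sum them in a reduce step, and solve the normal equations once. Whether the authors used exactly this decomposition is an assumption; the function names below are hypothetical.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: X'X and X'y for one chunk of (features, y) rows, with an intercept column."""
    p = len(rows[0][0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, y in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_stats(a, b):
    """Reduce step: element-wise sum of two (X'X, X'y) pairs."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(chunks):
    """Combine per-chunk statistics and return [intercept, coefficient_1, ...]."""
    xtx, xty = reduce(reduce_stats, map(map_chunk, chunks))
    return solve(xtx, xty)


if __name__ == "__main__":
    # Made-up data generated from y = 1 + 2*x1 + 3*x2, split into two chunks.
    data = [((0, 0), 1), ((1, 0), 3), ((0, 1), 4), ((1, 1), 6), ((2, 1), 8), ((1, 3), 12)]
    print(fit([data[:3], data[3:]]))
```

Because the per-chunk statistics are small (p x p and p x 1) regardless of how many rows a chunk holds, the result is the same as fitting on the full data set, and adding map tasks only changes how the row scan is divided.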

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 437
42039 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method of applying Independent Topic Analysis (ITA) to a growing collection of document data. The amount of document data has been increasing since the spread of the Internet. ITA was presented as one method to analyze document data. ITA extracts independent topics from document data by using Independent Component Analysis (ICA), a technique from signal processing. However, it is difficult to apply ITA to a growing collection of document data, because ITA must use all of the document data, so its temporal and spatial cost is very high. Therefore, we present Incremental ITA, which extracts independent topics from a growing collection of document data. Incremental ITA updates the independent topics when document data is added after the independent topics have been extracted from the previous data. We also show the results of applying Incremental ITA to benchmark datasets.

Keywords: text mining, topic extraction, independent, incremental, independent component analysis

Procedia PDF Downloads 309
42038 A Review of Spatial Analysis as a Geographic Information Management Tool

Authors: Chidiebere C. Agoha, Armstong C. Awuzie, Chukwuebuka N. Onwubuariri, Joy O. Njoku

Abstract:

Spatial analysis is a field of study that utilizes geographic or spatial information to understand and analyze patterns, relationships, and trends in data. It is characterized by the use of geographic or spatial information, which allows for the analysis of data in the context of its location and surroundings. It differs from non-spatial or aspatial techniques, which do not consider the geographic context and may not provide as complete an understanding of the data. Spatial analysis is applied in a variety of fields, including urban planning, environmental science, geosciences, epidemiology, and marketing, to gain insights and make decisions about complex spatial problems. This review paper explores definitions of spatial analysis from various sources, including examples of its application and different analysis techniques such as buffer analysis, interpolation, and kernel density analysis (multi-distance spatial cluster analysis). It also contrasts spatial analysis with non-spatial analysis.
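
Buffer analysis, one of the techniques named above, can be illustrated with a minimal sketch: given a feature of interest and a buffer radius, select the locations that fall inside the buffer. The example assumes planar (projected) coordinates so that straight-line distance is meaningful; the function name and the coordinates are hypothetical.

```python
from math import dist


def points_in_buffer(center, points, radius):
    """Return the points lying within `radius` of `center`.

    Coordinates are assumed to be planar (for example metres in a projected
    coordinate system), so Euclidean distance applies.
    """
    return [p for p in points if dist(center, p) <= radius]


if __name__ == "__main__":
    well = (0.0, 0.0)                       # made-up feature of interest
    households = [(3.0, 3.0), (1.0, 1.0), (6.0, 0.0)]
    print(points_in_buffer(well, households, 5.0))
```

For latitude and longitude data, the distance function would have to be replaced by a geodesic distance, and GIS software typically builds the buffer as a polygon rather than testing points one by one.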

Keywords: aspatial technique, buffer analysis, epidemiology, interpolation

Procedia PDF Downloads 318
42037 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis

Authors: N. R. N. Idris, S. Baharom

Abstract:

A meta-analysis may be performed using aggregate data (AD) or individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. The statistical advantages of combining studies from different levels have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analyses were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD only, AD only and the combination of IPD and AD (mixed data, MD), under different study scenarios. The percentage relative bias (PRB), root mean-square error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary-level data, as they significantly increase the accuracy of the estimates. On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has a moderating effect on the bias of the estimates of the treatment effects, as the IPD tends to overestimate the treatment effects, while the AD tends to produce underestimated effect estimates. These results may provide some guidance in deciding whether a significant benefit is gained by pooling the two levels of data when conducting a meta-analysis.
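
The three performance measures named above are not defined in the abstract. The sketch below uses their usual textbook definitions, which is an assumption about what the authors computed: PRB as the mean estimate's deviation from the true value in percent, RMSE as the root of the mean squared deviation, and coverage as the share of confidence intervals that contain the true value. Function names and numbers are illustrative.

```python
from math import sqrt


def percentage_relative_bias(estimates, true_value):
    """100 * (mean estimate - true value) / true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - true_value) / true_value


def rmse(estimates, true_value):
    """Root mean-square error of the estimates around the true value."""
    return sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def coverage_probability(intervals, true_value):
    """Fraction of (lower, upper) confidence intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


if __name__ == "__main__":
    true_effect = 2.0                                  # made-up simulation truth
    estimates = [1.8, 2.2, 2.6]                        # made-up pooled estimates
    intervals = [(1.5, 2.5), (2.1, 2.9), (1.9, 2.1), (0.0, 1.0)]
    print(percentage_relative_bias(estimates, true_effect))
    print(rmse(estimates, true_effect))
    print(coverage_probability(intervals, true_effect))
```

In a simulation study, these functions would be applied to the pooled estimates from each replicate, separately for the IPD-only, AD-only and mixed-data analyses, so that the three strategies can be compared on the same scale.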

Keywords: aggregate data, combined-level data, individual patient data, meta-analysis

Procedia PDF Downloads 375