Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 27916

Search results for: bioinformatics analysis

27886 Identification and Characterization of Small Peptides Encoded by Small Open Reading Frames using Mass Spectrometry and Bioinformatics

Abstract:

Short open reading frames (sORFs) located in the 5' UTR of mRNAs are known as upstream ORFs (uORFs). Characterizing uORF-encoded peptides (uPEPs), a subset of short open reading frame-encoded peptides (sPEPs), and their translational regulation can improve understanding of the causes of genetic disease and of proteome complexity, and can support the development of treatments. The existence of uORFs within the cellular proteome could be detected by LC-MS/MS. Demonstrating that a uORF is translated into a uPEP, and identifying that uPEP, would allow characterization of its structure, function, subcellular localization, evolutionary maintenance (conservation in human and other species) and abundance in cells. It is hypothesized that a subset of sORFs is translatable and that the encoded sPEPs are functional and endogenously expressed, contributing to the complexity of the eukaryotic cellular proteome. This project aimed to investigate whether sORFs encode functional peptides. Liquid chromatography-mass spectrometry (LC-MS) and bioinformatics were employed for this purpose. Because sPEPs are probably low in abundance and small in size, efficient enrichment strategies that enrich small proteins and deplete the sub-proteome of large, abundant proteins are crucial for identifying them. Low molecular weight proteins were extracted using SDS-PAGE from Human Embryonic Kidney (HEK293) cells and using Strong Cation Exchange Chromatography (SCX) from secreted HEK293 cells. Extracted proteins were digested with trypsin into peptides, which were detected by LC-MS/MS. The MS/MS data were searched against Swiss-Prot using MASCOT version 2.4 to filter out known proteins, and all unmatched spectra were re-searched against the human RefSeq database. ProteinPilot v5.0.1 was used to identify sPEPs by searching against the human RefSeq, Vanderperre and Human Alternative Open Reading Frame (HaltORF) databases. Potential sPEPs were analyzed by bioinformatics.
Because SDS-PAGE could not separate proteins smaller than 20 kDa, this approach could not identify sPEPs. ORF Finder and blastp searches showed that all MASCOT-identified peptide fragments were parts of main open reading frames (mORFs). No sPEP was detected, so the existence of sPEPs could not be established in this study. Instead, 13 sORFs shown by mass spectrometry in previous studies to be translated in HEK293 cells were characterized by bioinformatics. The sPEPs identified in those studies were <100 amino acids and <15 kDa. Bioinformatics results showed that sORFs are translated to sPEPs and contribute to proteome complexity. The uPEP translated from the uORF of SLC35A4 was strongly conserved in human and mouse, while the uPEP translated from the uORF of MKKS was strongly conserved in human and Rhesus monkey. Cross-species conservation of uORFs, in association with protein translation, strongly suggests evolutionary maintenance of the coding sequence and indicates probable functional expression of the peptides encoded within these uORFs. Translation of these sORFs had been confirmed by mass spectrometry in the previous studies, and the sPEPs were characterized here with bioinformatics.
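
The size criterion mentioned above (sPEPs of fewer than 100 amino acids) can be illustrated with a small ORF scan. This is only a sketch under stated assumptions: it is not the ORF Finder tool used in the study, it scans the forward strand only, it treats ATG as the sole start codon, and the function name `find_sorfs` and its `max_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_codons=100):
    """Return (start, end, n_amino_acids) for each forward-strand ATG..stop ORF
    that encodes fewer than max_codons amino acids (stop codon excluded)."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the reading frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_aa = (pos - start) // 3
                if n_aa < max_codons:
                    hits.append((start, pos + 3, n_aa))
                break  # the first in-frame stop ends this ORF
    return hits


# Toy sequence, not taken from the study.
print(find_sorfs("CCATGGCTTAAGG"))
```

Coordinates are 0-based and half-open. An ORF with no in-frame stop codon is skipped rather than reported as truncated.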

Keywords: bioinformatics, HEK293 cells, liquid chromatography-mass spectrometry, ProteinPilot, Strong Cation Exchange Chromatography, SDS-PAGE, sPEPs

27885 Nutrigenetic and Bioinformatic Analysis of Rice Bran Bioactives for the Treatment of Lifestyle Related Disease Diabetes and Hypertension

Authors: Md. Alauddin, Md. Ruhul Amin, Md. Omar Faruque, Muhammad Ali Siddiquee, Zakir Hossain Howlader, Mohammad Asaduzzaman

Abstract:

Diabetes and hypertension are major lifestyle-related diseases. α-Amylase and angiotensin-converting enzyme (ACE) are key enzymes that regulate diabetes and hypertension. The aim was to develop a drug for the treatment of diabetes and hypertension. The rice bran (RB) sample (Oryza sativa; BRRI-Dhan-84) was collected from the Bangladesh Rice Research Institute (BRRI), and rice bran proteins were isolated and hydrolyzed with the enzymes alcalase and trypsin. An in vivo experiment suggested that rice bran bioactives affect the expression of several key gluconeogenesis- and lipogenesis-regulating genes, such as glucose-6-phosphatase, phosphoenolpyruvate carboxykinase, and fatty acid synthase. These genes are connected to the regulation of glucose level and lipid profile, as well as to anti-inflammatory activity. Molecular docking, bioinformatics, and in vitro experiments were performed. We found that rice bran protein hydrolysates significantly (p < 0.05) influence the peptide concentration in the case of trypsin, alcalase, and (trypsin + alcalase) digestion. The in vitro analysis found that the protein hydrolysate significantly (p < 0.05) reduced diabetes and hypertension, as well as oxidative stress. A molecular docking study showed that the YY and IP peptides have a strong binding affinity to the active sites of ACE and α-amylase, with -7.8 kcal/mol and -6.2 kcal/mol, respectively. Molecular dynamics (MD) simulation and SwissADME data analysis indicated low toxicity risk, good physicochemical properties, pharmacokinetics, and drug-likeness, with drug scores of 0.45 and 0.55 for the YY and IP peptides, respectively. Thus, rice bran bioactives could be good candidates for the treatment of diabetes and hypertension.

Keywords: anti-hypertensive and anti-hyperglycemic, anti-oxidative, bioinformatics, in vitro study, rice bran proteins and peptides

27884 Molecular Portraits: The Role of Posttranslational Modification in Cancer Metastasis

Authors: Navkiran Kaur, Apoorva Mathur, Abhishree Agarwal, Sakshi Gupta, Tuhin Rashmi

Abstract:

Aim: Breast cancer is the most common cancer in women worldwide, and resistance to the current therapeutics, often concurrently, is an increasing clinical challenge. Glycosylation of proteins is one of the most important post-translational modifications. Aberrant glycosylation has been implicated in many different diseases because of the associated changes in biological function and protein folding. Alterations in cell surface glycosylation can promote invasive behavior of tumor cells, which ultimately leads to the progression of cancer. In breast cancer, there is increasing evidence for the role of glycosylation in tumor formation and metastasis. In the present study, an attempt was made to study the disease-associated sialoglycoproteins in breast cancer using bioinformatics tools. Sequences were retrieved from the UniProt database. A database, in the form of a Word document, was compiled from FASTA sequences of breast cancer genes. Glycosylation was studied using the YinOYang tool on ExPASy, and differential gene expression and protein analysis were carried out in the context of breast cancer metastasis. For each sequence, residues predicted above the O-GlcNAc threshold were counted, and sequences containing 50 or more aberrant glycosylation sites were detected and recorded. We found that there is a significant change in the expression profiling of glycosylation patterns of various proteins associated with breast cancer. Differentially and aberrantly glycosylated proteins in breast cancer cells, compared with non-neoplastic cells, are an important factor in the overall progression and development of cancer.

Keywords: breast cancer, bioinformatics, cancer, metastasis, glycosylation

27883 Evaluation of Important Transcription Factors and Kinases in Regulating the Signaling Pathways of Cancer Stem Cells With Low and High Proliferation Rate Derived From Colorectal Cancer

Authors: Mohammad Hossein Habibi, Atena Sadat Hosseini

Abstract:

Colorectal cancer is the third leading cause of cancer-related death in the world. Colorectal cancer screening, early detection, and treatment programs could benefit from the most up-to-date information on the disease's burden, given the present worldwide trend of increasing colorectal cancer incidence. Tumor recurrence and resistance are exacerbated by the presence of chemotherapy-resistant cancer stem cells that can generate rapidly proliferating tumor cells. In addition, tumor cells can evolve chemoresistance through adaptation mechanisms. In this work, we used in silico analysis to select suitable GEO datasets and compared slow-growing cancer stem cells with high-growth colorectal cancer-derived cancer stem cells. We then evaluated the signaling pathways, transcription factors, and kinases associated with these two types of cancer stem cells. A total of 980 upregulated genes and 870 downregulated genes were clustered. The MAPK signaling pathway, the AGE-RAGE signaling pathway in diabetic complications, Fc gamma R-mediated phagocytosis, and steroid biosynthesis were observed in upregulated genes. Caffeine metabolism, amino sugar and nucleotide sugar metabolism, the TNF signaling pathway, and the cytosolic DNA-sensing pathway were involved in downregulated genes. In the next step, we evaluated the best transcription factors and kinases in the two types of cancer stem cells. NR2F2, ZEB2, HEY1, and HDGF were identified as transcription factors and PRDM5, SMAD, CBP, and KDM2B as critical kinases in upregulated genes. On the other hand, the transcription factors IRF1, SPDEF, NCOA1, and STAT1 and the kinases CTNNB1 and CDH7 regulated the low-expression genes. Using bioinformatics analysis, we conducted an in-depth study of colorectal cancer stem cells at low and high growth rates so that further steps can be taken to detect and even target these cells. Naturally, additional tests are needed in this direction.
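
The grouping into upregulated and downregulated genes can be sketched as a simple threshold on log2 fold change. The abstract does not state which cutoffs were used, so the threshold of 1.0, the function name `split_by_fold_change`, and the placeholder gene names and values below are all assumptions for illustration.

```python
def split_by_fold_change(log2_fold_changes, threshold=1.0):
    """Split genes into (upregulated, downregulated) lists.

    log2_fold_changes maps gene name -> log2 fold change between the two
    groups. Genes whose absolute change is below the threshold are dropped.
    """
    up = sorted(g for g, v in log2_fold_changes.items() if v >= threshold)
    down = sorted(g for g, v in log2_fold_changes.items() if v <= -threshold)
    return up, down


# Placeholder values, not data from the study.
example = {"geneA": 2.3, "geneB": -1.7, "geneC": 0.2, "geneD": 1.0}
up, down = split_by_fold_change(example)
print(up, down)
```

A real analysis would normally also filter on an adjusted p-value before applying the fold-change cutoff.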

Keywords: colorectal cancer, bioinformatics analysis, transcription factor, kinases, cancer stem cells

27882 Whole Exome Sequencing Data Analysis of Rare Diseases: Non-Coding Variants and Copy Number Variations

Authors: S. Fahiminiya, J. Nadaf, F. Rauch, L. Jerome-Majewska, J. Majewski

Abstract:

Background: Sequencing of the protein-coding regions of the human genome (Whole Exome Sequencing; WES) has been very successful in identifying causal mutations for several rare genetic disorders in humans. Most WES studies have focused on rare variants in coding exons and splice sites, where missense substitutions lead to alteration of the protein product. Although focusing on this category of variants has revealed the mystery behind many inherited genetic diseases in recent years, a subset of them remains inconclusive. Here, we present the results of our WES studies in which analyzing only rare variants in coding regions was not conclusive, but further investigation revealed the involvement of non-coding variants and copy number variations (CNVs) in the etiology of the diseases. Methods: Whole exome sequencing was performed using our standard protocols at the Genome Quebec Innovation Center, Montreal, Canada. All bioinformatics analyses were done using an in-house WES pipeline. Results: To date, we have successfully identified several disease-causing mutations within gene coding regions (e.g. SCARF2: Van den Ende-Gupta syndrome and SNAP29: 22q11.2 deletion syndrome) by using WES. In addition, we showed that variants in non-coding regions and CNVs are also important and should not be ignored or filtered out during bioinformatics analysis of WES data. For instance, in patients with osteogenesis imperfecta type V and in patients with glucocorticoid deficiency, we identified variants in the 5'UTR, resulting in the production of longer or truncated non-functional proteins. Furthermore, CNVs were identified as the main cause of disease in patients with metaphyseal dysplasia with maxillary hypoplasia and brachydactyly and in patients with osteogenesis imperfecta type VII.
Conclusions: Our study highlights the importance of considering non-coding variants and CNVs during interpretation of WES data, as they can be the only cause of disease under investigation.

Keywords: whole exome sequencing data, non-coding variants, copy number variations, rare diseases

27881 Knowledge Engineering Based Smart Healthcare Solution

Authors: Rhaed Khiati, Muhammad Hanif

Abstract:

In the past decade, smart healthcare systems have been on the rise, especially with the evolution of hospitals and their increasing reliance on bioinformatics and software specializing in healthcare. Doctors have become more reliant on technology than ever, something that in the past would have been looked down upon, as technology has become imperative in reducing overall costs and improving the quality of patient care. With patient-doctor interactions becoming more necessary and more complicated than ever, systems must be developed while taking into account costs, patient comfort, and patient data, among other things. In this work, we propose a smart hospital bed, which combines the complexity and big data usage of traditional healthcare systems with the comfort found in soft beds, while taking into account concerns such as data confidentiality, security, and the maintenance of service level agreements (SLAs). This research work potentially provides users, namely patients and doctors, with seamless interaction with their respective nurses, as well as faster access to up-to-date personal data, including prescriptions and the severity of the condition, in contrast to previous research in the area, which lacks consideration of such provisions.

Keywords: big data, smart healthcare, distributed systems, bioinformatics

27880 Characterizing and Developing the Clinical Grade Microbiome Assay with a Robust Bioinformatics Pipeline for Supporting Precision Medicine Driven Clinical Development

Authors: Danyi Wang, Andrew Schriefer, Dennis O'Rourke, Brajendra Kumar, Yang Liu, Fei Zhong, Juergen Scheuenpflug, Zheng Feng

Abstract:

Purpose: It has been recognized that the microbiome plays critical roles in disease pathogenesis, including cancer, autoimmune disease, and multiple sclerosis. To develop a clinical-grade assay for exploring microbiome-derived clinical biomarkers across disease areas, a two-phase approach is implemented: 1) identification of the optimal sample preparation reagents using pre-mixed bacteria and healthy donor stool samples, coupled with the proprietary Sigma-Aldrich® bioinformatics solution; 2) exploratory analysis of patient samples to enable precision medicine. Study Procedure: In the phase 1 study, we first compared the 16S sequencing results of two ATCC® microbiome standards (MSA-2002 and MSA-2003) across five different extraction kits (Kits A, B, C, D and E). Both microbiome standard samples were extracted in triplicate with every extraction kit. Following isolation, DNA quantity was determined by Qubit assay. DNA quality was assessed to determine purity and to confirm that the extracted DNA was of high molecular weight. Bacterial 16S ribosomal ribonucleic acid (rRNA) amplicons were generated via amplification of the V3/V4 hypervariable region of the 16S rRNA. Sequencing was performed using a 2x300 bp paired-end configuration on the Illumina MiSeq. Fastq files were analyzed using the Sigma-Aldrich® Microbiome Platform. The Microbiome Platform is a cloud-based service that offers best-in-class 16S-seq and WGS analysis pipelines and databases. The Platform and its methods have been extensively benchmarked using microbiome standards generated internally by MilliporeSigma and by other external providers. Data Summary: The DNA yield using extraction kits D and E was below the limit of detection (100 pg/µl) of the Qubit assay, as both kits are intended for samples with low bacterial counts. The pre-mixed bacterial pellets at high concentrations, with an input of 2 × 10^6 cells for MSA-2002 and 1 × 10^6 cells for MSA-2003, were not compatible with these kits.
Among the remaining three extraction kits, kit A produced the greatest yield whereas kit B provided the least yield (Kit-A/MSA-2002: 174.25 ± 34.98; Kit-A/MSA-2003: 179.89 ± 30.18; Kit-B/MSA-2002: 27.86 ± 9.35; Kit-B/MSA-2003: 23.14 ± 6.39; Kit-C/MSA-2002: 55.19 ± 10.18; Kit-C/MSA-2003: 35.80 ± 11.41 (mean ± SD)). The PCoA 3D visualization of the weighted UniFrac beta diversity shows that kits A and C cluster closely together, while kit B appears as an outlier. The kit A sequencing samples cluster more closely together than those of both other kits. The taxonomic profiles of kit B have lower recall when compared with the known mixture profiles, indicating that kit B was inefficient at detecting some of the bacteria. Conclusion: Our data demonstrated that the DNA extraction method impacts DNA concentration, purity, and the microbial communities detected by next-generation sequencing analysis. A further performance comparison of microbiome analysis using healthy stool samples is underway; in addition, colorectal cancer patients' samples will be acquired to further explore clinical utility. Collectively, our comprehensive qualification approach, including the evaluation of optimal DNA extraction conditions, the inclusion of positive controls, and the implementation of a robust, qualified bioinformatics pipeline, assures accurate characterization of the microbiota in a complex matrix for deciphering the deep biology and enabling precision medicine.
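
The recall comparison against the known mixture profiles can be written down in a few lines. This is a minimal sketch assuming recall is defined as the fraction of taxa in the known standard that the kit's profile recovers; the abstract does not give the exact definition used by the platform, and the function name and taxon labels below are placeholders.

```python
def taxon_recall(expected_taxa, observed_taxa):
    """Fraction of expected taxa (the known mixture) found in the observed profile."""
    expected = set(expected_taxa)
    if not expected:
        raise ValueError("expected_taxa must not be empty")
    return len(expected & set(observed_taxa)) / len(expected)


# Placeholder labels, not the composition of any ATCC standard.
known_mixture = ["taxon_1", "taxon_2", "taxon_3", "taxon_4"]
kit_profile = ["taxon_1", "taxon_2", "taxon_4", "taxon_9"]
print(taxon_recall(known_mixture, kit_profile))
```

Taxa observed but absent from the standard (here `taxon_9`) lower precision, not recall, so they do not affect this number.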

Keywords: 16S rRNA sequencing, analytical validation, bioinformatics pipeline, metagenomics

27879 Microbial Dark Matter Analysis Using 16S rRNA Gene Metagenomics Sequences

Authors: Hana Barak, Alex Sivan, Ariel Kushmaro

Abstract:

Microorganisms are the most diverse and abundant life forms on Earth and account for a large portion of the Earth’s biomass and biodiversity. To date, though, our knowledge of microbial life is lacking, as it is based mainly on information from cultivated organisms. Indeed, microbiologists have borrowed from astrophysics and termed the ‘uncultured microbial majority’ ‘microbial dark matter’. The realization of how diverse and unexplored microorganisms are stems from recent advances in molecular biology, and in particular from novel methods for sequencing microbial small subunit ribosomal RNA genes directly from environmental samples, termed next-generation sequencing (NGS). This has led us to use NGS, which generates several gigabases of sequencing data in a single experimental run, to identify and classify environmental samples of microorganisms. In metagenomics sequencing analysis (both 16S and shotgun), sequences are compared to reference databases that contain only a small part of the existing microorganisms, and therefore their taxonomy assignment may reveal groups of unknown microorganisms or origins. These unknowns, or the ‘microbial sequences dark matter’, are usually ignored in spite of their great importance. The goal of this work was to develop an improved bioinformatics method that enables more complete analyses of the microbial communities in numerous environments. Therefore, NGS was used to identify previously unknown microorganisms from three different environments (industrial wastewater, rocks of the Negev Desert and water wells in the Arava valley). 16S rRNA gene metagenome analysis of the microorganisms from those three environments produced about 4 million reads for 75 samples. Between 0.1% and 12% of the sequences in each sample were tagged as ‘Unassigned’.
By employing a relatively simple methodology for resequencing the original gDNA samples through Sanger or Illumina MiSeq sequencing with specific primers, this study demonstrates that the mysterious ‘Unassigned’ group apparently contains sequences of candidate phyla. Those unknown sequences can be located on a phylogenetic tree and thus provide a better understanding of the ‘sequences dark matter’ and its role in the research of microbial communities and diversity. Studying this ‘dark matter’ will extend the existing databases and could reveal the hidden potential of the ‘microbial dark matter’.
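
The per-sample share of ‘Unassigned’ sequences reported above is a simple proportion of read counts. The sketch below assumes a per-sample table of read counts keyed by taxonomic label; the function name, the label string, and the counts are hypothetical and do not come from the study.

```python
def unassigned_percent(taxon_counts, label="Unassigned"):
    """Percentage of reads in one sample that carry the given label."""
    total = sum(taxon_counts.values())
    if total == 0:
        return 0.0
    return 100.0 * taxon_counts.get(label, 0) / total


# Hypothetical read counts for a single sample.
sample = {"Proteobacteria": 880, "Unassigned": 120}
print(unassigned_percent(sample))
```

Applying this to every sample gives the range of unassigned fractions across a dataset, which is the figure that decides which samples are worth resequencing.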

Keywords: bacteria, bioinformatics, dark matter, Next Generation Sequencing, unknown

27878 Bioinformatics Approach to Support Genetic Research in Autism in Mali

Authors: M. Kouyate, M. Sangare, S. Samake, S. Keita, H. G. Kim, D. H. Geschwind

Abstract:

Background & Objectives: Human genetic studies can be expensive, even unaffordable, in developing countries, partly due to sequencing costs. Our aim is to pilot the use of bioinformatics tools to guide scientifically valid, locally relevant, and economically sound autism genetic research in Mali. Methods: The NCBI, HGMD, and LSDB databases were used to identify hot point mutations. Several criteria (phenotype, transmission pattern, theoretical protein expression in the brain, and the impact of the mutation on the 3D structure of the protein) were used to prioritize selected autism genes. We used the protein database, Modeller, and Clustal W. Results: We found Mef2c (Gly27Ala/Leu38Gln), Pten (Thr131Ile), Prodh (Leu289Met), Nme1 (Ser120Gly), and Dhcr7 (Pro227Thr/Glu224Lys). These mutations were associated with the endonucleases BseRI, NspI, PfrJS2IV, BspGI, BsaBI, and SpoDI, respectively. The Gly27Ala/Leu38Gln mutations impacted the 3D structure of the Mef2c protein. Mef2c protein sequences across species showed a high percentage of similarity, with a highly conserved MADS domain. Discussion: The frequencies of Mef2c, Pten, Prodh, Nme1, and Dhcr7 gene mutations in the Malian population will be very informative. PCR coupled with restriction enzyme digestion can be used to screen for the targeted gene mutations. Sanger sequencing will be used for confirmation only. This will considerably cut the sequencing cost of gene-by-gene mutation screening. Knowledge of the 3D structure and of the potential impact of the mutations on the Mef2c protein informed the protein family and altered function (e.g., Leu38Gln). Conclusion & Future Work: Bioinformatics will positively impact autism research in Mali. Our approach can be applied to other neuropsychiatric disorders.
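
Screening by PCR plus restriction digestion works when a point mutation creates or destroys an enzyme's recognition site in the amplicon. The sketch below checks for that; it is not the authors' procedure. The EcoRI site GAATTC is used only as a familiar example and is not one of the enzymes listed in the abstract, and the sequences are invented.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_site(seq, site):
    """True if the recognition site occurs on either strand of seq."""
    seq, site = seq.upper(), site.upper()
    return site in seq or reverse_complement(site) in seq


def site_change(wild_type, mutant, site):
    """Classify the effect of a mutation on a restriction site:
    'lost', 'gained', or 'unchanged'."""
    before, after = has_site(wild_type, site), has_site(mutant, site)
    if before and not after:
        return "lost"
    if after and not before:
        return "gained"
    return "unchanged"


# Invented amplicons differing by one base (A>G) inside the example site.
print(site_change("AAGAATTCAA", "AAGAGTTCAA", "GAATTC"))
```

A 'lost' or 'gained' result means wild-type and mutant amplicons give different fragment patterns on a gel, so only discordant or positive samples need Sanger confirmation.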

Keywords: bioinformatics, endonucleases, autism, Sanger sequencing, point mutations

27877 Investigate the Side Effects of Patients With Severe COVID-19 and Choose the Appropriate Medication Regimens to Deal With Them

Authors: Rasha Ahmadi

Abstract:

In December 2019, a coronavirus, currently identified as SARS-CoV-2, produced a series of acute atypical respiratory illnesses in Wuhan, Hubei Province, China. The sickness induced by this virus was named COVID-19. The virus is transmissible between humans and has caused a worldwide pandemic. The death toll continues to climb, and a large number of countries have been obliged to impose social isolation and lockdown. The lack of focused therapy continues to be a problem. Epidemiological research showed that senior patients were more susceptible to severe disease, whereas children tend to have milder symptoms. In this study, we focus on other possible side effects of COVID-19 and on more detailed treatment strategies. Using bioinformatics analysis, we first isolated the gene expression profile of patients with severe COVID-19 from the GEO database. Patients' blood samples were used in the GSE183071 dataset. We then categorized the genes into those with high and low expression. In the next step, we uploaded the two gene sets separately to the Enrichr database and evaluated our data for signs and symptoms as well as related medication regimens. The results showed that 138 genes with high expression and 108 genes with low expression were differentially expressed in the severe COVID-19 group versus the control group. Symptoms and diseases such as embolism and thrombosis of the abdominal aorta, ankylosing spondylitis, suicidal ideation or attempt, and regional enteritis were associated with the high-expression genes, while acute and subacute forms of ischemic heart disease, CNS infection and poliomyelitis, and synovitis and tenosynovitis were associated with the low-expression genes. Following the detection of diseases and possible signs and symptoms, Carmustine, Bithionol, and Leflunomide were ranked as more significant for the high-expression genes, and Chlorambucil, Ifosfamide, Hydroxyurea, and Bisphenol for the low-expression genes.
In general, examining the different and less visible aspects of COVID-19 and identifying possible treatments can help significantly in the emergency care and hospitalization of patients.
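
The enrichment step can be illustrated with an over-representation test, which asks whether a gene list overlaps an annotated term more than chance would predict. This is a minimal sketch assuming a hypergeometric null model; the abstract does not say how the terms were scored, and the function name and every number in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, term_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from `universe` genes, of which term_size are annotated to the term."""
    if not (0 <= term_size <= universe and 0 <= list_size <= universe):
        raise ValueError("term_size and list_size must lie within the universe")
    upper = min(term_size, list_size)
    favorable = sum(
        comb(term_size, i) * comb(universe - term_size, list_size - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return favorable / comb(universe, list_size)


# Hypothetical numbers: a 138-gene list sharing 6 genes with a 150-gene term.
print(enrichment_p_value(universe=20000, term_size=150, list_size=138, overlap=6))
```

Because many terms are tested at once, p-values from a test like this need multiple-testing correction before any term is called significant.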

Keywords: phenotypes, drug regimens, gene expression profiles, bioinformatics analysis, severe COVID-19

27876 Bioinformatic Prediction of Hub Genes by Analysis of Signaling Pathways, Transcriptional Regulatory Networks and DNA Methylation Pattern in Colon Cancer

Authors: Ankan Roy, Niharika, Samir Kumar Patra

Abstract:

An anomalous nexus of complex topological assemblies and spatiotemporal epigenetic choreography at chromosomal territories may form the most sophisticated regulatory layer of gene expression in cancer. Colon cancer is one of the leading malignant neoplasms of the lower gastrointestinal tract worldwide. There is still a paucity of information about the complex molecular mechanisms of colonic cancerogenesis. Bioinformatics prediction and analysis help to identify essential genes and significant pathways for monitoring and conquering this deadly disease. The present study explores potential hub genes as biomarkers and effective therapeutic targets for colon cancer treatment. Gene expression profile datasets of colon cancer patient samples (GSE44076, GSE20916, and GSE37364) were downloaded from the Gene Expression Omnibus (GEO) database and thoroughly screened using the GEO2R tool and FunRich software to find common differentially expressed genes (DEGs). Other approaches, including Gene Ontology (GO) and KEGG pathway analysis, Protein-Protein Interaction (PPI) network construction and hub gene investigation, Overall Survival (OS) analysis, gene correlation analysis, methylation pattern analysis, and hub gene-transcription factor regulatory network construction, were performed and validated using various bioinformatics tools. Initially, we identified 166 DEGs, including 68 up-regulated and 98 down-regulated genes. Up-regulated genes are mainly associated with cytokine-cytokine receptor interaction, the IL17 signaling pathway, ECM-receptor interaction, focal adhesion and the PI3K-Akt pathway. Down-regulated genes are enriched in metabolic pathways, retinol metabolism, steroid hormone biosynthesis, and bile secretion. From the protein-protein interaction network, thirty hub genes with high connectivity were selected using the MCODE and cytoHubba plugins.
Survival analysis, expression validation, correlation analysis, and methylation pattern analysis were further verified using TCGA data. Finally, we predicted COL1A1, COL1A2, COL4A1, SPP1, SPARC, and THBS2 as potential master regulators in colonic cancerogenesis. Moreover, our experimental data highlight that disruption of lipid rafts and of the RAS/MAPK signaling cascade affects this gene hub at the mRNA level. We identified COL1A1, COL1A2, COL4A1, SPP1, SPARC, and THBS2 as determinant hub genes in colon cancer progression. They can be considered biomarkers for diagnosis and promising therapeutic targets in colon cancer treatment. Additionally, our experimental data suggest that the signaling pathway acts as a connecting link between the membrane hub and the gene hub.
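
Selecting hub genes "with high connectivity" from a PPI network can be sketched as ranking nodes by degree. This is only an illustration: the abstract does not say which ranking the MCODE and cytoHubba plugins were configured to use, degree is just one simple choice, and the node names and edges below are placeholders rather than real interactions.

```python
from collections import Counter


def top_hubs(edges, n):
    """Rank nodes of an undirected network by degree and return the top n
    as (node, degree) pairs; ties are broken alphabetically."""
    unique_edges = {frozenset(edge) for edge in edges}
    degree = Counter()
    for edge in unique_edges:
        if len(edge) != 2:
            continue  # ignore self-loops
        for node in edge:
            degree[node] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:n]


# Placeholder network; the duplicate edge and the self-loop are ignored.
network = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A"), ("D", "D")]
print(top_hubs(network, 2))
```

Deduplicating edges first matters because interaction databases often list the same pair in both directions, which would otherwise double-count connectivity.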

Keywords: hub genes, colon cancer, DNA methylation, epigenetic engineering, bioinformatic predictions

27875 Genome Sequencing and Analysis of the Spontaneous Nanosilver Resistant Bacterium Proteus mirabilis Strain scdr1

Authors: Amr Saeb, Khalid Al-Rubeaan, Mohamed Abouelhoda, Manojkumar Selvaraju, Hamsa Tayeb

Abstract:

Background: P. mirabilis is a common uropathogenic bacterium that can cause major complications in patients with long-standing indwelling catheters or patients with urinary tract anomalies. In addition, P. mirabilis is a common cause of chronic osteomyelitis in diabetic foot ulcer (DFU) patients. Methodology: P. mirabilis SCDR1 was isolated from a diabetic ulcer patient. We examined the levels of resistance of P. mirabilis SCDR1 against nano-silver colloids, commercial nano-silver and silver-containing bandages, and commonly used antibiotics. We utilized next generation sequencing (NGS) techniques, bioinformatics, phylogenetic analysis and pathogenomics in the identification and characterization of the infectious pathogen. Results: P. mirabilis SCDR1 is a multi-drug resistant isolate that also showed high levels of resistance against nano-silver colloids, nano-silver chitosan composite and the commercially available nano-silver and silver bandages. The P. mirabilis SCDR1 genome size is 3,815,621 bp, with a G+C content of 38.44%. The P. mirabilis SCDR1 genome contains a total of 3,533 genes, including 3,414 coding DNA sequences, 11, 10, and 18 rRNAs (5S, 16S, and 23S, respectively), and 76 tRNAs. Our isolate contains all the pathogenicity and virulence factors required to establish a successful infection. The P. mirabilis SCDR1 isolate is a potentially virulent pathogen that, despite its original isolation site (a wound), can establish kidney infection and its associated complications. P. mirabilis SCDR1 contains several mechanisms for antibiotic and metal resistance, including biofilm formation, swarming mobility, efflux systems, and enzymatic detoxification. Conclusion: P. mirabilis SCDR1 is a spontaneous nano-silver resistant bacterial strain. The P. mirabilis SCDR1 strain contains all reported pathogenic and virulence factors characteristic of the species. In addition, it possesses several mechanisms that may lead to the observed nano-silver resistance.

Keywords: Proteus mirabilis, multi-drug resistance, silver nanoparticles, resistance, next generation sequencing techniques, genome analysis, bioinformatics, phylogeny, pathogenomics, diabetic foot ulcer, xenobiotics, multidrug resistance efflux, biofilm formation, swarming mobility, resistome, glutathione S-transferase, copper/silver efflux system, altruism

27874 Meanings and Concepts of Standardization in Systems Medicine

Authors: Imme Petersen, Wiebke Sick, Regine Kollek

Abstract:

In systems medicine, high-throughput technologies produce large amounts of data on different biological and pathological processes, including (disturbed) gene expression, metabolic pathways and signaling. The large volume of data of different types, stored in separate databases and often located at different geographical sites, has posed new challenges regarding data handling and processing. Tools based on bioinformatics have been developed to resolve the emerging problems of systematizing, standardizing and integrating the various data. However, the heterogeneity of data gathered at different levels of biological complexity is still a major challenge in data analysis. To build multilayer disease modules, large and heterogeneous sets of disease-related information (e.g., genotype, phenotype, environmental factors) are correlated. Therefore, a great deal of attention in systems medicine has been paid to data standardization, primarily to retrieve and combine large, heterogeneous datasets into standardized and incorporated forms and structures. However, this data-centred concept of standardization in systems medicine is contrary to the debate in science and technology studies (STS) on standardization, which rather emphasizes the dynamics, contexts and negotiations of standard operating procedures. Based on empirical work on research consortia in Germany that explore the molecular profile of diseases to establish systems medical approaches in the clinic, we trace how standardized data are processed and shaped by bioinformatics tools, how scientists using such data in research perceive such standard operating procedures, and which consequences for knowledge production (e.g. modeling) arise from them. Hence, different concepts and meanings of standardization are explored to gain deeper insight into standard operating procedures, not only in systems medicine but also beyond.

Keywords: data, science and technology studies (STS), standardization, systems medicine

Procedia PDF Downloads 341

27873 Syntax and Words as Evolutionary Characters in Comparative Linguistics

Authors: Nancy Retzlaff, Sarah J. Berkemer, Trudie Strauss

Abstract:

In the last couple of decades, the digitalization of all kinds of data has probably been one of the major advances in all fields of study. This paves the way for analysing these data even though they might come from disciplines where there was initially no computational necessity to do so. Linguistics in particular has a rather manual tradition. Still, when considering studies that involve the history of language families, it is hard to overlook the striking similarities to bioinformatics (phylogenetic) approaches. Alignments of words are a fairly well-studied example of an application of bioinformatics methods to historical linguistics. In this paper we consider not only alignments of strings, i.e., words in this case, but also alignments of syntax trees of selected Indo-European languages. Based on initial, crude alignments, a sophisticated scoring model is trained on both letters and syntactic features. The aim is to gain a better understanding of which features in two languages are related, i.e., most likely to have the same root. Initially, all words in two languages are pre-aligned with a basic scoring model that primarily selects consonants and adjusts them before fitting in the vowels. Mixture models are subsequently used to filter ‘good’ alignments depending on the alignment length and the number of inserted gaps. Using these selected word alignments, it is possible to perform tree alignments of the given syntax trees and consequently find sentences that correspond rather well to each other across languages. The syntax alignments are then filtered for meaningful scores: ‘good’ scores contain evolutionary information and are therefore used to train the sophisticated scoring model. Further iterations of alignment and training steps are performed until the scoring model saturates, i.e., barely changes anymore.
A better evaluation of the trained scoring model and its ability to capture evolutionarily meaningful information will be given. An assessment of sentence alignment compared to possible phrase structure will also be provided. The method described here may have its flaws because of limited prior information. However, it may offer a good starting point for studying languages where only little prior knowledge is available and a detailed, unbiased study is needed.
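The pre-alignment step described in this abstract can be illustrated with a small global (Needleman-Wunsch) alignment. This is only a sketch under stated assumptions, not the authors' scoring model: the score values, the vowel set, and the names `score` and `align` are hypothetical, chosen so that consonant matches outweigh vowel matches.

```python
VOWELS = set("aeiou")  # assumption: a simple Latin-alphabet vowel set


def score(a, b):
    """Hypothetical scoring: consonant matches count more than vowel matches."""
    if a == b:
        return 1 if a in VOWELS else 3
    if a in VOWELS and b in VOWELS:
        return 0   # vowel-for-vowel substitutions are cheap
    return -2      # any other mismatch is penalised


def align(s, t, gap=-1):
    """Global alignment of two words; returns (aligned_s, aligned_t, score)."""
    n, m = len(s), len(t)
    # dp[i][j] holds the best score for aligning s[:i] with t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # (mis)match
                dp[i - 1][j] + gap,                            # gap in t
                dp[i][j - 1] + gap,                            # gap in s
            )
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
    return "".join(reversed(out_s)), "".join(reversed(out_t)), dp[n][m]


aligned_a, aligned_b, total = align("nacht", "night")
print(aligned_a, aligned_b, total)
```

The alignment length and the number of gap characters in the returned strings are the kind of quantities the abstract says are later used to filter ‘good’ alignments.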

Keywords: alignments, bioinformatics, comparative linguistics, historical linguistics, statistical methods

Procedia PDF Downloads 154

27872 Routing Metrics and Protocols for Wireless Mesh Networks

Authors: Samira Kalantary, Zohre Saatzade

Abstract:

Wireless Mesh Networks (WMNs) are low-cost access networks built on cooperative routing over a backbone composed of stationary wireless routers. WMNs must deal with the highly unstable wireless medium. Thus, routing metrics and protocols are evolving through the design of algorithms that consider link quality to choose the best routes. In this work, we analyse the state of the art in WMN metrics and propose a taxonomy for WMN routing protocols. Performance measurements of a wireless mesh network deployed using various routing metrics are presented and corroborate our analysis.

Keywords: wireless mesh networks, routing protocols, routing metrics, bioinformatics

Procedia PDF Downloads 453

27871 Identification of Disease Causing DNA Motifs in Human DNA Using Clustering Approach

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Studying DNA (deoxyribonucleic acid) sequences is useful for understanding biological processes and is applied in fields such as diagnostic and forensic research. DNA carries the hereditary information in humans and almost all other organisms and is passed on to subsequent generations. Early detection of defective DNA sequences may lead to many developments in the field of bioinformatics. Currently, various tedious techniques are used to identify defective DNA. The proposed work analyzes a given sequence to identify cancer-causing DNA motifs. Initially, the human DNA sequence is split into k-mers using a k-mer separation rule. The separated k-mers are clustered using a Self-Organizing Map (SOM). Using the Levenshtein distance measure, the cancer-associated DNA motif is identified from the k-mer clusters. The experimental results indicate the presence or absence of a cancer-causing DNA motif. If the cancer-associated DNA motif is found, the input is declared a cancer-causing DNA sequence; otherwise, the input human DNA is declared a normal sequence. Finally, the elapsed time for finding the cancer-causing DNA motif with cluster formation is calculated and compared with that of the normal process of finding the motif. Locating the cancer-associated motif is easier with the cluster formation process than with the normal process. The proposed work is intended as an initial aid for research on genetic diseases.
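Two of the building blocks named in this abstract, k-mer splitting and the Levenshtein distance, can be sketched as follows. This is a minimal illustration under assumptions: k-mers are taken as overlapping windows (the abstract does not spell out its separation rule), the SOM clustering step is omitted, and the function names and the example sequence and motif are hypothetical.

```python
def kmers(seq, k):
    """Split a sequence into overlapping k-mers (assumed separation rule)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def find_motif(seq, motif, max_dist=0):
    """Return (position, k-mer) pairs within max_dist edits of the motif."""
    return [
        (i, km)
        for i, km in enumerate(kmers(seq, len(motif)))
        if levenshtein(km, motif) <= max_dist
    ]


# Toy example: the sequence and motif are made up for illustration only
print(find_motif("ACGTTGCA", "GTT"))
```

In the workflow the abstract describes, the distance comparison would be applied within k-mer clusters rather than across every k-mer, which is where the reported time saving is said to come from.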

Keywords: bioinformatics, cancer motif, DNA, k-mers, Levenshtein distance, SOM

Procedia PDF Downloads 188

27870 Integrative Omics-Portrayal Disentangles Molecular Heterogeneity and Progression Mechanisms of Cancer

Authors: Binder Hans

Abstract:

Cancer is no longer seen as solely a genetic disease in which genetic defects such as mutations and copy number variations affect gene regulation and eventually lead to aberrant cell functioning that can be monitored by transcriptome analysis. It has become obvious that epigenetic alterations represent a further important layer of (de-)regulation of gene activity. For example, aberrant DNA methylation is a hallmark of many cancer types, and methylation patterns have been used successfully to subtype cancer heterogeneity. Hence, unraveling the interplay between different omics levels such as genome, transcriptome and epigenome is indispensable for a mechanistic understanding of the molecular deregulation causing complex diseases such as cancer. This objective requires powerful downstream integrative bioinformatics methods as an essential prerequisite to discover the whole-genome mutational, transcriptome and epigenome landscapes of cancer specimens and to discover cancer genesis, progression and heterogeneity. Basic challenges and tasks arise ‘beyond sequencing’ because of the large size of the data, their complexity, the need to search for hidden structures in the data, the need for knowledge mining to discover biological function, and the need for systems biology conceptual models to deduce developmental interrelations between different cancer states. These tasks are tightly related to cancer biology as an (epi-)genetic disease giving rise to aberrant genomic regulation under micro-environmental control and clonal evolution, which leads to heterogeneous cellular states. Machine learning algorithms such as self-organizing maps (SOM) represent one interesting option to tackle these bioinformatics tasks. The SOM method enables the recognition of complex patterns in large-scale data generated by high-throughput omics technologies. It portrays molecular phenotypes by generating individualized, easy-to-interpret images of the data landscape in combination with comprehensive analysis options.
Our image-based, reductionist machine learning methods provide one interesting perspective on how to deal with massive data in the study of complex diseases, such as gliomas, melanomas and colon cancer, at the molecular level. As an important new challenge, we address the combined portrayal of different omics data such as genome-wide genomic, transcriptomic and methylomic data. The integrative-omics portrayal approach is based on the joint training of the data, and it provides separate personalized data portraits for each patient and data type, which can be analyzed by visual inspection as one option. The new method enables an integrative genome-wide view of the omics data types and the underlying regulatory modes. It is applied to high- and low-grade gliomas and to melanomas, where it disentangles transversal and longitudinal molecular heterogeneity in terms of distinct molecular subtypes and progression paths with prognostic impact.

Keywords: integrative bioinformatics, machine learning, molecular mechanisms of cancer, gliomas and melanomas

Procedia PDF Downloads 148

27869 LTF Expression Profiling Which is Essential for Cancer Cell Proliferation and Metastasis, Correlating with Clinical Features, as Well as Early Stages of Breast Cancer

Authors: Azar Heidarizadi, Mahdieh Salimi, Hossein Mozdarani

Abstract:

Introduction: As a complex disease, breast cancer results from several genetic and epigenetic changes. Lactoferrin, a member of the transferrin family, is reported to have a number of biological functions, including DNA synthesis, immune responses, iron transport, etc., any of which could play a role in tumor progression. The aim of this study was to use bioinformatics data and experimental assays to determine the pattern of promoter methylation and gene expression of LTF in breast cancer in order to study its potential role in cancer management. Material and Methods: To evaluate the methylation status of the LTF promoter, we performed MS-PCR and Real-Time PCR on samples from patients with breast cancer and normal cases. 67 patient samples were included in this study, comprising tumor, plasma, and adjacent normal tissue samples, as well as 30 plasma samples from normal cases and 10 tissue samples from breast reduction cases. Subsequently, bioinformatics analyses using resources such as the cBioPortal databases, STRING, and Genomatix were conducted to disclose the prognostic value of LTF in breast cancer progression. Results: The analysis of LTF expression showed an inverse relationship between the expression level of LTF and the tissue stage of breast cancer patients (p < 0.01). Specifically, stages 1 and 2 showed high LTF expression, while in stages 3 and 4 a significant reduction was observed (p < 0.0001). LTF expression was frequently altered, with a decrease in expression in ER⁺, PR⁺, and HER2⁺ patients (p < 0.01) and an increase in expression in TNBC, LN¯, ER¯, and PR¯ patients (p < 0.001). LTF expression was also significantly associated with metastasis and lymph node involvement (p < 0.0001). The sensitivity and specificity of LTF were detected, respectively. A negative correlation was detected between the expression level and the methylation of the LTF promoter.
Conclusions: The altered expression of LTF observed in breast cancer patients could be considered to promote cell proliferation and metastasis, even in the early stages of cancer.

Keywords: LTF, expression, methylation, breast cancer

Procedia PDF Downloads 71

27868 Genome-Wide Isoform Specific KDM5A/JARID1A/RBP2 Location Analysis Reveals Contribution of Chromatin-Interacting PHD Domain in Protein Recruitment to Binding Sites

Authors: Abul B. M. M. K. Islam, Nuria Lopez-Bigas, Elizaveta V. Benevolenskaya

Abstract:

RBP2 has been shown to be important for the control of cell differentiation through an epigenetic mechanism. The main aim of the present study is a genome-wide location analysis, by ChIP-seq, of human RBP2 isoforms that differ in a histone-binding domain. It is conceivable that the larger isoform (LI) of RBP2, which contains a specific H3K4me3-interacting domain, differs from the smaller isoform (SI) in genomic location, which may account for the observed diversity in RBP2 function. To distinguish the two RBP2 isoforms, we used the fact that the SI lacks the C-terminal PHD domain and hence used antibodies detecting both RBP2 isoforms (AI) through a common central domain, and antibodies detecting only LI but not SI through the C-terminal PHD domain. Overall, our analysis suggests that RBP2 occupies about 77 nucleotides and binds GC-rich motifs of active genes, does not bind to centromere, telomere, or enhancer regions, and that its binding sites are conserved compared to random sites. A striking difference between only-SI and only-LI peaks is that a large number of only-SI peaks are located in CpG islands and close to TSSs compared to only-LI peaks. Enrichment analysis of the related genes indicates that several oncogenic pathways and metabolic pathways/processes are significantly enriched among only-SI/AI targets, but not among the targets of LI/only-LI peaks.

Keywords: bioinformatics, cancer, ChIP-seq, KDM5A

Procedia PDF Downloads 307

27867 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge is doubled if the microarray data sets contain missing data, which occurs regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes on microarray data sets is feature selection, which finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 574

27866 Solar Heating System to Promote the Disinfection of Water

Authors: Elmo Thiago Lins Cöuras Ford, Valentina Alessandra Carvalho do Vale

Abstract:

This work presents a heating system that uses low-cost alternative solar collectors to promote the disinfection of water in low-income communities that consume water contaminated by bacteria. The system consists of two solar collectors with a total area of 4 m² and was built using PET bottles and beer and soft drink cans. Each collector is made up of 8 PVC tubes connected in series and works in continuous flow. The flow rate most appropriate for generating the temperature needed for disinfection will be determined. Results on the efficiency and thermal loss of the system will be presented, along with results of the analysis of the water after heating.

Keywords: Disinfection of water, solar heating system, poor communities, bioinformatics, biomedicine

Procedia PDF Downloads 485

27865 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances in computer science and data analysis are continuously producing huge volumes of biological data, which are available on the web. Such advances require powerful data integration techniques to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, focusing on those developed in the ontology community.

Keywords: biological ontology, linked data, semantic data integration, semantic web

Procedia PDF Downloads 449

27864 Hybrid Structure Learning Approach for Assessing the Phosphate Laundries Impact

Authors: Emna Benmohamed, Hela Ltifi, Mounir Ben Ayed

Abstract:

The Bayesian Network (BN) is one of the most efficient classification methods. It is widely used in several fields (e.g., medical diagnostics, risk analysis, bioinformatics research). The BN is defined as a probabilistic graphical model that represents a formalism for reasoning under uncertainty. This classification method performs well in the extraction of new knowledge from data. The construction of this model consists of two phases: structure learning and parameter learning. For structure learning, the K2 algorithm is one of the representative data-driven algorithms and is based on a score-and-search approach. In addition, integrating expert knowledge into the structure learning process makes it possible to obtain the highest accuracy. In this paper, we propose a hybrid approach that combines an improvement of the K2 algorithm, called the K2 algorithm for Parents and Children search (K2PC), with an expert-driven method for learning the structure of a BN. The evaluation of the experimental results on well-known benchmarks shows that our K2PC algorithm performs better in terms of correct structure detection. The real-world application of our model shows its efficiency in the analysis of the impact of phosphate laundry effluents on the watershed in the Gafsa area (southwestern Tunisia).

Keywords: Bayesian network, classification, expert knowledge, structure learning, surface water analysis

Procedia PDF Downloads 128

27863 Cellular RNA-Binding Domains with Distant Homology in Viral Proteomes

Authors: German Hernandez-Alonso, Antonio Lazcano, Arturo Becerra

Abstract:

Viruses remain controversial and poorly understood, and their origin is still an enigma and one of the great challenges for contemporary biology. Three main theories have tried to explain the origin of viruses: regressive evolution, escaped host gene, and pre-cellular origin. From the perspective of the escaped host gene theory, a cellular origin can be assumed for viral components such as protein RNA-binding domains. These universally distributed RNA-binding domains are related to RNA metabolism processes, including transcription, processing and modification of transcripts, translation, and RNA degradation and its regulation. In viruses, these domains are present in important viral proteins such as helicases, nucleases, polymerases, capsid proteins and regulation factors. Therefore, they are implicated in the replicative cycle and parasitic processes of viruses. For this reason, it is plausible that these domains show low levels of divergence due to selective pressures. The main goal of this project is therefore to create a catalogue of the RNA-binding domains found in all available viral proteomes, using bioinformatics tools to analyze their evolutionary process and thus shed light on general virus evolution. The ProDom database was used to obtain more than six thousand RNA-binding domain families that belong to the three cellular domains of life and some viral groups. From the sequences of these families, protein profiles were created using HMMER 3.1 tools in order to find distant homologs within more than four thousand viral proteomes available in GenBank. Once the analysis was completed, almost three thousand hits had been obtained in the viral proteomes. The homologous sequences were found in proteomes of the principal Baltimore viral groups, showing interesting distribution patterns that can contribute to understanding the evolution of viruses and their host-virus interactions.
The presence of cellular RNA-binding domains within virus proteomes seems to be explained by close interactions between viruses and their hosts. Recruitment of these domains is advantageous for viral fitness, allowing viruses to adapt to the host cellular environment.

Keywords: bioinformatics tools, distant homology, RNA-binding domains, viral evolution

Procedia PDF Downloads 387

27862 A Study on Big Data Analytics, Applications and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight existing developments in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, healthcare, and business intelligence contain voluminous and incremental data, which is hard to organise and analyse and can be dealt with using the frameworks and models in this field of study. An organization's decision-making strategy can be enhanced by using big data analytics and applying different machine learning techniques and statistical tools to such complex data sets, which will consequently benefit society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates on various frameworks in the process of analysis using different machine learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 83

27861 A Study on Big Data Analytics, Applications, and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight existing developments in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, healthcare, and business intelligence contain voluminous and incremental data, which is hard to organise and analyse and can be dealt with using the frameworks and models in this field of study. An organisation's decision-making strategy can be enhanced by using big data analytics and applying different machine learning techniques and statistical tools to such complex data sets, which will consequently benefit society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates on various frameworks in the process of analysis using different machine learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 95

27860 The Development of an Automated Computational Workflow to Prioritize Potential Resistance Variants in HIV Integrase Subtype C

Authors: Keaghan Brown

Abstract:

The prioritization of drug resistance mutations impacting protein folding or protein-drug and protein-DNA interactions within macromolecular systems is critical to the success of treatment regimens. With the continual increase in computational tools to assess these impacts, scalability and reproducibility have become essential components of computational analysis and experimental research. Here we introduce a bioinformatics pipeline that combines several structural analysis tools in a simplified workflow, optimizing the use of the available computational hardware and software to automatically ease the flow of data transformations. Utilizing pre-established software tools, it was possible to develop a pipeline with a set of pre-defined functions that automate mutation introduction into the HIV-1 Integrase protein structure, calculate the gain and loss of polar interactions, and calculate the change in protein folding energy. Additionally, an automated molecular dynamics analysis was implemented, which reduces the constant need for user input and output management. The resulting pipeline, Automated Mutation Introduction and Analysis (AMIA), is an open-source set of scripts designed to introduce mutations and analyse their effects on the static protein structure as well as on the multi-conformational states from molecular dynamics simulations. The workflow allows the user to visualize all outputs in a user-friendly manner, thereby enabling the prioritization of variant systems for experimental validation.

Keywords: automated workflow, variant prioritization, drug resistance, HIV Integrase

Procedia PDF Downloads 77

27859 Easymodel: Web-based Bioinformatics Software for Protein Modeling Based on Modeller

Authors: Alireza Dantism

Abstract:

Presently, describing the function of a protein sequence is one of the most common problems in biology. Usually, this problem can be facilitated by studying the three-dimensional structure of proteins. In the absence of a protein structure, comparative modeling often provides a useful three-dimensional model of the protein that is dependent on at least one known protein structure. Comparative modeling predicts the three-dimensional structure of a given protein sequence (target) mainly based on its alignment with one or more proteins of known structure (templates). Comparative modeling consists of five main steps: (1) identifying similarity between the target sequence and at least one known template structure; (2) aligning the target sequence and the template(s); (3) building a model based on the alignment with the selected template(s); (4) predicting model errors; and (5) optimizing the built model. There are many computer programs and web servers that automate the comparative modeling process. One of the most important advantages of these servers is that they make comparative modeling available to both experts and non-experts, who can easily do their own modeling without programming knowledge. Some experts, however, prefer to use their programming knowledge and do their modeling manually, because this lets them maximize the accuracy of their models. In this study, a web-based tool has been designed to predict the tertiary structure of proteins using the PHP and Python programming languages. This tool is called EasyModel. Based on the user's inputs, EasyModel receives the unknown sequence of interest (the target), the protein sequence file (the template), which shares a percentage of similarity with the target sequence, and other parameters; it then predicts the tertiary structure of the unknown sequence and presents the results as graphs and constructed protein files.
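The similarity check that relates a target to a candidate template can be illustrated by computing percent identity over an existing pairwise alignment. This is a minimal sketch, not EasyModel's or Modeller's implementation: the function name is hypothetical, and the denominator used here (columns where neither sequence has a gap) is only one of several common conventions.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over aligned, gap-free columns of two aligned sequences."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue (no '-' gap)
    columns = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy aligned sequences, made up for illustration only
print(percent_identity("ACDE", "ACDF"))
```

A higher value generally suggests a more suitable template, though the threshold at which a template becomes usable is a modeling decision that the abstract does not specify.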

Keywords: structural bioinformatics, protein tertiary structure prediction, modeling, comparative modeling, modeller

Procedia PDF Downloads 97

27858 Prediction and Analysis of Human Transmembrane Transporter Proteins Based on SCM

Authors: Hui-Ling Huang, Tamara Vasylenko, Phasit Charoenkwan, Shih-Hsiang Chiu, Shinn-Ying Ho

Abstract:

Knowledge of human transporters is still limited due to the technically demanding crystallization procedure required for the structural characterization of transporters by spectroscopic methods. It is desirable to develop bioinformatics tools for the effective analysis of available sequences in order to identify human transmembrane transporter proteins (HMTPs). This study proposes a scoring card method (SCM)-based method for predicting HMTPs. We estimated a set of propensity scores of dipeptides to be HMTPs using SCM from the training dataset (HTS732), consisting of 366 HMTPs and 366 non-HMTPs. SCM, using the estimated propensity scores of 20 amino acids and 400 dipeptides to be HMTPs, has a training accuracy of 87.63% and a test accuracy of 66.46%. The five top-ranked dipeptides are LD, NV, LI, KY, and MN, with scores of 996, 992, 989, 987, and 985, respectively. The five amino acids with the highest propensity scores are Ile, Phe, Met, Gly, and Leu, indicating that hydrophobic residues mostly score highly. Furthermore, the obtained propensity scores were used to analyze the physicochemical properties of human transporters.
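The scoring idea behind a scoring card can be sketched as a composition-weighted sum of dipeptide propensity scores. This is an illustration under assumptions rather than the authors' implementation: only the five scores quoted in the abstract are real, while `DEFAULT_SCORE`, the function names, and the toy sequence are hypothetical placeholders for the remaining 395 dipeptide scores, which are not given here.

```python
from collections import Counter

# Propensity scores quoted in the abstract for the five top-ranked dipeptides
TOP_SCORES = {"LD": 996, "NV": 992, "LI": 989, "KY": 987, "MN": 985}
DEFAULT_SCORE = 500  # hypothetical stand-in for scores not listed in the abstract


def dipeptide_composition(seq):
    """Fraction of each overlapping dipeptide in the sequence."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return {dp: n / len(pairs) for dp, n in counts.items()}


def scm_score(seq, scores=TOP_SCORES, default=DEFAULT_SCORE):
    """Composition-weighted sum of dipeptide propensity scores."""
    composition = dipeptide_composition(seq)
    return sum(frac * scores.get(dp, default) for dp, frac in composition.items())


# Toy sequence, made up for illustration only
print(round(scm_score("LDLD"), 2))
```

A sequence would then be predicted as a transporter when its score exceeds a threshold tuned on the training set; the abstract does not state that threshold, so none is assumed here.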

Keywords: dipeptide composition, physicochemical property, human transmembrane transporter proteins, human transmembrane transporters binding propensity, scoring card method

Procedia PDF Downloads 368

27857 C-eXpress: A Web-Based Analysis Platform for Comparative Functional Genomics and Proteomics in Human Cancer Cell Line, NCI-60 as an Example

Authors: Chi-Ching Lee, Po-Jung Huang, Kuo-Yang Huang, Petrus Tang

Abstract:

Background: Recent advances in high-throughput research technologies such as new-generation sequencing and multi-dimensional liquid chromatography make it possible, for the first time, to dissect the complete transcriptome and proteome in a single run. However, it is almost impossible for many laboratories to handle and analyse these “BIG” data without the support of a bioinformatics team. We aimed to provide a web-based analysis platform that allows users with only limited knowledge of bio-computing to study functional genomics and proteomics. Method: We use NCI-60 as an example dataset to demonstrate the power of the web-based analysis platform and data delivery system. C-eXpress takes as input a simple text file that contains standard NCBI gene or protein IDs and expression levels (rpkm or fold) and generates a distribution map of gene/protein expression levels as a heatmap organized by color gradients. The diagram is hyperlinked to a dynamic HTML table that allows users to filter the datasets based on various gene features. A dynamic summary chart is generated automatically after each filtering process. Results: We implemented an integrated database that contains pre-defined annotations such as gene/protein properties (ID, name, length, MW, pI); pathways based on KEGG and GO biological process; subcellular localization based on GO cellular component; and functional classification based on GO molecular function, kinase, peptidase and transporter. Multiple ways of sorting columns and rows are also provided for comparative analysis and visualization of multiple samples.
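The kind of input and filtering step this abstract describes can be sketched with a small parser. Everything about the file layout is an assumption made for illustration: a tab-separated, two-column file with an optional `#` header line is not confirmed as C-eXpress's actual format, the function names are hypothetical, and the expression values in `SAMPLE` are invented.

```python
import csv
import io


def parse_expression(text):
    """Parse 'ID<TAB>level' lines into (id, level) tuples, skipping comments."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter="\t"):
        if not record or record[0].startswith("#"):
            continue  # skip blank lines and the optional header line
        rows.append((record[0], float(record[1])))
    return rows


def filter_by_level(rows, min_level):
    """Keep entries at or above min_level, highest expression first."""
    kept = [row for row in rows if row[1] >= min_level]
    return sorted(kept, key=lambda row: row[1], reverse=True)


# NCBI Gene IDs for TP53 (7157), BRCA1 (672) and EGFR (1956);
# the rpkm values are made up for illustration only
SAMPLE = "# gene_id\trpkm\n7157\t12.5\n672\t0.8\n1956\t45.0\n"

for gene_id, level in filter_by_level(parse_expression(SAMPLE), min_level=1.0):
    print(gene_id, level)
```

The filtered, sorted rows are the sort of table that a heatmap or summary chart could then be drawn from.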

Keywords: cancer, visualization, database, functional annotation

Procedia PDF Downloads 618