Full-genomic Network Inference for Non-model Organisms: A Case Study for the Fungal Pathogen Candida albicans
Reverse engineering of full-genomic interaction networks from compendia of expression data has been applied successfully to a number of model organisms. This study adapts these approaches to an important non-model organism: the major human fungal pathogen Candida albicans. During infection, the pathogen can adapt to a wide range of environmental niches and reversibly changes its growth form. Given the importance of these processes, we need to understand how they are regulated. This study presents a reverse engineering strategy that infers full-genomic interaction networks for C. albicans using linear regression with a sparseness criterion (LASSO). To overcome the limited amount of expression data and the small number of known interactions, we use different sources of prior knowledge to guide the network inference towards a knowledge-driven solution. Since no database of known interactions exists for C. albicans, we use a text-mining system that processes full-text research papers to identify known regulatory interactions. By comparing inferred networks with these known regulatory interactions, we find optimal values for the global modelling parameters that weight the influence of the sparseness criterion and of the prior knowledge. Furthermore, we show that soft integration of prior knowledge further improves performance. Finally, we compare the performance of our approach with state-of-the-art network inference approaches.
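The general idea of LASSO regression with softly integrated prior knowledge can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameters `lam` (sparseness weight) and `prior_strength` (penalty multiplier for edges supported by prior knowledge), and the plain coordinate-descent solver are all assumptions chosen for illustration. Each gene is regressed on all other genes, and edges supported by prior knowledge receive a smaller penalty rather than being forced into the model.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink a scalar towards zero by `threshold` (the LASSO proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for
        (1 / 2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    `weights` holds one penalty multiplier per regulator (hypothetical design).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    residual = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # constant column carries no information
            residual += X[:, j] * beta[j]        # remove gene j's contribution
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam * weights[j]) / col_scale[j]
            residual -= X[:, j] * beta[j]        # add updated contribution back
    return beta


def infer_network(expr, lam, prior, prior_strength):
    """Infer a gene-by-gene coefficient matrix from expression data.

    expr:           samples x genes expression matrix
    prior:          genes x genes 0/1 matrix; prior[i, j] == 1 means the edge
                    regulator j -> target i is supported by prior knowledge
    prior_strength: assumed penalty multiplier in (0, 1] for supported edges;
                    smaller values trust the prior more (soft integration)
    """
    n_genes = expr.shape[1]
    centered = expr - expr.mean(axis=0)
    network = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        regulators = [g for g in range(n_genes) if g != target]
        weights = np.where(prior[target, regulators] == 1, prior_strength, 1.0)
        network[target, regulators] = weighted_lasso(
            centered[:, regulators], centered[:, target], lam, weights
        )
    return network
```

In this sketch, `lam` and `prior_strength` play the role of global modelling parameters: they could be tuned by comparing the non-zero entries of the returned matrix against a set of known regulatory interactions. Because prior-supported edges are only penalized less, the data can still remove them, which is what distinguishes a soft integration from a hard constraint.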
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1070029