An Integrated Predictor for Cis-Regulatory Modules
Authors: Darby Tien-Hao Chang, Guan-Yu Shiu, You-Jie Sun
Abstract:
Various cis-regulatory module (CRM) predictors have been proposed in the last decade. Several well-established CRM predictors adopt different categories of prediction strategy, including window clustering, probabilistic modeling and phylogenetic footprinting. Appropriately integrating them has the potential to achieve higher-quality CRM prediction. This study analyzed four existing CRM predictors (ClusterBuster, MSCAN, CisModule and MultiModule) to find a predictor combination that delivers higher accuracy than any individual CRM predictor. A total of 465 CRMs across 140 Drosophila melanogaster genes from the REDfly database were used to evaluate the integrated CRM predictor proposed in this study. The results show that four predictor combinations outperformed the best individual CRM predictor.
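The abstract does not say how the four predictors' outputs are merged or scored. The sketch below only illustrates the idea of searching for a predictor combination that beats the best single predictor. It assumes that each tool's output has already been reduced to half-open (start, end) CRM intervals on one sequence, that predictions are merged by a nucleotide-level majority vote, and that accuracy is a nucleotide-level F-measure against known (e.g. REDfly-annotated) CRM positions. None of these choices is confirmed by the abstract. The function names and the toy coordinates are hypothetical.

```python
# Hypothetical sketch: the merging rule (majority vote) and the metric
# (nucleotide-level F-measure) are assumptions, not the paper's stated method.
from collections import Counter
from itertools import combinations


def intervals_to_positions(intervals):
    """Expand half-open (start, end) intervals into a set of nucleotide positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def majority_vote(position_sets):
    """Keep positions called as CRM by a strict majority of the given predictors.

    Assumed rule: one predictor keeps its own calls, two predictors reduce to the
    intersection, three or four predictors need at least 2 or 3 votes respectively.
    """
    needed = len(position_sets) // 2 + 1
    votes = Counter()
    for positions in position_sets:
        votes.update(positions)  # one vote per predictor per position
    return {pos for pos, count in votes.items() if count >= needed}


def f_measure(predicted, known):
    """Nucleotide-level F-measure, computed as 2*TP / (|predicted| + |known|)."""
    total = len(predicted) + len(known)
    if total == 0:
        return 0.0
    return 2 * len(predicted & known) / total


def score_combinations(pred_by_tool, known):
    """Score every non-empty combination of predictors against known CRM positions."""
    names = sorted(pred_by_tool)
    scores = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            merged = majority_vote([pred_by_tool[name] for name in combo])
            scores[combo] = f_measure(merged, known)
    return scores


if __name__ == "__main__":
    # Made-up coordinates on a single sequence, for illustration only.
    known = intervals_to_positions([(100, 200)])
    predictions = {
        "ClusterBuster": intervals_to_positions([(90, 190)]),
        "MSCAN": intervals_to_positions([(110, 210)]),
        "CisModule": intervals_to_positions([(0, 50), (120, 200)]),
        "MultiModule": intervals_to_positions([(100, 150), (300, 350)]),
    }
    scores = score_combinations(predictions, known)
    best_single = max(s for combo, s in scores.items() if len(combo) == 1)
    # Report multi-predictor combinations that beat the best individual predictor.
    for combo, s in sorted(scores.items(), key=lambda item: -item[1]):
        if len(combo) > 1 and s > best_single:
            print(f"{' + '.join(combo)}: F = {s:.3f} (best single: {best_single:.3f})")
```

Running the script lists the combinations whose F-measure on the toy data exceeds that of the best single predictor. A majority vote is only one possible integration rule. Weighted voting, or a classifier trained on the predictors' outputs, would be equally plausible readings of "integrated predictor".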
Keywords: Cis-regulatory module, transcription factor binding site.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1088588