Proteins Length and their Phenotypic Potential
Authors: Tom Snir, Eitan Rubin
Abstract:
Mendelian Disease Genes represent a collection of single points of failure for the various systems they constitute. Such genes have been shown, on average, to encode longer proteins than 'non-disease' genes. Existing models attribute this to the greater likelihood that longer genes undergo mutation. Here, we show that a similar relationship between protein length and the probability that a mutated gene is lethal is observed in saturation mutagenesis experiments performed on model organisms, where every gene is mutated (that is, the likelihood of each gene being mutated is one). We therefore suggest an extended model in which the likelihood that a mutated gene produces a severe phenotype is itself length-dependent. Using the occurrence of conserved domains, we present evidence that this dependency results from a correlation between protein length and the number of functions a protein performs. We propose that protein length thus serves as a proxy for protein cardinality in the various networks required for the organism's survival and well-being. We use this example to argue that the collection of Mendelian Disease Genes can, and should, be used to study the rules governing system vulnerability in living organisms.
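The abstract's argument can be made concrete with a toy two-factor model: the probability of a severe phenotype is the probability that a gene is hit, times the probability that losing it is severe. This is a minimal sketch, not the authors' method. It assumes, hypothetically, that each residue is mutated independently with probability mu, that any hit causes full loss of function, that a protein carries one function per `residues_per_function` residues, and that each function is independently essential with probability q. All function names and parameters are illustrative.

```python
"""Toy two-factor model of length-dependent phenotypic severity.

All parameters and functional forms are hypothetical illustrations,
not the model used in the paper.
"""


def p_hit(length: int, mu: float) -> float:
    """Probability that at least one of `length` residues is mutated,
    assuming each residue is hit independently with probability `mu`."""
    return 1.0 - (1.0 - mu) ** length


def expected_functions(length: int, residues_per_function: int = 100) -> int:
    """Hypothetical: one function (e.g. one conserved domain) per
    `residues_per_function` residues, and at least one per protein."""
    return max(1, length // residues_per_function)


def p_severe_given_loss(n_functions: int, q: float) -> float:
    """Probability that losing a protein is severe, assuming each of its
    `n_functions` functions is independently essential with probability `q`
    and that loss of the gene removes all of them."""
    return 1.0 - (1.0 - q) ** n_functions


def p_severe(length: int, mu: float, q: float,
             residues_per_function: int = 100) -> float:
    """P(severe) = P(gene hit | length) * P(severe | gene lost, length),
    treating every hit as a full loss of function (a simplifying assumption)."""
    n = expected_functions(length, residues_per_function)
    return p_hit(length, mu) * p_severe_given_loss(n, q)


if __name__ == "__main__":
    # Saturated screen: mu = 1, so every gene is hit and p_hit == 1.
    # Any remaining length dependence comes from the second factor alone.
    for length in (100, 300, 900):
        print(length, round(p_severe(length, mu=1.0, q=0.2), 3))
```

With mu set to 1 the first factor drops out, so in this toy model the rise in severity with length is carried entirely by the number of functions; with mu below 1, both factors contribute.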
Keywords: Systems Biology, Protein Length
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1075705