Statistics of Exon Lengths in Animals, Plants, Fungi, and Protists
Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from the RNA transcripts before translation into protein. Exon-intron structures differ considerably among eukaryotic species, and the evolution of these structures raises many questions. We address some of these questions through statistical analysis of whole genomes. For every protein-coding gene in a genome, we study correlations between the net length of all exons in the gene, the number of exons, and the average exon length. We also average these features over each chromosome and study correlations between the averages at the chromosomal level. Our data show universal features of exon-intron structures common to animals, plants, fungi, and protists (specifically, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Cryptococcus neoformans, Homo sapiens, Mus musculus, Oryza sativa, and Plasmodium falciparum). We have verified a linear correlation between the number of exons in a gene and the length of the protein encoded by the gene; that is, protein length increases in proportion to the number of exons. The average length of an exon, on the other hand, always decreases with the number of exons. Finally, clustering of chromosomes based on their average properties and on the parameters of the linear regression between the number of exons in a gene and the net length of those exons demonstrates that these average chromosome properties are genome-specific features.
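The per-gene and per-chromosome quantities described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`gene_features`, `linear_fit`, `chromosome_summary`), the 1-based inclusive exon coordinates, and the toy input are all assumptions made for illustration, and the fit is an ordinary least-squares regression of net exon length on exon count.

```python
from collections import defaultdict


def gene_features(exons):
    """Return (exon count, net exon length, average exon length) for one gene.

    `exons` is a list of (start, end) pairs, assumed here to be 1-based
    inclusive coordinates of the gene's coding exons.
    """
    count = len(exons)
    net_length = sum(end - start + 1 for start, end in exons)
    return count, net_length, net_length / count


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r.

    Requires at least two distinct x values and two distinct y values;
    otherwise the sums of squares are zero and the division fails.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def chromosome_summary(genes):
    """Average gene features and regression parameters for each chromosome.

    `genes` is an iterable of (chromosome name, exon list) pairs.
    """
    by_chromosome = defaultdict(list)
    for chromosome, exons in genes:
        by_chromosome[chromosome].append(gene_features(exons))

    summary = {}
    for chromosome, features in by_chromosome.items():
        counts = [f[0] for f in features]
        net_lengths = [f[1] for f in features]
        slope, intercept, r = linear_fit(counts, net_lengths)
        summary[chromosome] = {
            "mean_exon_count": sum(counts) / len(counts),
            "mean_net_length": sum(net_lengths) / len(net_lengths),
            "slope": slope,
            "intercept": intercept,
            "r": r,
        }
    return summary


# Hypothetical toy input: three genes on one chromosome, not real annotation.
toy_genes = [
    ("chr1", [(1, 100)]),
    ("chr1", [(1, 100), (201, 300)]),
    ("chr1", [(1, 100), (201, 300), (401, 500)]),
]
toy_summary = chromosome_summary(toy_genes)
```

The per-chromosome dictionaries returned by `chromosome_summary` could then serve as feature vectors for whatever clustering method is chosen; the abstract does not specify one, so none is assumed here.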
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1061342