Exons and Introns Classification in Human and Other Organisms

Authors: Benjamin Y. M. Kwan, Jennifer Y. Y. Kwan, Hon Keung Kwan


In this paper, the relative performance of spectral classification of short exon and intron sequences from human and eleven model organisms is studied. The simulations cover all combinations of sixteen one-sequence numerical representations, four threshold values, and four window lengths. Sequences 150 bases long are chosen, and for each organism a total of 16,000 sequences are used for training and testing. The results indicate that an appropriate combination of one-sequence numerical representation, threshold value, and window length is essential for achieving top spectral classification results. For fixed-length sequences, the precision of exon and intron classification differs among organisms because of their genomic differences. In general, precision increases as sequence length increases.
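To make the three parameters named in the abstract concrete, the following is a minimal sketch of one common form of spectral classification, not the authors' implementation. It assumes the EIIP mapping as the one-sequence numerical representation, a period-3 peak-to-average power ratio as the spectral feature, and non-overlapping windows; the function names, the default threshold of 4.0, and the averaging over windows are all hypothetical choices made for illustration.

```python
import cmath

# EIIP (electron-ion interaction pseudopotential) values, one commonly used
# one-sequence mapping. Choosing it here is an assumption for illustration.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def to_signal(seq, mapping=EIIP):
    """Map a DNA string to a single numerical sequence."""
    return [mapping[base] for base in seq.upper()]


def power_at(signal, k):
    """Squared magnitude of the length-N DFT of `signal` at bin k."""
    n = len(signal)
    total = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal))
    return abs(total) ** 2


def period3_ratio(signal):
    """Power at bin N/3 divided by the mean power over bins 1..N/2."""
    n = len(signal)
    if n == 0 or n % 3 != 0:
        raise ValueError("window length must be a positive multiple of 3")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC component
    powers = [power_at(centered, k) for k in range(1, n // 2 + 1)]
    avg = sum(powers) / len(powers)
    if avg < 1e-20:  # flat signal: no spectral content to compare
        return 0.0
    return powers[n // 3 - 1] / avg  # bin k is stored at index k - 1


def classify(seq, threshold=4.0, window=150):
    """Label a sequence 'exon' if its mean period-3 ratio over
    non-overlapping windows reaches the (hypothetical) threshold."""
    signal = to_signal(seq)
    starts = range(0, len(signal) - window + 1, window)
    ratios = [period3_ratio(signal[s:s + window]) for s in starts]
    if not ratios:
        raise ValueError("sequence is shorter than the window length")
    mean_ratio = sum(ratios) / len(ratios)
    return "exon" if mean_ratio >= threshold else "intron"
```

For a synthetic sequence with perfect three-base periodicity, `classify("ATG" * 50)` returns `"exon"`, while a period-2 sequence such as `"AT" * 75` returns `"intron"`. Real exons show a much weaker period-3 peak, which is why the choice of representation, threshold, and window length matters in practice.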

Keywords: Exons and introns classification, Human genome, Model organism genome, Spectral analysis


