A New Predictor of Coding Regions in Genomic Sequences using a Combination of Different Approaches
Authors: Aníbal Rodríguez Fuentes, Juan V. Lorenzo Ginori, Ricardo Grau Ábalo
Abstract:
Identifying protein-coding regions in DNA sequences is a basic step in locating genes. Several approaches based on signal processing tools have been applied to this problem in an effort to achieve more accurate predictions. This paper presents a new predictor that improves the efficacy of three techniques that use the Fourier Transform to predict coding regions, and that can be computed with an algorithm that reduces the computational load. Some ideas about combining the predictor with other methods are discussed. ROC curves, computed over 25 DNA sequences from three different organisms, are used to demonstrate the efficacy of the proposed predictor.
Keywords: Bioinformatics, Coding region prediction, Computational load reduction, Digital Signal Processing, Fourier Transform.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1078931
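Illustrative sketch: Fourier-based coding-region detectors of the kind mentioned in the abstract commonly rely on the period-3 property, where coding regions tend to show a spectral peak at frequency 1/3 in the binary indicator sequences of the four nucleotides. The Python sketch below computes that classical spectral measure over a sliding window. It is not the predictor proposed in this paper; the function names, the window length of 351, and the step of 3 are assumptions chosen only for illustration.

```python
import numpy as np


def period3_power(window: str) -> float:
    """Sum over A, C, G, T of |DFT at frequency 1/3|^2 of each indicator sequence."""
    n = len(window)
    # Complex exponential at frequency 1/3 (DFT bin k = N/3), evaluated directly
    phase = np.exp(-2j * np.pi * np.arange(n) / 3.0)
    seq = np.frombuffer(window.upper().encode("ascii"), dtype=np.uint8)
    total = 0.0
    for base in b"ACGT":  # iterates over the ASCII codes of the four bases
        indicator = (seq == base).astype(float)  # 1 where the base occurs, else 0
        total += abs(np.dot(indicator, phase)) ** 2
    return float(total)


def sliding_period3(sequence: str, window: int = 351, step: int = 3) -> np.ndarray:
    """Period-3 power for each window position (hypothetical helper)."""
    starts = range(0, len(sequence) - window + 1, step)
    return np.array([period3_power(sequence[s:s + window]) for s in starts])


if __name__ == "__main__":
    toy = "ATG" * 200  # artificial, perfectly 3-periodic sequence
    profile = sliding_period3(toy)
    print(profile.shape, profile.max())
```

On real data, the resulting profile would be thresholded to flag candidate coding regions, and sweeping that threshold against annotated exons is what produces a ROC curve. A window length that is a multiple of 3 keeps the frequency 1/3 on an exact DFT bin.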