Identifying New Sequence Features for Exon-Intron Discrimination by Rescaled-Range Frameshift Analysis

Authors: Sing-Wu Liou, Yin-Fu Huang

Abstract:

To identify the sequence features that discriminate between exons and introns, we propose a new paradigm, rescaled-range frameshift analysis (RRFA). Using RRFA, we discovered two new sequence features, the frameshift sensitivity (FS) and the accumulative penta-mer complexity (APC), which were further integrated into a new, larger-scale feature, the persistency in anti-mutation (PAM). Feature-validation experiments were performed on six model organisms to test the discriminative power of these features. The experimental results strongly support FS, APC and PAM as distinguishing features between exons and introns. These newly identified sequence features provide new insights into the sequence composition of genes, and they have great potential to form a new basis for recognizing exon-intron boundaries in gene sequences.
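The abstract names rescaled-range analysis and penta-mer complexity without defining them. For orientation only, the sketch below computes two generic building blocks: the classical Hurst rescaled-range (R/S) statistic on a DNA walk, and a running count of distinct 5-mers along a sequence. The purine/pyrimidine (+1/-1) encoding, the function names, and the reading of "accumulative complexity" as "distinct k-mers seen so far" are assumptions made for illustration; this is not the authors' definition of RRFA, FS, APC or PAM.

```python
import math


def dna_to_walk_steps(seq):
    """Map DNA to +1/-1 steps: purines (A, G) -> +1, pyrimidines (C, T) -> -1.

    Assumed encoding for illustration; any other character (e.g. N) is skipped.
    """
    steps = []
    for base in seq.upper():
        if base in "AG":
            steps.append(1)
        elif base in "CT":
            steps.append(-1)
    return steps


def rescaled_range(x):
    """Classical Hurst R/S statistic of a numeric series x.

    R is the range of the cumulative deviations from the mean and S is the
    (population) standard deviation; returns NaN for a constant series.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(x) / n
    cum = 0.0
    max_dev = -math.inf
    min_dev = math.inf
    for v in x:
        cum += v - mean
        max_dev = max(max_dev, cum)
        min_dev = min(min_dev, cum)
    r = max_dev - min_dev
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return r / s if s > 0 else float("nan")


def accumulative_kmer_complexity(seq, k=5):
    """Running count of distinct k-mers (k=5 for penta-mers) along seq.

    Element i is the number of distinct k-mers among the windows starting
    at positions 0..i. This is an assumed, illustrative reading of
    "accumulative penta-mer complexity".
    """
    s = seq.upper()
    seen = set()
    curve = []
    for i in range(len(s) - k + 1):
        seen.add(s[i:i + k])
        curve.append(len(seen))
    return curve
```

For example, `rescaled_range(dna_to_walk_steps("AGCT"))` evaluates to `2.0`, and `accumulative_kmer_complexity("ACGTACGTA")` gives `[1, 2, 3, 4, 4]`, because the fifth window repeats the first. A low-complexity stretch such as a homopolymer keeps the curve flat, which is the kind of compositional contrast such features are meant to capture.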

Keywords: Exon-Intron Discrimination, Rescaled-Range Frameshift Analysis, Frameshift Sensitivity, Accumulative Sequence Complexity

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1079688

