{"title":"CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm","authors":"Ghada Badr, Arwa Alturki","volume":126,"journal":"International Journal of Bioengineering and Life Sciences","pagesStart":444,"pagesEnd":455,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/10007311","abstract":"The biological function of an RNA molecule depends
\r\non its structure. The objective of the alignment is finding the
\r\nhomology between two or more RNA secondary structures. Knowing
\r\nthe common functionalities between two RNA structures allows
\r\na better understanding and a discovery of other relationships
\r\nbetween them. Besides, identifying non-coding RNAs -that is not
\r\ntranslated into a protein- is a popular application in which RNA
\r\nstructural alignment is the first step A few methods for RNA
\r\nstructure-to-structure alignment have been developed. Most of these
\r\nmethods are partial structure-to-structure, sequence-to-structure, or
\r\nstructure-to-sequence alignment. Less attention is given in the
\r\nliterature to the use of efficient RNA structure representation and the
\r\nstructure-to-structure alignment methods are lacking. In this paper,
\r\nwe introduce an O(N2) Component-based Pairwise RNA Structure
\r\nAlignment (CompPSA) algorithm, where structures are given as
\r\na component-based representation and where N is the maximum
\r\nnumber of components in the two structures. The proposed algorithm
\r\ncompares the two RNA secondary structures based on their weighted
\r\ncomponent features rather than on their base-pair details. Extensive
\r\nexperiments are conducted illustrating the efficiency of the CompPSA
\r\nalgorithm when compared to other approaches and on different real
\r\nand simulated datasets. The CompPSA algorithm shows an accurate
\r\nsimilarity measure between components. The algorithm gives the
\r\nflexibility for the user to align the two RNA structures based on
\r\ntheir weighted features (position, full length, and\/or stem length).
\r\nMoreover, the algorithm proves scalability and efficiency in time and
\r\nmemory performance.","references":"[1] D. Gusfield, Algorithms on strings, trees and sequences: computer\r\nscience and computational biology. Cambridge University Press, 1997.\r\n[2] D. Sankoff, \u201cSimultaneous solution of the rna folding, alignment\r\nand protosequence problems,\u201d SIAM Journal on Applied Mathematics,\r\nvol. 45, no. 5, pp. 810\u2013825, 1985. [3] S. B. Needleman and C. D. Wunsch, \u201cA general method applicable to\r\nthe search for similarities in the amino acid sequence of two proteins,\u201d\r\nJournal of molecular biology, vol. 48, no. 3, pp. 443\u2013453, 1970.\r\n[4] T. F. Smith and M. S. Waterman, \u201cIdentification of common molecular\r\nsubsequences,\u201d Journal of molecular biology, vol. 147, no. 1, pp.\r\n195\u2013197, 1981.\r\n[5] J. Gorodkin, L. J. Heyer, and G. D. Stormo, \u201cFinding the most significant\r\ncommon sequence and structure motifs in a set of rna sequences,\u201d\r\nNucleic Acids Research, vol. 25, no. 18, pp. 3724\u20133732, 1997.\r\n[6] J. H. Havgaard, R. B. Lyngs\u00f8, G. D. Stormo, and J. Gorodkin, \u201cPairwise\r\nlocal structural alignment of rna sequences with sequence similarity less\r\nthan 40%,\u201d Bioinformatics, vol. 21, no. 9, pp. 1815\u20131824, 2005.\r\n[7] J. H. Havgaard, R. B. Lyngs\u00f8, and J. Gorodkin, \u201cThe foldalign web\r\nserver for pairwise structural rna alignment and mutual motif search,\u201d\r\nNucleic acids research, vol. 33, no. suppl 2, pp. W650\u2013W653, 2005.\r\n[8] J. H. Havgaard, E. Torarinsson, and J. Gorodkin, \u201cFast pairwise\r\nstructural rna alignments by pruning of the dynamical programming\r\nmatrix,\u201d PLOS computational biology, vol. 3, no. 10, p. e193, 2007.\r\n[9] E. Torarinsson, J. H. Havgaard, and J. Gorodkin, \u201cMultiple structural\r\nalignment and clustering of rna sequences,\u201d Bioinformatics, vol. 23,\r\nno. 8, pp. 926\u2013932, 2007.\r\n[10] M. Bauer, G. W. Klau, and K. Reinert, \u201cAccurate multiple\r\nsequence-structure alignment of rna sequences using combinatorial\r\noptimization,\u201d BMC bioinformatics, vol. 8, no. 1, p. 271, 2007.\r\n[11] S. Heyne, S. Will, M. Beckstette, and R. Backofen, \u201cLightweight\r\ncomparison of rnas based on exact sequence-structure matches,\u201d\r\nBioinformatics, p. btp065, 2009.\r\n[12] Y. Jiang, W. Xu, L. P. Thompson, R. R. Gutell, and D. P. Miranker,\r\n\u201cR-pass: A fast structure-based rna sequence alignment algorithm,\u201d\r\nin Bioinformatics and Biomedicine (BIBM), 2011 IEEE International\r\nConference on. IEEE, 2011, pp. 618\u2013622.\r\n[13] Y. Tabei, K. Tsuda, T. Kin, and K. Asai, \u201cScarna: fast and accurate\r\nstructural alignment of rna sequences by matching fixed-length stem\r\nfragments,\u201d Bioinformatics, vol. 22, no. 14, pp. 1723\u20131729, 2006.\r\n[14] T. K. Wong, K.-L. Wan, B.-Y. Hsu, B. W. Cheung, W.-K. Hon, T.-W.\r\nLam, and S.-M. Yiu, \u201cRnasalign: Rna structural alignment system,\u201d\r\nBioinformatics, vol. 27, no. 15, pp. 2151\u20132152, 2011.\r\n[15] M. Hochsmann, T. Toller, R. Giegerich, and S. Kurtz, \u201cLocal similarity\r\nin rna secondary structures,\u201d in Bioinformatics Conference, 2003. CSB\r\n2003. Proceedings of the 2003 IEEE. IEEE, 2003, pp. 159\u2013168.\r\n[16] M.-Y. Wu, C.-B. Yang, and K.-S. Huang, \u201cRna secondary structure\r\nalignment based on stem representation,\u201d in Proceedings of the 21st\r\nWorkshop on Combinational Mathematics and Computation Theory.\r\nCiteseer, pp. 60\u201369.\r\n[17] J. Liu, J. T. Wang, J. Hu, and B. Tian, \u201cA method for aligning rna\r\nsecondary structures and its application to rna motif detection,\u201d BMC\r\nbioinformatics, vol. 6, no. 1, p. 89, 2005.\r\n[18] C. Zhong and S. Zhang, \u201cEfficient alignment of rna secondary structures\r\nusing sparse dynamic programming,\u201d BMC bioinformatics, vol. 14, no. 1,\r\np. 269, 2013.\r\n[19] G. Badr and M. Turcotte, \u201cComponent-based matching for multiple\r\ninteracting rna sequences,\u201d in Bioinformatics Research and Applications.\r\nSpringer, 2011, pp. 73\u201386.\r\n[20] H. Arraqibah and G. Badr, \u201cExtended component-based motif\r\nlocalization for interacting rna structures,\u201d in ISBRA 2013, Charlotte,\r\nUS, May 20 2013.\r\n[21] B. Hunter, \u201cVisualization of secondary rna structure prediction\r\nalgorithms,\u201d 2006.\r\n[22] I. L. Hofacker, W. Fontana, P. F. Stadler, L. S. Bonhoeffer, M. Tacker,\r\nand P. Schuster, \u201cFast folding and comparison of rna secondary\r\nstructures,\u201d Monatshefte f\u00a8ur Chemie\/Chemical Monthly, vol. 125, no. 2,\r\npp. 167\u2013188, 1994.\r\n[23] S. W. Burge, J. Daub, R. Eberhardt, J. Tate, L. Barquist, E. P. Nawrocki,\r\nS. R. Eddy, P. P. Gardner, and A. Bateman, \u201cRfam 11.0: 10 years of rna\r\nfamilies,\u201d Nucleic acids research, p. gks1005, 2012.\r\n[24] D. A. Benson, M. Cavanaugh, K. Clark, I. Karsch-Mizrachi, D. J.\r\nLipman, J. Ostell, and E. W. Sayers, \u201cGenbank,\u201d Nucleic acids research,\r\np. gks1195, 2012.\r\n[25] A. Alturki, \u201cComponent based pair-wise rna secondary structure\r\nalignment algorithm,\u201d Master\u2019s thesis, King Saud University, 2014.\r\n[26] T. M. Lowe and S. R. Eddy, \u201ctrnascan-se: a program for improved\r\ndetection of transfer rna genes in genomic sequence,\u201d Nucleic acids\r\nresearch, vol. 25, no. 5, pp. 0955\u2013964, 1997.","publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 126, 2017"}