Maximum Common Substructure Extraction in RNA Secondary Structures Using Clique Detection Approach
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 32769
Maximum Common Substructure Extraction in RNA Secondary Structures Using Clique Detection Approach

Authors: Shih-Yi Chao

Abstract:

The similarity comparison of RNA secondary structures is important in studying the functions of RNAs. In recent years, most existing tools represent the secondary structures by tree-based presentation and calculate the similarity by tree alignment distance. Different to previous approaches, we propose a new method based on maximum clique detection algorithm to extract the maximum common structural elements in compared RNA secondary structures. A new graph-based similarity measurement and maximum common subgraph detection procedures for comparing purely RNA secondary structures is introduced. Given two RNA secondary structures, the proposed algorithm consists of a process to determine the score of the structural similarity, followed by comparing vertices labelling, the labelled edges and the exact degree of each vertex. The proposed algorithm also consists of a process to extract the common structural elements between compared secondary structures based on a proposed maximum clique detection of the problem. This graph-based model also can work with NC-IUB code to perform the pattern-based searching. Therefore, it can be used to identify functional RNA motifs from database or to extract common substructures between complex RNA secondary structures. We have proved the performance of this proposed algorithm by experimental results. It provides a new idea of comparing RNA secondary structures. This tool is helpful to those who are interested in structural bioinformatics.

Keywords: Clique detection, labeled vertices, RNA secondary structures, subgraph, similarity.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1072684

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1406

References:


[1] M. Hochsmann, B. Voss, and R. Giegerich, "Pure multiple RNA secondary structure alignments: a progressive profile approach," IEEE Trans. on Computational Biology and Bioinformatics, vol. 1, pp. 53-62, 2004.
[2] J. Liu, J. T. L. Wang, J. Hu, and B. Tian, "A method for aligning RNA secondary structures and its application to RNA motif detection," BMC Bioinformatics, vol. 6, pp. 89-109, 2005.
[3] J. Allali and M. F. Sagot, "A new distance for high level RNA secondary structure comparison," IEEE Trans. on Computational Biology and Bioinformatics, vol. 2, pp. 1-11, 2005.
[4] G. D. Collins, S. Le and K. Zhang, "A new algorithm for computing similarity between RNA structures," Information Sciences, vol. 139, pp. 59-77, 2001.
[5] S. Dulucq and L. Tichit, "RNA secondary structure comparison: exact analysis of the Zhang-Sasha tree edit-algorithm," Theoretical Computer Science, vol. 306, pp. 471-484, 2003.
[6] J. T. L. Wang, B.A. Shapiro, D. Shasha, K. Zhang and K.M. Currey, "An algorithm for finding the largest approximately common substructures of two trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, pp. 889-895, 1998.
[7] I. L. Hofacker, "Vienna RNA secondary structure server," Nucleic Acids Research, vol. 31, pp. 3429-3431, 2003.
[8] T.J. Macke, D. J.Ecker, R.R. Gutell, D. Gautheret, D. A. Case and R. Sampath, "RNAMotif, an RNA secondary structure definition and search algorithm," Nucleic Acids Research, vol. 29, pp. 4724-4735, 2001.
[9] K. D. Pruitt, T. Tatusova and R. D. Maglott, "NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins," Nucleic Acids Research, vol. 35(Database issue), pp. D61-65, 2007.
[10] G. Levi, "A note on the derivation of maximal common subgraphs of two directed or undirected graphs," Calcolo, vol. 9, pp. 347-352, 1973.
[11] J. W. Raymond and P. Willett, "Maximum common subgraph isomorphism algorithms for the matching of chemical structures," Journal of Computer-aided Molecular Design, vol. 16, pp. 521-533, 2002.
[12] P. Pardalos and J. Xue, "The maximum clique problem," J. Global Optimiz, vol. 4, pp. 301-328, 1994.
[13] R. Carraghan and P. M. Pardalos, "An exact algorithm for the maximum clique problem," Operations Research Letters, vol. 9, pp.375-382, 1990.
[14] N. C. Lau, L. P. Lim, E.G. Weinstein and D. P. Bartel, "An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans," Science, vol. 294, pp. 858-862, 2001.
[15] L. P. Lim, N.C. Lau, E.G. Weinstein, A. Abdelhakim, S. Yekta, M. W. Rhoades, C.B. Burge, D.P. Bartel, "The microRNAs of Caenorhabditis elegans," Genes & Development, vol. 17, pp. 991-1008, 2003.
[16] S. Griffiths-Jones, A. Bateman, M. Marshall, A. Khanna and S. R. Eddy, "Rfam: and RNA family database," Nucleic Acids Research, vol. 31, pp. 439-441, 2003.
[17] L. R. Otake, P. Scamborova, C. Hashimoto, J. A. Steltz, "The divergent U12-type spliceosome is required for pre-mRNA splicing and is essential for development in Drosophila," Molecular Cell, vol. 9, pp. 439-446, 2002.
[18] N. C. Andrews, "Disorders of iron metabolism," New England Journal of Medicine, vol.341, pp. 1986-1995, 1999.
[19] C. A. R. Hoare, "Quicksort," Computer Journal, vol. 5, pp. 10-15, 1962.
[20] M. R. Garey and D. Johnson, "The complexity of near-optimal graph coloring," Journal of the Association for Computing Machinery, vol. 23, pp. 43-49, 1976.
[21] P. Turan, "On the theory of graphs," Colloq. Math, vol. 3, pp. 19-30, 1954.