Approximation Algorithm for the Shortest Approximate Common Superstring Problem
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 32804
Approximation Algorithm for the Shortest Approximate Common Superstring Problem

Authors: A.S. Rebaï, M. Elloumi

Abstract:

The Shortest Approximate Common Superstring (SACS) problem is : Given a set of strings f={w1, w2, ... , wn}, where no wi is an approximate substring of wj, i ≠ j, find a shortest string Sa, such that, every string of f is an approximate substring of Sa. When the number of the strings n>2, the SACS problem becomes NP-complete. In this paper, we present a greedy approximation SACS algorithm. Our algorithm is a 1/2-approximation for the SACS problem. It is of complexity O(n2*(l2+log(n))) in computing time, where n is the number of the strings and l is the length of a string. Our SACS algorithm is based on computation of the Length of the Approximate Longest Overlap (LALO).

Keywords: Shortest approximate common superstring, approximation algorithms, strings overlaps, complexities.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1083623

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1457

References:


[1] D. Maier, J. A. Storer, A note on complexity of the superstring problem, TR-233, Princeton University, Dept. EECS : (1977).
[2] D. Maier, The complexity of some problems on subsequences and supersequences, J. of ACM, Vol. 25, N┬░2 : (April 1978), p322-336.
[3] M. R. Garey, D. S. Johnson, Computers and intractability, Freeman (Ed.) : (1979).
[4] J. K. Gallant, D. Maier, J. Storer, On finding minimal length superstrings, J. Computer and System Sciences, Vol. 20, N┬░1 : (1980), p50-58.
[5] D. S. Hirschberg, Recent results on the complexity of commonsubsequence problems, Time Warps, String Edits and Macromolecules The Theory and Practice of Sequence Comparison, Sankoff and Kruskal (Eds.), Addison-Wesley Pub. Inc. : (1983), p325-330.
[6] H. Peltola, H. Söderlund, E. Ukkonen, SEQAID : A DNA sequence assembling program based on a mathematical model, Nucleic Acids Research, Vol. 12, N┬░1 : (1984), p307-321.
[7] S. Dear, R. Staden, A sequence assembly and editing program for efficient management of large projects, Nucleic Acids Research, Vol. 19 : (1991), p3907-3911.
[8] X. Huang, A contig assembly program based on sensitive detection of string overlaps, Genomics, N┬░14 : (1992), p18-25.
[9] J. Kececioglu, E. Myers, Combinatorial algorithms for DNA sequence assembly, Algorithmica, Vol. 13, N┬░12, Springer International Eds. : (1995), p7-51.
[10] G. G. Sutton, O. White, M. D. Adams, A. R. Kerlavage, TIGR Assembler: A new tool for assembing large shotgun sequencing projects, Genome Science & Technology, Vol. 1, N┬░1, Mary Ann Liebert, Inc. : (1995), p9-19.
[11] M. Elloumi, DNA Sequence Assembly Algorithms Based on Clustering Approaches, The 2000 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'2000 (Las Vegas, Nevada, USA) : (June 2000).
[12] J. Cohen, Bioinformatics-An Introduction For Computer Scientists, ACM Computing Surveys, Vol 36, No 2, June 2004, p122-158.
[13] S. Rahmann, The Shortest Common Supersequence Problem In a Microarray Production Setting, Bioinformatics, Vol 19, (2003), p ii156- ii161.
[14] E. Ukkonen, A Linear time algorithm for finding approximate shortest common superstrings, Algorithmica, N┬░5: (1990), p313-323.
[15] J. Kececioglu, Exact and approximation algorithms for DNA sequence reconstruction, Ph.D. dissertation, Technical Report, N┬░91-26, Dept. of Computer Science, The University of Arisona, Tucson, AZ 85721 : (1991).
[16] J. Tarhio, E. Ukkonen, A greedy approximation algorithm for constructing shortest common superstrings, Theo. Comput. Sci., N┬░57: (1988), p131-145.
[17] S. Teng, F. Yao, Approximation shortest superstrings, 34th IEEE Symposium on Foundation of Computer Science : (1993).
[18] T. A. Jenkyns, The greedy travelling salesman's problem, Networks N┬░9 : (1979), p363-373.
[19] J. K. Gallant, The complexity of the overlap method for sequencing biopolymers, J. Theor. Biol., N┬░101 : (1982), p1-17.
[20] J. Van Leeuwen, Graph algorithms, Handbook of Theoretical Computer Science, Vol. A : Algorithms and Complexity, J. Van Leeuwen (Ed.), Elsevier Science Pub. B. V. : (1990), p527-631.
[21] R. E. Bellman, Dynamic Programming, Princeton University Press, New Jersey : (1957)
[22] R. E. Bellman, S. E. Dreyfus, Applied Dynamic Programming, Princeton University Press, New Jersey : (1962)
[23] R. A. Wagner, M. J. Fischer, The string-to-string correction problem, J. of ACM, Vol 21 No 1 : (1974), p168-173.
[24] P. H. Sellers, The theory and computation of Evolutionary distances : Pattern recognition, J. Algorithms, No 1 : (1980) p359-373.
[25] M. Elloumi, An algorithm for the approximate string-matching problem, Atlantic Symposium on Computational Biology, Genome Information Systems & Technology, CBGI'2001, (Durham, North Carolina, U.S.A.) : (March 2001).
[26] J. S. Vitter, Ph. Flajolet, Average-case analysis of algorithms and data structures, Handbook of Theoretical Computer Science, Vol. A : Algorithms and Complexity, J. Van Leeuwen (Ed.), Elsevier Science Pub. B. V. : (1990), p431-524.