String Matching using Inverted Lists
Authors: Chouvalit Khancome, Veera Boonjing
Abstract:
This paper proposes a new solution to string matching problem. This solution constructs an inverted list representing a string pattern to be searched for. It then uses a new algorithm to process an input string in a single pass. The preprocessing phase takes 1) time complexity O(m) 2) space complexity O(1) where m is the length of pattern. The searching phase time complexity takes 1) O(m+α ) in average case 2) O(n/m) in the best case and 3) O(n) in the worst case, where α is the number of comparing leading to mismatch and n is the length of input text.
Keywords: String matching, inverted list, inverted index, pattern, algorithm.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1333732
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1555References:
[1] B., R. S., Moore, J.S., ''A fast string searching algorithm'', Communications of the ACM. 20, 1997, pp. 762-772.
[2] M. Chrochemore, Handcart C., ''Automata for Matching Patterns'', in Handbook of Formal Languages, Volume 2, Linear Modeling: Background and Application, G. Rozenberg and A. Salomaa ed., Springer-Verlag, Berlin. 1997, Ch. 9, pp. 399-462.
[3] M. Chrochemore ''Off-line serial exact string searching, in Pattern Matching Algorithms'', A. Apostolico and Z. Galil ed., Oxford University Press Chapter 1, pp 1-53.
[4] M. Crochemore; L. Gasieniec; W. Rytter, ''Constant-space string-matching in sublinear average time'', Compression and Complexity of Sequences 1997. Proceedings, 1997, pp. 230 - 239.
[5] C. Monz and M. de Rijke. (2002, August) Inverted Index Construction. Available: http://staff.science.uva.nl/~christof/courses/ir /transparencies/clean-w-05.pdf.
[6] M. Escardo, (2006, October 15), Complexity considerations for hash tables Available: http://www.cs.bham.ac.uk/~mhe/foundations2/ node92.html
[7] C. Charras and T. Lecroq. (2006, October 10). Handbook of Exact String Matching. Available: www-igm.univ-mlv.fr/~lecroq/string/string.pdf.
[8] G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings. The press Syndicate of The University of Cambridge. 2002.
[9] Galil, Z., Giancarlo, R., ''On the exact complexity of string matching upper bounds'', SIAM Journal on Computing, 21(3), 1992, pp. 407-437.
[10] H. Kesong, W. Yongcheng, C. Guilin, ''Research on A Faster Algorithm for Pattern Matching'', Proceedings of the fifth International workshop on Information retrieval with Asian languages, 2000, pp. 119-124.
[11] Wikipedia, (2006, November 15), Hash table. Available: http://en.wikipedia.org/wiki/Hash_table.
[12] K. Loudon, (2006, November 24), Hash Tables. Available: www.oreilly.com/catalog/masteralgoc/chapter/ch08.pdf.
[13] V. H. DINH, (2006, November 24), Hash Table. Available: http://libetpan.sourceforge.net/doc/API/API/x161.html.
[14] J. Law, ''Book reviews: Review of "Flexible pattern matching in strings: practical on-line algorithms for text and biological sequences by Gonzolo Navarro and Mathieu Raffinot." Cambridge University Press 2002''. ACM SIGSOFT Software Engineering Notes, Volume 28 Issue 2 :, 2003, pp. 1-36.
[15] G. Navarro, M. Raffinot, ''Fast and flexible string matching by combining bit-parallelism and suffix automata'', December 2000 Journal of Experimental Algorithmics (JEA), Volume 5.
[16] D.E. Knuth, JR. Morris, , J.H., Pratt, V.R., ''Fast pattern matching in strings''. SIAM Journal on Computing 6(1), 1997, pp. 323-350.
[17] M. S. Ager, O. Danvy, H. K. Rohde, ''Fast partial evaluation of pattern matching in strings''. ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 28 Issue 4, 2006, pp. 3-9.
[18] M. S. Ager, O. Danvy, H. K. Rohde, ''On obtaining Knuth, Morris, and Pratt's string matcher by partial evaluation''. Proceedings of the ASIAN symposium on Partial evaluation and semantics-based program manipulation, 2002, pp. 32-46.
[19] JR. Morris, J.H., Pratt, V.R., A linear pattern-matching algorithm, Technical Report 40, University of California, Berkeley. 1970.
[20] O. R. Zaïane. (2001, September 15), CMPUT 391: Inverted Index for Information Retrieval, University of Alberta. Available:http://www.cs.ualberta.ca /~zaiane/courses/cmput39-03/.
[21] R. B. Yates and B. R. Neto. ''Mordern Information Retrieval'', The ACM press.A Division of the Association for Computing Machinery,Inc. 1999, pp. 191-227.
[22] I. Simon, ''String matching and automata'', in Results and Trends in Theoetical Computer Science, Graz, Austria, J. Karhumaki, H. Maurer and G. Rozenerg ed., Lecture Notes in Computer Science 814, Springer - Verlag, Berlin, 1994, pp. 386-395.
[23] R.M. Karp, M.O. Rabin., Efficient randomized pattern matching algorithms, IBM Journal on Research Development 31(2), 1987, pp. 249-260.