Distributed Splay Suffix Arrays: A New Structure for Distributed String Search
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 33122
Distributed Splay Suffix Arrays: A New Structure for Distributed String Search

Authors: Tu Kun, Gu Nai-jie, Bi Kun, Liu Gang, Dong Wan-li

Abstract:

As a structure for processing string problem, suffix array is certainly widely-known and extensively-studied. But if the string access pattern follows the “90/10" rule, suffix array can not take advantage of the fact that we often find something that we have just found. Although the splay tree is an efficient data structure for small documents when the access pattern follows the “90/10" rule, it requires many structures and an excessive amount of pointer manipulations for efficiently processing and searching large documents. In this paper, we propose a new and conceptually powerful data structure, called splay suffix arrays (SSA), for string search. This data structure combines the features of splay tree and suffix arrays into a new approach which is suitable to implementation on both conventional and clustered computers.

Keywords: suffix arrays, splay tree, string search, distributedalgorithm

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1071982

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1780

References:


[1] I. H. Written, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, 1994.
[2] W. B. Cavnar. Using an n-gram based document representation with a vector processing retrieval model. In Proc. Of the 3rd Text Retrieval Conference (TREC-3), pages 269-277. NIST special publication, 1995.
[3] G. A. Stephen. String Searching Algorithms. World Scientific Publishing, 1994.
[4] M. Crochemore and W. Rytter. Text Algorithms. Oxford Univ. Press, New York, 1994
[5] D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge Univ. Press, 1997
[6] E. M. McCreighl. A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM, 23, pp. 262-272, 1976.
[7] U. Manber and G. Myers Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput. 22(5), pp 935-948, 1993.
[8] G. H. Gonnet, R. A. Baeza-Yates, et al. New indices for text: PAT trees and PAT arrays. In W. B. Frakes and R. A. Baeza-Yates, editors, Information Retrieval: Data Structure and Algorithms, pages 66-82. Prentice-Hall, New Jersey, 1992.
[9] M. Nagao and S. Mori. A new method of n-gram statistics for large number of n and automatic extraction of words and phrase form large text data of Japanese. In Proc of COLING-94, pages 611-615,1994.
[10] A. Apostolico. The myriad virtues of subword trees. In Combinatorial Algorithms on Words, pages 85-96. Springer-Verlag, 1985.
[11] A. Fellah. Concurrent and Distributed Data Structures for Multikeys Sorting on Computer Clusters. IEEE Proc. Of the 16th Intern. Symp. On High Performance Comput. Systems and Applications. Pp. 281-, Moncton, Canada, 2002.
[12] A. Fellah, A. Maamir and M. Abaza. Distributed Data Structures for Multikey Sorting. Intern. J. of Parallel and Distributed Systems and Networks, Vol. 2(2), pp. 62-68, 1999.
[13] Fellah, A. and Mawson, R. Distributed multidimensional suffix arrays for string search. Communications, Computers and signal Processing, 2003. PACRIM. 2003 IEEE Pacific Rim Conference on , Volume: 2, pp. 792-795, 2003.
[14] K. Sadakane. A fast algorithm for making suffix arrays and for burrows-wheeler transformation. In Proc. DCC-98, pages 129-138, 1998.
[15] D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley, 1973.
[16] J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In the 8th Annual ACM-SIAM Sympo. On Descrete Algorithms, pages 360-369, 1997.
[17] J. Kitajima, G. Navarro, B. Ribeiro, and N. Ziviani. Distributed generation of suffix arrays: a quicksort-based approach. In Proc. WSP-97, pages 53-69. Carleton University Press, 1997.
[18] J. Kitajima, B. Ribeiro, and N. Ziviani. Network and memory analysis in distributed parallel generation of PAT arrays. In Proc. 8th Brazilian Symp. On Comp. Arch. - High-Performance Processing, pages 193-202. Brazilian Comp. Soc., 1996.
[19] G. Navarro, J. Kitajima, B. Ribeiro, and N. Ziviani. Distributed generation of suffix arrays. In Proc. CPM-97, LNCS 1264, pages 102-115, 1997.
[20] Kitajima, J.P., Navarro, G. A fast distributed suffix array generation algorithm. String Processing and
[21] Information Retrieval Symposium,1999 and International Workshop on Groupware, Sept,1999,22-24 Pages:97-104
[22] D. D. Sleator and R. E. Tarjan. Self-adjusting Binary Search Trees. Journal of the ACM 32, pp. 652-686, 1985.