Dynamic Inverted Index Maintenance
Authors: Leo Galambos
Abstract:
The majority of today's IR systems base the IR task on two main processes: indexing and searching. There exists a special group of dynamic IR systems where both processes (indexing and searching) happen simultaneously; such a system discards obsolete information, simultaneously dealing with the insertion of new in¬formation, while still answering user queries. In these dynamic, time critical text document databases, it is often important to modify index structures quickly, as documents arrive. This paper presents a method for dynamization which may be used for this task. Experimental results show that the dynamization process is possible and that it guarantees the response time for the query operation and index actualization.
Keywords: Search engine, inverted file, index management.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1055164
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1381References:
[1] Apache, Jakarta project: Lucene. http://jakarta.apache.org.
[2] Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Chapter 8. ACM Press 1999.
[3] Brin, S., Page, L.: The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems 30, 1–7, 107–117, 1998.
[4] Brown, E.W.: Execution Performance Issues in Full-Text Information Retrieval. Computer Science Department, University of Massachusetts at Amherst, Technical Report 95-81, October 1995.
[5] Clarke, C., Cormack, G.: Dynamic Inverted Indexes for a Distributed Full-Text Retrieval System. TechRep MT-95-01, University of Waterloo, February 1995.
[6] Cutting, D., Pedersen, J.: Optimizations for dynamic inverted index maintenance. Proceedings of SIGIR, 405-411, 1990.
[7] Fox, E.A., Harman, D.K., Baeza-Yates, R., Lee, W.C.: Inverted fi les. In Information Retrieval: Data Structures and Algorithms, Prentice-Hall, pp 28-43.
[8] EGOTHOR, JAVA IR system. http://www.egothor.org/.
[9] Lim, L., Wang, M., Padmanabhan, S., Vitter, J.S., Agarwal, R.: Characterizing Web Document Change, LNCS 2118, 133–146, 2001.
[10] Lim, L., Wang, M., Padmanabhan, S., Vitter, J.S., Agarwal, R.: Dynamic Maintenance of Web Indexes Using Landmarks. Proc. of the 12th W3 Conference, 2003.
[11] Moffat, A., Zobel, J.: Self-Indexing Inverted Files for Fast Text Retrival. ACM TIS, 349–379, October 1996, Volume 14, Number 4.
[12] Mehlhorn, K.: Data Structures and Effi cient Algorithms, Springer Verlag, EATCS Monographs, 1984.
[13] Mehlhorn, K., Overmars, M.H.: Optimal Dynamization of Decomposable Searching Problems. IPL 12, 93–98, 1981.
[14] Mehlhorn, K.: Lower Bounds on the Effi ciency of Transforming Static Data Structures into Dynamic Data Structures. Math. Systems Theory 15, 1–16, 1981.
[15] Salton, G., Lesk, M.E.: Computer evaluation of indexing and text processing. Journal of the ACM, 15(1):8-36, January 1968.
[16] Salton, G.: The SMART Retrieval System - Experiments in Automatic Document Processing. Prentice Hall Inc., Englewood Cliffs, 1971.
[17] Tomasic, A., Garcia-Molina, H., Shoens, K.: Incremental Updates of Inverted Lists for Text Document Retrieval. Short Version of Stanford University Computer Science Technical Note STAN-CS-TN-93-1, December, 1993.