Non-Overlapping Hierarchical Index Structure for Similarity Search
Authors: Mounira Taileb, Sid Lamrous, Sami Touati
Abstract:
In order to accelerate the similarity search in highdimensional database, we propose a new hierarchical indexing method. It is composed of offline and online phases. Our contribution concerns both phases. In the offline phase, after gathering the whole of the data in clusters and constructing a hierarchical index, the main originality of our contribution consists to develop a method to construct bounding forms of clusters to avoid overlapping. For the online phase, our idea improves considerably performances of similarity search. However, for this second phase, we have also developed an adapted search algorithm. Our method baptized NOHIS (Non-Overlapping Hierarchical Index Structure) use the Principal Direction Divisive Partitioning (PDDP) as algorithm of clustering. The principle of the PDDP is to divide data recursively into two sub-clusters; division is done by using the hyper-plane orthogonal to the principal direction derived from the covariance matrix and passing through the centroid of the cluster to divide. Data of each two sub-clusters obtained are including by a minimum bounding rectangle (MBR). The two MBRs are directed according to the principal direction. Consequently, the nonoverlapping between the two forms is assured. Experiments use databases containing image descriptors. Results show that the proposed method outperforms sequential scan and SRtree in processing k-nearest neighbors.
Keywords: K-nearest neighbour search, multi-dimensional indexing, multimedia databases, similarity search.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1072940
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1563References:
[1] C. Faloutsos, "Searching Multimedia Databases by Content". Kluwer Academic Publishers, 1996.
[2] A. Guttman. "R-trees: A dynamic index structure for spatial searching". In Proceedings of the ACM SIGMOD International Conference on Management of data, Boston, Masachussets, USA, pages 47-57. 1984.
[3] T. Zhang, R. Ramakrishnan, M. Linvy, "Birch: An efficient data clustering method for very large databases". In Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Canada, pp.103-114, 1996.
[4] N. Beckman, H.P.Kriegel, R. Schneider, & B. Seeger, (1990). "The R*- tree: An efficient and robust access method for points and rectangles". In Proceeding of the ACM SIGMOD International Conference on Management of Data, Atlantic City, New Jersey, USA, pp. 322-331, 1990.
[5] S. Bertchold, D.A. Keim, H.P. Kriegel, "The X-tree: An index structure for hight-dimentional data". In Proceeding of the 22nd Internatioanl Conference on Very Large Data Bases, Mumbai (Bombay), India, pp. 28-39, 1996.
[6] D.A. White and R. Jain, "Similarity Indexing with the SS-Tree". In Proceeding of the 12th Int-l Conf Data Eng., pp. 516-523, 1996.
[7] N. Katayama, S. Satoh. "The SR-tree : An index structure for highdimensional nearest neighbor queries". In Proceeding of the ACM SIGMOD, International Conference on Management of Data, Tuscon, Arizona, USA, pages 369-380. 1997.
[8] J.T. Robinson, "The k-d-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes," Proc. ACM SIGMOD Int-l Conf. Management of Data, pp. 10-18, Apr. 1981.
[9] D.B. Lomet and B. Salzberg, "The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance," ACM Trans. Database Systems, vol. 15, no. 4, pp. 625-658, 1990.
[10] A. Henrich, "The LSDh-Tree: An Access Structure for Feature Vectors". Proc. 14th Int-l Conf. Data Eng., pp. 362-369, 1998.
[11]
[kd-Tree] J. Bentley, "Multidimensional Binary Search Trees Used for Associative Searching," Comm. ACM, vol. 18, no. 9, pp. 509- 517, 1975.
[12] D. L. Boley, "Principal Direction Divisive Partitioning", Data Mining and Knowledge Discovery 2(4):325-344, 1998.
[13] N. Bouteldja, V. Gouet-Brunet et M. Scholl. "Evaluation of strategies for multiple sphere queries with local image descriptors". In IS&T/SPIE Conference on Multimedia Content Analysis, Management, and Retrieval, San Jose CA, USA, 2006.
[14] S. Savaresi, D. L. Boley, S. Bittanti, G. Gazzaniga. "Choosing the cluster to split in bisecting divisive clustering algorithms". CSE Report TR-00-055, University of Minnesota, 2000.
[15] N. Roussopoulos, S. Kelly, F. Vincent. "Nearest Neighbor Queries". In Proceeding of ACM SIGMOD, May 1995.