SC-LSH: An Efficient Indexing Method for Approximate Similarity Search in High Dimensional Space
Authors: Sanaa Chafik, ImaneDaoudi, Mounim A. El Yacoubi, Hamid El Ouardi
Abstract:
Locality Sensitive Hashing (LSH) is one of the most promising techniques for solving nearest neighbour search problem in high dimensional space. Euclidean LSH is the most popular variation of LSH that has been successfully applied in many multimedia applications. However, the Euclidean LSH presents limitations that affect structure and query performances. The main limitation of the Euclidean LSH is the large memory consumption. In order to achieve a good accuracy, a large number of hash tables is required. In this paper, we propose a new hashing algorithm to overcome the storage space problem and improve query time, while keeping a good accuracy as similar to that achieved by the original Euclidean LSH. The Experimental results on a real large-scale dataset show that the proposed approach achieves good performances and consumes less memory than the Euclidean LSH.
Keywords: Approximate Nearest Neighbor Search, Content based image retrieval (CBIR), Curse of dimensionality, Locality sensitive hashing, Multidimensional indexing, Scalability.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1094439
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2576References:
[1] A. Guttman. "R-trees: A dynamic index structure for spatial searching”. In SIGMOD, 1984, pp. 47-57.
[2] J. L. Bentley. "K-d trees for semidynamic point sets”. InSoCG, 1990, pp. 187-197.
[3] Weber, H.J. Schek and S. Blott, "A quantitative analysis and performance, study for similarity-search methods in high-dimensional spaces”. In Proceedings of 24rd International Conference on Very Large Data Bases, New York, USA, 1998, pp.194-205.
[4] S. Blott and R. Weber."A Simple Vector-Approximation File for Similarity Search in High-Dimensional Vector Spaces”. Technical Report 19, ESPRIT project HERMES, March 1997.
[5] L. Ye and Y. Hua. "The CMVAI-File: An Efficient Approximation- Based High-Dimensional Index Structure”,Multimedia Information Networking and Security (MINES), 2010.
[6] H.Lu, B.C.Ooi, H.T.Shen and X.Xue. "Hierarchical Indexing Structure for Efficient Similarity Search in Video Retrieval”. IEEE Transactions on Knowledge and Data Engineering, November 2006.
[7] D. Gorisse, M. Cord and F. Precioso. "Locality-sensitive hashing for chi2-Distance”, IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 2012, pp.402–409.
[8] H. Wang, J. Cao, L. Shu, and D. Rafiei."Locality sensitive hashing revisited: Filling the gap between theory and algorithm analysis”. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management, 2013, pp. 1969-1978.
[9] A. Andoni and P. Indyk. "Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions”. In 47th Annual IEEE Symposium on foundations of Computer Science, 2006, pp. 459- 468.
[10] C. Calistru, C. Ribeiro and G.David. "High- Dimensional Indexing for Video Retrieval”. Multimedia - A Multidisciplinary Approach to Complex Issues, 2012.
[11] I. Daoudi, K. Idrissi and S.E.A Ouatik. "Kernel region approximation blocks for indexing heterogonous databases”. Multimedia and Expo, 2008 IEEE International Conference, 2008, pp. 1237-1240.
[12] T. Liu, A. W. Moore, A. G. Gray and K. Yang. "An Investigation of Practical Approximate Nearest Neighbor Algorithm”. In proceeding of: Advances in Neural Information Processing Systems 17. Vancouver, British Columbia, Canada, 2004.
[13] J. Gan, J. Feng, Q. Fang and W.Ng, "Locality-sensitive hashing scheme based on dynamic collision counting”. In proceeding SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. New York, USA, 2012, pp. 541-552.
[14] Q. Lv, W. Josephson, Z. Wang, M. Charikar and K.Li. "Multi-probe LSH: efficient indexing for high-dimensional similarity search”. In VLDB, 2007, pp.950-96.
[15] P.Indyk, and R.Motwani, "Approximate nearest neighbor: Towards removing the curse of dimensionality”. In Proceeding STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing, 1998, pp.604-613.
[16] M.S. Charikar. "Similarity estimation techniques from rounding algorithms”. In Proceedings of the Symposium on Theory of Computing, 2002.
[17] M. Datar, N. Immorlica, P. Indyk and V. Mirrokni. "Locality sensitive hashing scheme based on p-stable distributions”. In Proceedings of the ACM Symposium on Computational Geometry, 2004.
[18] A. Andoni and P. Indyk. E2LSH: Exact Euclidean locality-sensitive hashing. Implementationavailableat:http://www.mit.edu/~andoni/LSH/.
[19] A. Z. Broder, M. Charikar, A. M. Frieze, andM. Mitzenmacher. "Minwise independent permutations”. In STOC, 1998, pp. 327–336.
[20] S. Chafik, I.Daoudi, M.A EL Yacoubi, H. El Ouardi and B. Dorizzi. "Locality sensitive hashing for content-based image retrieval: A comparative experimental study”. The Fifth International Conference on Next Generation Networks and services (NGNs), 2014.( unpublished)
[21] Haveliwala, T., Gionis, A. and Indyk. "Scalable Techniques for Clustering the Web” (Extended Abstract). In Third International Workshop on the Web and Databases (WebDB 2000), Dallas, Texas, 2000.
[22] D. Feng and J. Yang, C. Liu. "An efficient indexing method for contentbased image retrieval”.Neurocomputing, April 2013, pp.103-114.
[23] J. Buhler and M. Tompa. "Finding motifs using randomprojections”. In Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB1), 2002, pp. 225-242.
[24] http://corpus-texmex.irisa.fr