An Efficient Multi Join Algorithm Utilizing a Lattice of Double Indices

Hanan A. M. Abd Alla; Lilac A. E. Al-Safadi

Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 32797

An Efficient Multi Join Algorithm Utilizing a Lattice of Double Indices

Authors: Hanan A. M. Abd Alla, Lilac A. E. Al-Safadi

Abstract:

In this paper, a novel multi join algorithm to join multiple relations will be introduced. The novel algorithm is based on a hashed-based join algorithm of two relations to produce a double index. This is done by scanning the two relations once. But instead of moving the records into buckets, a double index will be built. This will eliminate the collision that can happen from a complete hash algorithm. The double index will be divided into join buckets of similar categories from the two relations. The algorithm then joins buckets with similar keys to produce joined buckets. This will lead at the end to a complete join index of the two relations. without actually joining the actual relations. The time complexity required to build the join index of two categories is Om log m where m is the size of each category. Totaling time complexity to O n log m for all buckets. The join index will be used to materialize the joined relation if required. Otherwise, it will be used along with other join indices of other relations to build a lattice to be used in multi-join operations with minimal I/O requirements. The lattice of the join indices can be fitted into the main memory to reduce time complexity of the multi join algorithm.

Keywords: Multi join, Relation, Lattice, Join indices.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1332820

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1246

References:

[1] D. Knuth, The Art of Computer Programming: Volume 3 / Sorting and Searching, Addison-Wesley Publishing Company, 1973.
[2] Y. Azar, A. Broder, A. Karlin, and E. Upfal, "Balanced allocations," in Proceedings of 26th ACM Symposium on the Theory of Computing, 1994, pp.593-602.
[3] Jan Van Lunteren, "Searching very large routing tables in wide embedded memory," in Proceedings of IEEE Globecom, November 2001.
[4] R. Anantha Krishna, A. Das, J. Gehrke, F. Korn, S. Muthukrishnan, and D. Shrivastava, "Efficient approximation of correlated sums on data streams," TKDE, 2003.
[5] A. Arasu and G. S. Manku, "Approximate counts and quantiles over sliding windows," PODS, 2004.
[6] N. Bandi, C. Sun, D. Agrawal, and A. El Abbadi, "Hardware acceleration in commercial databases: A case study of spatial operations," VLDB, 2004.
[7] Peter A. Boncz, Stefan Manegold, and Martin L. Kersten, "Database architecture optimized for the new bottleneck: Memory access," in Proceedings of the Twenty-fifth International Conference on Very Large Databases, 1999, pp. 54-65.
[8] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, Cambridge, MA, 2nd edition, 2001.
[9] Abhinandan Das, Johannes Gehrke, and Mirek Riedewald, "Approximate join processing over data streams," in Proceedings of the 2003 ACM SIGMOD international conference on Management of data, ACM Press, 2003, pp.40-51.
[10] Stefan Manegold, Peter A. Boncz, and Martin L. Kersten, "What happens during a join? Dissecting CPU and memory optimization effects," in Proceedings of 26th International Conference on Very Large Data Bases, 2000, pp. 339-350.
[11] A. Andersson, T. Hagerup, J. H┬░astad, and O. Petersson, "Tight bounds for searching a sorted array of strings," SIAM Journal on Computing, 30(5):1552-1578, 2001.
[12] L. Arge, P. Ferragina, R. Grossi, and J.S. Vitter, "On sorting strings in external memory," ACM STOC -97, 1997, pp.540-548.
[13] M.A. Bender, E.D. Demaine, andM. Farach-Colton, "Cache-oblivious Btrees," IEEE FOCS -00, 2000, pp.399-409.
[14] J.L. Bentley and R. Sedgewick, "Fast algorithms for sorting and searching strings," ACM-SIAM SODA -97, 1997, pp.360-369.