Impact of Stack Caches: Locality Awareness and Cost Effectiveness
Authors: Abdulrahman K. Alshegaifi, Chun-Hsi Huang
Abstract:
Treating data based on its location in memory has received much attention in recent years due to its different properties, which offer important aspects for cache utilization. Stack data and non-stack data may interfere with each other’s locality in the data cache. One of the important aspects of stack data is that it has high spatial and temporal locality. In this work, we simulate non-unified cache design that split data cache into stack and non-stack caches in order to maintain stack data and non-stack data separate in different caches. We observe that the overall hit rate of non-unified cache design is sensitive to the size of non-stack cache. Then, we investigate the appropriate size and associativity for stack cache to achieve high hit ratio especially when over 99% of accesses are directed to stack cache. The result shows that on average more than 99% of stack cache accuracy is achieved by using 2KB of capacity and 1-way associativity. Further, we analyze the improvement in hit rate when adding small, fixed, size of stack cache at level1 to unified cache architecture. The result shows that the overall hit rate of unified cache design with adding 1KB of stack cache is improved by approximately, on average, 3.9% for Rijndael benchmark. The stack cache is simulated by using SimpleScalar toolset.
Keywords: Hit rate, Locality of program, Stack cache, and Stack data.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1124101
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1516References:
[1] G. E. Blelloch and B. M. Maggs, “Parallel algorithms”, Algorithms and theory of computation handbook, Chapman & Hall/CRC, pp. 25–25, 2010.
[2] E. Horowitz and A. Zorat, “Divide-and-conquer for parallel processing”, IEEE Transactions on Computer, vol. C-32, no. 6, pp. 582–585, 1983.
[3] L. K. Melhus, “Analyzing contextual bias of program execution on modern CPUs”, M.S. Thesis, Norwegian University of Science and Technology, 2013
[4] A. Hemsath, R. Morton and J. Sjodin, “Implementing a stack cache”, Rice University, Advanced Microprocessor Architecture, http://www.owlnet.rice.edu/~elec525/projects/SCreport.pdf (2002).
[5] L. E. Olson, Y. Eckert, S. Manne and M. D. Hill, “Revisiting stack caches for energy efficiency”, University of Wisconsin, Technical Report. TR1813, 2014
[6] D. Burger and T. M. Austin, “The SimpleScalar tool set, version 2.0”, University of Wisconsin-Madison Computer Sciences Department, Technical Report. TR1342, 1997
[7] P. Kataria, "Parallel quicksort implementation using MPI and pthreads”, http://www.winlab.rutgers.edu/~pkataria/pdc.pdf (2008).
[8] M. A. Ertl, “Implementation of stack-based languages on register machines”, Ph.D. Thesis, Technische Universit\"at Wien, Austria, 1996
[9] M. F. Ionescu and K. E. Schauser. "Optimizing parallel bitonic sort", Parallel Processing Symposium, 1997. Proceedings., 11th International. IEEE, pp. 303-309, 1997.
[10] R. Romansky and Y. Lazarov, “Stack cache memory”, Journal of Information, Control and Management Systems, vol. 1, no. 2, pp. 29–37, 2003.