Cache Analysis and Software Optimizations for Faster on-Chip Network Simulations
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 87760
Cache Analysis and Software Optimizations for Faster on-Chip Network Simulations

Authors: Khyamling Parane, B. M. Prabhu Prasad, Basavaraj Talawar

Abstract:

Fast simulations are critical in reducing time to market in CMPs and SoCs. Several simulators have been used to evaluate the performance and power consumed by Network-on-Chips. Researchers and designers rely upon these simulators for design space exploration of NoC architectures. Our experiments show that simulating large NoC topologies take hours to several days for completion. To speed up the simulations, it is necessary to investigate and optimize the hotspots in simulator source code. Among several simulators available, we choose Booksim2.0, as it is being extensively used in the NoC community. In this paper, we analyze the cache and memory system behaviour of Booksim2.0 to accurately monitor input dependent performance bottlenecks. Our measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration having least misses has been identified. To further reduce the cache misses, we use software optimization techniques such as removal of unused functions, loop interchanging and replacing post-increment operator with pre-increment operator for non-primitive data types. The cache misses were reduced by 18.52%, 5.34% and 3.91% by employing above technology respectively. We also employ thread parallelization and vectorization to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93x and 3.97x were observed for the Mesh topology with 30 × 30 network size by employing thread parallelization and vectorization respectively.

Keywords: cache behaviour, network-on-chip, performance profiling, vectorization

Procedia PDF Downloads 200