3D Network-on-Chip with on-Chip DRAM: An Empirical Analysis for Future Chip Multiprocessor
Authors: Thomas Canhao Xu, Bo Yang, Alexander Wei Yin, Pasi Liljeberg, Hannu Tenhunen
Abstract:
With the increasing number of on-chip components and the critical requirement for processing power, Chip Multiprocessors (CMPs) have gained wide acceptance in both academia and industry during the last decade. However, conventional bus-based on-chip communication schemes suffer from very high communication delay and low scalability in large-scale systems. Network-on-Chip (NoC) has been proposed to solve the bottleneck of parallel on-chip communication by applying different network topologies that separate the communication phase from the computation phase. Observing that the memory bandwidth between on-chip components and off-chip memory has become a critical problem even in NoC-based systems, in this paper we propose a novel 3D NoC with on-chip Dynamic Random Access Memory (DRAM) in which different layers are dedicated to different functionalities such as processors, cache or memory. Results show that, with the proposed architecture, average link utilization is reduced by 10.25% for SPLASH-2 workloads, and the design requires 1.12% fewer execution cycles than the traditional design on average.
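To make the layered organization more concrete, here is a minimal Python sketch of a 3D mesh in which each stacked layer has a single role. It is not the authors' simulator or configuration. The layer order (processors on layer 0, cache on layer 1, DRAM on layer 2), the mesh size, and dimension-ordered X-Y-Z routing are all assumptions made for illustration, and the names `LAYER_ROLES`, `Node`, `role`, `xyz_route` and `average_hops` are hypothetical. The sketch counts the hops a request takes from a processor-layer node to a DRAM-layer node.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical layer order: the abstract only says layers are dedicated to
# functions "such as processors, cache or memory"; this ordering is assumed.
LAYER_ROLES = ("processor", "cache", "dram")


@dataclass(frozen=True)
class Node:
    x: int
    y: int
    z: int  # index of the stacked die (layer) the router sits on


def role(node: Node) -> str:
    """Return the function of the layer that hosts this node."""
    return LAYER_ROLES[node.z]


def xyz_route(src: Node, dst: Node) -> list[Node]:
    """Dimension-ordered route in a 3D mesh: resolve X, then Y, then Z.

    Returns every node visited after src, so len(result) is the hop count.
    Z hops stand in for vertical links (e.g. TSVs) between layers.
    """
    path = []
    cur = [src.x, src.y, src.z]
    goal = [dst.x, dst.y, dst.z]
    for axis in range(3):
        step = 1 if goal[axis] > cur[axis] else -1
        while cur[axis] != goal[axis]:
            cur[axis] += step
            path.append(Node(*cur))
    return path


def average_hops(width: int, height: int, src_layer: int, dst_layer: int) -> float:
    """Mean hop count over all (src, dst) node pairs between two layers."""
    def layer(z: int) -> list[Node]:
        return [Node(x, y, z) for x, y in product(range(width), range(height))]

    pairs = [(s, d) for s in layer(src_layer) for d in layer(dst_layer)]
    return sum(len(xyz_route(s, d)) for s, d in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical 4x4 mesh per layer: processor layer (0) to DRAM layer (2).
    print(average_hops(4, 4, 0, 2))  # → 4.5
```

For this hypothetical 4×4×3 stack, `average_hops(4, 4, 0, 2)` returns 4.5: an average of 1.25 hops in each of X and Y plus 2 vertical hops. Resolving Z last is an arbitrary choice here; in a mesh, dimension-ordered routing in any fixed order yields the same minimal hop count.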
Keywords: 3D integration, network-on-chip, memory-on-chip, DRAM, chip multiprocessor.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1331743