A PIM (Processor-In-Memory) for Computer Graphics: Data Partitioning and Placement Schemes

Authors: Jae Chul Cha, Sandeep K. Gupta

Abstract:

The demand for higher-performance graphics continues to grow, driven by the pursuit of ever greater realism. At the same time, rapid advances in fabrication technology have made it possible to build several processor cores on a single die. It is therefore important to develop single-chip parallel architectures for such data-intensive applications. In this paper, we propose an efficient PIM architecture tailored for computer graphics, which requires a large number of memory accesses. We then address two tasks that are necessary to fully exploit the parallelism provided by the architecture: partitioning and placement of graphics data, which affect load balance and communication cost, respectively. Under the constraint of uniform partitioning, we develop approaches to optimal partitioning and placement that significantly reduce the search space. Because the search space for placement remains impractically large even after this reduction, we also present heuristics for identifying near-optimal placements. We demonstrate the effectiveness of our partitioning and placement approaches by analyzing example scenes. Simulation results show considerable search space reductions, and our placement heuristics perform close to optimal: the average ratio of communication overhead between our heuristics and the optimal placement was 1.05. Our uniform partitioning achieved an average load-balance ratio of 1.47 for geometry processing and 1.44 for rasterization, which is reasonable.
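To make the load-balance figures above more concrete, the following is a minimal sketch of how such a ratio could be computed for a uniform screen-space partition. The abstract does not define the ratio, so this sketch assumes it is the maximum per-partition load divided by the mean load (1.0 would be perfectly balanced). The function names, the grid layout, and the use of 2D points as stand-ins for graphics primitives are all hypothetical and are not taken from the paper.

```python
def uniform_partition_loads(points, width, height, cols, rows):
    """Count items per tile of a cols x rows uniform grid over the screen.

    `points` are (x, y) positions standing in for graphics primitives
    (an illustrative assumption, not the paper's workload model).
    """
    loads = [0] * (cols * rows)
    for x, y in points:
        # Map the position to a tile; clamp so points on the far edge
        # fall into the last column or row.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        loads[row * cols + col] += 1
    return loads


def load_balance_ratio(loads):
    """Assumed definition: maximum load divided by mean load."""
    total = sum(loads)
    if total == 0:
        raise ValueError("no work to balance")
    mean = total / len(loads)
    return max(loads) / mean


if __name__ == "__main__":
    scene = [(10, 10), (20, 20), (30, 30), (90, 10), (10, 90), (90, 90)]
    loads = uniform_partition_loads(scene, 100, 100, cols=2, rows=2)
    print(loads, load_balance_ratio(loads))
```

Under this assumed definition, a ratio of 1.47 would mean that the busiest partition carries 47% more work than the average partition.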

Keywords: Data Partitioning and Placement, Graphics, PIM, Search Space Reduction.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1062874

