Performance Improvements of DSP Applications on a Generic Reconfigurable Platform

Authors: Michalis D. Galanis, Gregory Dimitroulakos, Costas E. Goutis

Abstract:

Speedups obtained by mapping four real-life DSP applications onto an embedded system-on-chip that couples coarse-grained reconfigurable logic with an instruction-set processor are presented. The reconfigurable logic is realized as a two-dimensional array of Processing Elements. A design flow for improving application performance is proposed: critical software parts, called kernels, are detected by profiling the source code and are then accelerated on the Coarse-Grained Reconfigurable Array. For mapping the detected kernels onto the reconfigurable logic, a priority-based mapping algorithm has been developed. Two 4x4 array architectures, which differ in the interconnection structure among their Processing Elements, are considered. Experiments on eight different instances of a generic system show significant overall speedups for all four applications: the performance improvements range from 1.86 to 3.67, with an average of 2.53, relative to an all-software execution. These speedups are quite close to the maximum theoretical speedups imposed by Amdahl's law.
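The abstract states only that kernels are detected by profiling the source code. As a minimal sketch, assuming a profile that reports a cycle count per code region (for example per loop) and a hypothetical coverage threshold (the 0.7 default is illustrative, not the paper's criterion), kernel selection can amount to taking the hottest regions until they cover the requested share of total execution time:

```python
from typing import Dict, List, Tuple


def select_kernels(profile: Dict[str, int], coverage: float = 0.7) -> Tuple[List[str], float]:
    """Pick the hottest code regions until they cover `coverage` of all cycles.

    `profile` maps a region name (hypothetical, e.g. a loop label) to its
    measured cycle count. Returns the selected names, hottest first, and the
    fraction of total execution time they account for.
    """
    total = sum(profile.values())
    if total == 0:
        return [], 0.0
    selected: List[str] = []
    covered = 0
    # Visit regions from most to least expensive; stop once coverage is met.
    for name, cycles in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if covered / total >= coverage:
            break
        selected.append(name)
        covered += cycles
    return selected, covered / total
```

For a made-up profile {"fir_loop": 600, "fft_loop": 250, "init": 100, "io": 50}, select_kernels returns (["fir_loop", "fft_loop"], 0.85): the two loops account for 85% of the cycles, which is the share of execution time that could be moved to the array.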
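The mapping algorithm is described only as priority-based. The sketch below is a generic list scheduler, not the authors' algorithm: it assumes an acyclic data-flow graph of the kernel body with unit-latency operations, uses the longest path to a sink as a hypothetical priority, and ignores the interconnection constraints that distinguish the two 4x4 architectures. In each cycle, the ready operations with the highest priority are placed on the 16 Processing Elements:

```python
from typing import Dict, List, Tuple

ROWS, COLS = 4, 4  # 4x4 array of Processing Elements


def longest_path_to_sink(succs: Dict[str, List[str]]) -> Dict[str, int]:
    """Hypothetical priority: number of ops on the longest path from an op to a sink."""
    memo: Dict[str, int] = {}

    def visit(op: str) -> int:
        if op not in memo:
            memo[op] = 1 + max((visit(s) for s in succs.get(op, [])), default=0)
        return memo[op]

    for op in succs:
        visit(op)
    return memo


def map_dfg(succs: Dict[str, List[str]]) -> Dict[str, Tuple[int, int, int]]:
    """Assign each op a (cycle, row, col); at most ROWS*COLS ops issue per cycle.

    Assumes an acyclic graph and unit latency; routing between PEs is ignored.
    """
    prio = longest_path_to_sink(succs)
    # Count unscheduled predecessors of every op.
    preds: Dict[str, int] = {op: 0 for op in prio}
    for ss in succs.values():
        for s in ss:
            preds[s] += 1
    ready = [op for op, n in preds.items() if n == 0]
    placement: Dict[str, Tuple[int, int, int]] = {}
    cycle = 0
    while ready:
        # Highest priority first; ties broken by name for determinism.
        ready.sort(key=lambda op: (-prio[op], op))
        issued, ready = ready[: ROWS * COLS], ready[ROWS * COLS :]
        for i, op in enumerate(issued):
            placement[op] = (cycle, i // COLS, i % COLS)
        # Successors whose inputs are now all computed become ready next cycle.
        for op in issued:
            for s in succs.get(op, []):
                preds[s] -= 1
                if preds[s] == 0:
                    ready.append(s)
        cycle += 1
    return placement
```

For a graph in which a and b feed c, and c feeds d, a and b share cycle 0 on PEs (0, 0) and (0, 1), c runs in cycle 1 and d in cycle 2. A real mapper would also have to route operands through the array's interconnect, which is where the two architectures differ.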
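Amdahl's law gives the ceiling these results are compared against. If a fraction f of the all-software execution time is spent in the kernels and the array runs them s times faster, the overall speedup is S = 1 / ((1 - f) + f / s), which can never exceed 1 / (1 - f) however fast the array is. A small sketch of both quantities follows; it ignores data-transfer and reconfiguration overheads, which a real system would add to the denominator:

```python
def amdahl_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `kernel_fraction` of the software execution time
    is accelerated by `kernel_speedup`; the rest stays on the processor.
    Communication and reconfiguration overheads are ignored."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_limit(kernel_fraction: float) -> float:
    """Upper bound on the overall speedup as kernel_speedup grows without limit."""
    if kernel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)
```

With illustrative values (not taken from the paper) of f = 0.75 and s = 10, the overall speedup is about 3.08 against a ceiling of 4.0.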

Keywords: Reconfigurable computing, Coarse-grained reconfigurable array, Embedded systems, DSP, Performance

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1330547

