Portable and Parallel Accelerated Development Method for Field-Programmable Gate Array (FPGA)-Central Processing Unit (CPU)- Graphics Processing Unit (GPU) Heterogeneous Computing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 87758
Portable and Parallel Accelerated Development Method for Field-Programmable Gate Array (FPGA)-Central Processing Unit (CPU)- Graphics Processing Unit (GPU) Heterogeneous Computing

Authors: Nan Hu, Chao Wang, Xi Li, Xuehai Zhou

Abstract:

The field-programmable gate array (FPGA) has been widely adopted in the high-performance computing domain. In recent years, the embedded system-on-a-chip (SoC) contains coarse granularity multi-core CPU (central processing unit) and mobile GPU (graphics processing unit) that can be used as general-purpose accelerators. The motivation is that algorithms of various parallel characteristics can be efficiently mapped to the heterogeneous architecture coupled with these three processors. The CPU and GPU offload partial computationally intensive tasks from the FPGA to reduce the resource consumption and lower the overall cost of the system. However, in present common scenarios, the applications always utilize only one type of accelerator because the development approach supporting the collaboration of the heterogeneous processors faces challenges. Therefore, a systematic approach takes advantage of write-once-run-anywhere portability, high execution performance of the modules mapped to various architectures and facilitates the exploration of design space. In this paper, A servant-execution-flow model is proposed for the abstraction of the cooperation of the heterogeneous processors, which supports task partition, communication and synchronization. At its first run, the intermediate language represented by the data flow diagram can generate the executable code of the target processor or can be converted into high-level programming languages. The instantiation parameters efficiently control the relationship between the modules and computational units, including two hierarchical processing units mapping and adjustment of data-level parallelism. An embedded system of a three-dimensional waveform oscilloscope is selected as a case study. The performance of algorithms such as contrast stretching, etc., are analyzed with implementations on various combinations of these processors. The experimental results show that the heterogeneous computing system with less than 35% resources achieves similar performance to the pure FPGA and approximate energy efficiency.

Keywords: FPGA-CPU-GPU collaboration, design space exploration, heterogeneous computing, intermediate language, parameterized instantiation

Procedia PDF Downloads 119