Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3

CUDA Related Abstracts

3 GPU Accelerated Fractal Image Compression for Medical Imaging in Parallel Computing Platform

Authors: Md. Enamul Haque, Abdullah Al Kaisan, Mahmudur R. Saniat, Aminur Rahman

Abstract:

In this paper, we have implemented both sequential and parallel version of fractal image compression algorithms using CUDA (Compute Unified Device Architecture) programming model for parallelizing the program in Graphics Processing Unit for medical images, as they are highly similar within the image itself. There is several improvements in the implementation of the algorithm as well. Fractal image compression is based on the self similarity of an image, meaning an image having similarity in majority of the regions. We take this opportunity to implement the compression algorithm and monitor the effect of it using both parallel and sequential implementation. Fractal compression has the property of high compression rate and the dimensionless scheme. Compression scheme for fractal image is of two kinds, one is encoding and another is decoding. Encoding is very much computational expensive. On the other hand decoding is less computational. The application of fractal compression to medical images would allow obtaining much higher compression ratios. While the fractal magnification an inseparable feature of the fractal compression would be very useful in presenting the reconstructed image in a highly readable form. However, like all irreversible methods, the fractal compression is connected with the problem of information loss, which is especially troublesome in the medical imaging. A very time consuming encoding process, which can last even several hours, is another bothersome drawback of the fractal compression.

Keywords: Parallel Computing, accelerated GPU, CUDA, fractal image compression

Procedia PDF Downloads 186
2 Fast Prediction Unit Partition Decision and Accelerating the Algorithm Using Cudafor Intra and Inter Prediction of HEVC

Authors: Qiang Zhang, Chun Yuan

Abstract:

Since the PU (Prediction Unit) decision process is the most time consuming part of the emerging HEVC (High Efficient Video Coding) standardin intra and inter frame coding, this paper proposes the fast PU decision algorithm and speed up the algorithm using CUDA (Compute Unified Device Architecture). In intra frame coding, the fast PU decision algorithm uses the texture features to skip intra-frame prediction or terminal the intra-frame prediction for smaller PU size. In inter frame coding of HEVC, the fast PU decision algorithm takes use of the similarity of its own two Nx2N size PU's motion vectors and the hierarchical structure of CU (Coding Unit) partition to skip some modes of PU partition, so as to reduce the motion estimation times. The accelerate algorithm using CUDA is based on the fast PU decision algorithm which uses the GPU to make the motion search and the gradient computation could be parallel computed. The proposed algorithm achieves up to 57% time saving compared to the HM 10.0 with little rate-distortion losses (0.043dB drop and 1.82% bitrate increase on average).

Keywords: Parallel, CUDA, HEVC, PU decision, inter prediction, intra prediction

Procedia PDF Downloads 237
1 Solid Particles Transport and Deposition Prediction in a Turbulent Impinging Jet Using the Lattice Boltzmann Method and a Probabilistic Model on GPU

Authors: Ali Abdul Kadhim, Fue Lien

Abstract:

Solid particle distribution on an impingement surface has been simulated utilizing a graphical processing unit (GPU). In-house computational fluid dynamics (CFD) code has been developed to investigate a 3D turbulent impinging jet using the lattice Boltzmann method (LBM) in conjunction with large eddy simulation (LES) and the multiple relaxation time (MRT) models. This paper proposed an improvement in the LBM-cellular automata (LBM-CA) probabilistic method. In the current model, the fluid flow utilizes the D3Q19 lattice, while the particle model employs the D3Q27 lattice. The particle numbers are defined at the same regular LBM nodes, and transport of particles from one node to its neighboring nodes are determined in accordance with the particle bulk density and velocity by considering all the external forces. The previous models distribute particles at each time step without considering the local velocity and the number of particles at each node. The present model overcomes the deficiencies of the previous LBM-CA models and, therefore, can better capture the dynamic interaction between particles and the surrounding turbulent flow field. Despite the increasing popularity of LBM-MRT-CA model in simulating complex multiphase fluid flows, this approach is still expensive in term of memory size and computational time required to perform 3D simulations. To improve the throughput of each simulation, a single GeForce GTX TITAN X GPU is used in the present work. The CUDA parallel programming platform and the CuRAND library are utilized to form an efficient LBM-CA algorithm. The methodology was first validated against a benchmark test case involving particle deposition on a square cylinder confined in a duct. The flow was unsteady and laminar at Re=200 (Re is the Reynolds number), and simulations were conducted for different Stokes numbers. The present LBM solutions agree well with other results available in the open literature. The GPU code was then used to simulate the particle transport and deposition in a turbulent impinging jet at Re=10,000. The simulations were conducted for L/D=2,4 and 6, where L is the nozzle-to-surface distance and D is the jet diameter. The effect of changing the Stokes number on the particle deposition profile was studied at different L/D ratios. For comparative studies, another in-house serial CPU code was also developed, coupling LBM with the classical Lagrangian particle dispersion model. Agreement between results obtained with LBM-CA and LBM-Lagrangian models and the experimental data is generally good. The present GPU approach achieves a speedup ratio of about 350 against the serial code running on a single CPU.

Keywords: Multi-Phase Flow, CUDA, LES, lattice Boltzmann method, probabilistic model, GPU parallel programming, MRT

Procedia PDF Downloads 81