Search results for: cuda
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5

Search results for: cuda

5 Efficient Heuristic Algorithm to Speed Up Graphcut in Gpu for Image Stitching

Authors: Tai Nguyen, Minh Bui, Huong Ninh, Tu Nguyen, Hai Tran

Abstract:

GraphCut algorithm has been widely utilized to solve various types of computer vision problems. Its expensive computational cost encouraged many researchers to improve the speed of the algorithm. Recent works proposed schemes that work on parallel computing platforms such as CUDA. However, the problem of low convergence speed prevents the usage of GraphCut for real time applications. In this paper, we propose global suppression heuristic to boost the conver-gence process of the algorithm. A parallel implementation of GraphCut algorithm on CUDA designed for the image stitching problem is introduced. Our method achieves up to 3× time boost on the graph of size 80 × 480 compared to the best sequential GraphCut algorithm while achieving satisfactory stitched images, suitable for panorama applications. Our source code will be soon available for further research.

Keywords: CUDA, graph cut, image stitching, texture synthesis, maxflow/mincut algorithm

Procedia PDF Downloads 110
4 GPU Accelerated Fractal Image Compression for Medical Imaging in Parallel Computing Platform

Authors: Md. Enamul Haque, Abdullah Al Kaisan, Mahmudur R. Saniat, Aminur Rahman

Abstract:

In this paper, we have implemented both sequential and parallel version of fractal image compression algorithms using CUDA (Compute Unified Device Architecture) programming model for parallelizing the program in Graphics Processing Unit for medical images, as they are highly similar within the image itself. There is several improvements in the implementation of the algorithm as well. Fractal image compression is based on the self similarity of an image, meaning an image having similarity in majority of the regions. We take this opportunity to implement the compression algorithm and monitor the effect of it using both parallel and sequential implementation. Fractal compression has the property of high compression rate and the dimensionless scheme. Compression scheme for fractal image is of two kinds, one is encoding and another is decoding. Encoding is very much computational expensive. On the other hand decoding is less computational. The application of fractal compression to medical images would allow obtaining much higher compression ratios. While the fractal magnification an inseparable feature of the fractal compression would be very useful in presenting the reconstructed image in a highly readable form. However, like all irreversible methods, the fractal compression is connected with the problem of information loss, which is especially troublesome in the medical imaging. A very time consuming encoding process, which can last even several hours, is another bothersome drawback of the fractal compression.

Keywords: accelerated GPU, CUDA, parallel computing, fractal image compression

Procedia PDF Downloads 315
3 Acceleration of Lagrangian and Eulerian Flow Solvers via Graphics Processing Units

Authors: Pooya Niksiar, Ali Ashrafizadeh, Mehrzad Shams, Amir Hossein Madani

Abstract:

There are many computationally demanding applications in science and engineering which need efficient algorithms implemented on high performance computers. Recently, Graphics Processing Units (GPUs) have drawn much attention as compared to the traditional CPU-based hardware and have opened up new improvement venues in scientific computing. One particular application area is Computational Fluid Dynamics (CFD), in which mature CPU-based codes need to be converted to GPU-based algorithms to take advantage of this new technology. In this paper, numerical solutions of two classes of discrete fluid flow models via both CPU and GPU are discussed and compared. Test problems include an Eulerian model of a two-dimensional incompressible laminar flow case and a Lagrangian model of a two phase flow field. The CUDA programming standard is used to employ an NVIDIA GPU with 480 cores and a C++ serial code is run on a single core Intel quad-core CPU. Up to two orders of magnitude speed up is observed on GPU for a certain range of grid resolution or particle numbers. As expected, Lagrangian formulation is better suited for parallel computations on GPU although Eulerian formulation represents significant speed up too.

Keywords: CFD, Eulerian formulation, graphics processing units, Lagrangian formulation

Procedia PDF Downloads 389
2 Parallel Pipelined Conjugate Gradient Algorithm on Heterogeneous Platforms

Authors: Sergey Kopysov, Nikita Nedozhogin, Leonid Tonkov

Abstract:

The article presents a parallel iterative solver for large sparse linear systems which can be used on a heterogeneous platform. Traditionally, the problem of solving linear systems does not scale well on multi-CPU/multi-GPUs clusters. For example, most of the attempts to implement the classical conjugate gradient method were at best counted in the same amount of time as the problem was enlarged. The paper proposes the pipelined variant of the conjugate gradient method (PCG), a formulation that is potentially better suited for hybrid CPU/GPU computing since it requires only one synchronization point per one iteration instead of two for standard CG. The standard and pipelined CG methods need the vector entries generated by the current GPU and other GPUs for matrix-vector products. So the communication between GPUs becomes a major performance bottleneck on multi GPU cluster. The article presents an approach to minimize the communications between parallel parts of algorithms. Additionally, computation and communication can be overlapped to reduce the impact of data exchange. Using the pipelined version of the CG method with one synchronization point, the possibility of asynchronous calculations and communications, load balancing between the CPU and GPU for solving the large linear systems allows for scalability. The algorithm is implemented with the combined use of technologies: MPI, OpenMP, and CUDA. We show that almost optimum speed up on 8-CPU/2GPU may be reached (relatively to a one GPU execution). The parallelized solver achieves a speedup of up to 5.49 times on 16 NVIDIA Tesla GPUs, as compared to one GPU.

Keywords: conjugate gradient, GPU, parallel programming, pipelined algorithm

Procedia PDF Downloads 140
1 The Effect of a Saturated Kink on the Dynamics of Tungsten Impurities in the Plasma Core

Authors: H. E. Ferrari, R. Farengo, C. F. Clauser

Abstract:

Tungsten (W) will be used in ITER as one of the plasma facing components (PFCs). The W could migrate to the plasma center. This could have a potentially deleterious effect on plasma confinement. Electron cyclotron resonance heating (ECRH) can be used to prevent W accumulation. We simulated a series of H mode discharges in ASDEX U with PFC containing W, where central ECRH was used to prevent W accumulation in the plasma center. The experiments showed that the W density profiles were flat after a sawtooth crash, and become hollow in between sawtooth crashes when ECRH has been applied. It was also observed that a saturated kink mode was active in these conditions. We studied the effect of saturated kink like instabilities on the redistribution of W impurities. The kink was modeled as the sum of a simple analytical equilibrium (large aspect ratio, circular cross section) plus the perturbation produced by the kink. A numerical code that follows the exact trajectories of the impurity ions in the total fields and includes collisions was employed. The code is written in Cuda C and runs in Graphical Processing Units (GPUs), allowing simulations with a large number of particles with modest resources. Our simulations show that when the W ions have a thermal velocity distribution, the kink has no effect on the W density. When we consider the plasma rotation, the kink can affect the W density. When the average passing frequency of the W particles is similar to the frequency of the kink mode, the expulsion of W ions from the plasma core is maximum, and the W density shows a hollow structure. This could have implications for the mitigation of W accumulation.

Keywords: impurity transport, kink instability, tungsten accumulation, tungsten dynamics

Procedia PDF Downloads 159