Search results for: OpenMP
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9

Search results for: OpenMP

9 Programming Aid Tool for Detecting Common Mistakes of Novice Programmers in OpenMP Code

Authors: Jae Young Park, Seung Wook Lee, Jong Tae Kim

Abstract:

OpenMP is an API for parallel programming model of shared memory multiprocessors. Novice OpenMP programmers often produce the code that compiler cannot find human errors. It was investigated how compiler coped with the common mistakes that can occur in OpenMP code. The latest version(4.4.3) of GCC is used for this research. It was found that GCC compiled the codes without any errors or warnings. In this paper the programming aid tool is presented for OpenMP programs. It can check 12 common mistakes that novice programmer can commit during the programming of OpenMP. It was demonstrated that the programming aid tool can detect the various common mistakes that GCC failed to detect.

Keywords: Parallel programming, OpenMP, programming aid.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1506
8 A Survey on Performance Tools for OpenMP

Authors: Mubarak S. Mohsen, Rosni Abdullah, Yong M. Teo

Abstract:

Advances in processors architecture, such as multicore, increase the size of complexity of parallel computer systems. With multi-core architecture there are different parallel languages that can be used to run parallel programs. One of these languages is OpenMP which embedded in C/Cµ or FORTRAN. Because of this new architecture and the complexity, it is very important to evaluate the performance of OpenMP constructs, kernels, and application program on multi-core systems. Performance is the activity of collecting the information about the execution characteristics of a program. Performance tools consists of at least three interfacing software layers, including instrumentation, measurement, and analysis. The instrumentation layer defines the measured performance events. The measurement layer determines what performance event is actually captured and how it is measured by the tool. The analysis layer processes the performance data and summarizes it into a form that can be displayed in performance tools. In this paper, a number of OpenMP performance tools are surveyed, explaining how each is used to collect, analyse, and display data collection.

Keywords: Parallel performance tools, OpenMP, multi-core.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1851
7 Parallel Algorithm for Numerical Solution of Three-Dimensional Poisson Equation

Authors: Alibek Issakhov

Abstract:

In this paper developed and realized absolutely new algorithm for solving three-dimensional Poisson equation. This equation used in research of turbulent mixing, computational fluid dynamics, atmospheric front, and ocean flows and so on. Moreover in the view of rising productivity of difficult calculation there was applied the most up-to-date and the most effective parallel programming technology - MPI in combination with OpenMP direction, that allows to realize problems with very large data content. Resulted products can be used in solving of important applications and fundamental problems in mathematics and physics.

Keywords: MPI, OpenMP, three dimensional Poisson equation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1648
6 Parallel-Distributed Software Implementation of Buchberger Algorithm

Authors: Praloy Kumar Biswas, Prof. Dipanwita Roy Chowdhury

Abstract:

Grobner basis calculation forms a key part of computational commutative algebra and many other areas. One important ramification of the theory of Grobner basis provides a means to solve a system of non-linear equations. This is why it has become very important in the areas where the solution of non-linear equations is needed, for instance in algebraic cryptanalysis and coding theory. This paper explores on a parallel-distributed implementation for Grobner basis calculation over GF(2). For doing so Buchberger algorithm is used. OpenMP and MPI-C language constructs have been used to implement the scheme. Some relevant results have been furnished to compare the performances between the standalone and hybrid (parallel-distributed) implementation.

Keywords: Grobner basis, Buchberger Algorithm, Distributed- Parallel Computation, OpenMP, MPI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1783
5 Parallel Explicit Group Domain Decomposition Methods for the Telegraph Equation

Authors: Kew Lee Ming, Norhashidah Hj. Mohd. Ali

Abstract:

In a previous work, we presented the numerical solution of the two dimensional second order telegraph partial differential equation discretized by the centred and rotated five-point finite difference discretizations, namely the explicit group (EG) and explicit decoupled group (EDG) iterative methods, respectively. In this paper, we utilize a domain decomposition algorithm on these group schemes to divide the tasks involved in solving the same equation. The objective of this study is to describe the development of the parallel group iterative schemes under OpenMP programming environment as a way to reduce the computational costs of the solution processes using multicore technologies. A detailed performance analysis of the parallel implementations of points and group iterative schemes will be reported and discussed.

Keywords: Telegraph equation, explicit group iterative scheme, domain decomposition algorithm, parallelization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1472
4 Dynamic Data Partition Algorithm for a Parallel H.264 Encoder

Authors: Juntae Kim, Jaeyoung Park, Kyoungkun Lee, Jong Tae Kim

Abstract:

The H.264/AVC standard is a highly efficient video codec providing high-quality videos at low bit-rates. As employing advanced techniques, the computational complexity has been increased. The complexity brings about the major problem in the implementation of a real-time encoder and decoder. Parallelism is the one of approaches which can be implemented by multi-core system. We analyze macroblock-level parallelism which ensures the same bit rate with high concurrency of processors. In order to reduce the encoding time, dynamic data partition based on macroblock region is proposed. The data partition has the advantages in load balancing and data communication overhead. Using the data partition, the encoder obtains more than 3.59x speed-up on a four-processor system. This work can be applied to other multimedia processing applications.

Keywords: H.264/AVC, video coding, thread-level parallelism, OpenMP, multimedia

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1751
3 Analyzing the Factors that Cause Parallel Performance Degradation in Parallel Graph-Based Computations Using Graph500

Authors: Mustafa Elfituri, Jonathan Cook

Abstract:

Recently, graph-based computations have become more important in large-scale scientific computing as they can provide a methodology to model many types of relations between independent objects. They are being actively used in fields as varied as biology, social networks, cybersecurity, and computer networks. At the same time, graph problems have some properties such as irregularity and poor locality that make their performance different than regular applications performance. Therefore, parallelizing graph algorithms is a hard and challenging task. Initial evidence is that standard computer architectures do not perform very well on graph algorithms. Little is known exactly what causes this. The Graph500 benchmark is a representative application for parallel graph-based computations, which have highly irregular data access and are driven more by traversing connected data than by computation. In this paper, we present results from analyzing the performance of various example implementations of Graph500, including a shared memory (OpenMP) version, a distributed (MPI) version, and a hybrid version. We measured and analyzed all the factors that affect its performance in order to identify possible changes that would improve its performance. Results are discussed in relation to what factors contribute to performance degradation.

Keywords: Graph computation, Graph500 benchmark, parallel architectures, parallel programming, workload characterization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 472
2 Automatic Tuning for a Systemic Model of Banking Originated Losses (SYMBOL) Tool on Multicore

Authors: Ronal Muresano, Andrea Pagano

Abstract:

Nowadays, the mathematical/statistical applications are developed with more complexity and accuracy. However, these precisions and complexities have brought as result that applications need more computational power in order to be executed faster. In this sense, the multicore environments are playing an important role to improve and to optimize the execution time of these applications. These environments allow us the inclusion of more parallelism inside the node. However, to take advantage of this parallelism is not an easy task, because we have to deal with some problems such as: cores communications, data locality, memory sizes (cache and RAM), synchronizations, data dependencies on the model, etc. These issues are becoming more important when we wish to improve the application’s performance and scalability. Hence, this paper describes an optimization method developed for Systemic Model of Banking Originated Losses (SYMBOL) tool developed by the European Commission, which is based on analyzing the application's weakness in order to exploit the advantages of the multicore. All these improvements are done in an automatic and transparent manner with the aim of improving the performance metrics of our tool. Finally, experimental evaluations show the effectiveness of our new optimized version, in which we have achieved a considerable improvement on the execution time. The time has been reduced around 96% for the best case tested, between the original serial version and the automatic parallel version.

Keywords: Algorithm optimization, Bank Failures, OpenMP, Parallel Techniques, Statistical tool.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1842
1 Parallel Pipelined Conjugate Gradient Algorithm on Heterogeneous Platforms

Authors: Sergey Kopysov, Nikita Nedozhogin, Leonid Tonkov

Abstract:

The article presents a parallel iterative solver for large sparse linear systems which can be used on a heterogeneous platform. Traditionally, the problem of solving linear systems do not scale well on cluster containing multiple Central Processing Units (multi-CPUs cluster) or cluster containing multiple Graphics Processing Units (multi-GPUs cluster). For example, most of the attempts to implement the classical conjugate gradient method were at best counted in the same amount of time as the problem was enlarged. The paper proposes the pipelined variant of the conjugate gradient method (PCG), a formulation that is potentially better suited for hybrid CPU/GPU computing since it requires only one synchronization point per one iteration, instead of two for standard CG (Conjugate Gradient). The standard and pipelined CG methods need the vector entries generated by current GPU and other GPUs for matrix-vector product. So the communication between GPUs becomes a major performance bottleneck on miltiGPU cluster. The article presents an approach to minimize the communications between parallel parts of algorithms. Additionally, computation and communication can be overlapped to reduce the impact of data exchange. Using pipelined version of the CG method with one synchronization point, the possibility of asynchronous calculations and communications, load balancing between the CPU and GPU for solving the large linear systems allows for scalability. The algorithm is implemented with the combined use of technologies: MPI, OpenMP and CUDA. We show that almost optimum speed up on 8-CPU/2GPU may be reached (relatively to a one GPU execution). The parallelized solver achieves a speedup of up to 5.49 times on 16 NVIDIA Tesla GPUs, as compared to one GPU.

Keywords: Conjugate Gradient, GPU, parallel programming, pipelined algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 316