Search results for: parallel processing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4630

Search results for: parallel processing

4600 Pushing the Boundary of Parallel Tractability for Ontology Materialization via Boolean Circuits

Authors: Zhangquan Zhou, Guilin Qi

Abstract:

Materialization is an important reasoning service for applications built on the Web Ontology Language (OWL). To make materialization efficient in practice, current research focuses on deciding tractability of an ontology language and designing parallel reasoning algorithms. However, some well-known large-scale ontologies, such as YAGO, have been shown to have good performance for parallel reasoning, but they are expressed in ontology languages that are not parallelly tractable, i.e., the reasoning is inherently sequential in the worst case. This motivates us to study the problem of parallel tractability of ontology materialization from a theoretical perspective. That is we aim to identify the ontologies for which materialization is parallelly tractable, i.e., in the NC complexity. Since the NC complexity is defined based on Boolean circuit that is widely used to investigate parallel computing problems, we first transform the problem of materialization to evaluation of Boolean circuits, and then study the problem of parallel tractability based on circuits. In this work, we focus on datalog rewritable ontology languages. We use Boolean circuits to identify two classes of datalog rewritable ontologies (called parallelly tractable classes) such that materialization over them is parallelly tractable. We further investigate the parallel tractability of materialization of a datalog rewritable OWL fragment DHL (Description Horn Logic). Based on the above results, we analyze real-world datasets and show that many ontologies expressed in DHL belong to the parallelly tractable classes.

Keywords: ontology materialization, parallel reasoning, datalog, Boolean circuit

Procedia PDF Downloads 255
4599 Comparative Analysis of Classical and Parallel Inpainting Algorithms Based on Affine Combinations of Projections on Convex Sets

Authors: Irina Maria Artinescu, Costin Radu Boldea, Eduard-Ionut Matei

Abstract:

The paper is a comparative study of two classical variants of parallel projection methods for solving the convex feasibility problem with their equivalents that involve variable weights in the construction of the solutions. We used a graphical representation of these methods for inpainting a convex area of an image in order to investigate their effectiveness in image reconstruction applications. We also presented a numerical analysis of the convergence of these four algorithms in terms of the average number of steps and execution time in classical CPU and, alternatively, in parallel GPU implementation.

Keywords: convex feasibility problem, convergence analysis, inpainting, parallel projection methods

Procedia PDF Downloads 153
4598 Achievable Average Secrecy Rates over Bank of Parallel Independent Fading Channels with Friendly Jamming

Authors: Munnujahan Ara

Abstract:

In this paper, we investigate the effect of friendly jamming power allocation strategies on the achievable average secrecy rate over a bank of parallel fading wiretap channels. We investigate the achievable average secrecy rate in parallel fading wiretap channels subject to Rayleigh and Rician fading. The achievable average secrecy rate, due to the presence of a line-of-sight component in the jammer channel is also evaluated. Moreover, we study the detrimental effect of correlation across the parallel sub-channels, and evaluate the corresponding decrease in the achievable average secrecy rate for the various fading configurations. We also investigate the tradeoff between the transmission power and the jamming power for a fixed total power budget. Our results, which are applicable to current orthogonal frequency division multiplexing (OFDM) communications systems, shed further light on the achievable average secrecy rates over a bank of parallel fading channels in the presence of friendly jammers.

Keywords: fading parallel channels, wire-tap channel, OFDM, secrecy capacity, power allocation

Procedia PDF Downloads 489
4597 Researching Apache Hama: A Pure BSP Computing Framework

Authors: Kamran Siddique, Yangwoo Kim, Zahid Akhtar

Abstract:

In recent years, the technological advancements have led to a deluge of data from distinctive domains and the need for development of solutions based on parallel and distributed computing has still long way to go. That is why, the research and development of massive computing frameworks is continuously growing. At this particular stage, highlighting a potential research area along with key insights could be an asset for researchers in the field. Therefore, this paper explores one of the emerging distributed computing frameworks, Apache Hama. It is a Top Level Project under the Apache Software Foundation, based on Bulk Synchronous Processing (BSP). We present an unbiased and critical interrogation session about Apache Hama and conclude research directions in order to assist interested researchers.

Keywords: apache hama, bulk synchronous parallel, BSP, distributed computing

Procedia PDF Downloads 228
4596 Quantitative Analysis of Multiprocessor Architectures for Radar Signal Processing

Authors: Deepak Kumar, Debasish Deb, Reena Mamgain

Abstract:

Radar signal processing requires high number crunching capability. Most often this is achieved using multiprocessor platform. Though multiprocessor platform provides the capability of meeting the real time computational challenges, the architecture of the same along with mapping of the algorithm on the architecture plays a vital role in efficiently using the platform. Towards this, along with standard performance metrics, few additional metrics are defined which helps in evaluating the multiprocessor platform along with the algorithm mapping. A generic multiprocessor architecture can not suit all the processing requirements. Depending on the system requirement and type of algorithms used, the most suitable architecture for the given problem is decided. In the paper, we study different architectures and quantify the different performance metrics which enables comparison of different architectures for their merit. We also carried out case study of different architectures and their efficiency depending on parallelism exploited on algorithm or data or both.

Keywords: radar signal processing, multiprocessor architecture, efficiency, load imbalance, buffer requirement, pipeline, parallel, hybrid, cluster of processors (COPs)

Procedia PDF Downloads 389
4595 Machine Learning Approach for Mutation Testing

Authors: Michael Stewart

Abstract:

Mutation testing is a type of software testing proposed in the 1970s where program statements are deliberately changed to introduce simple errors so that test cases can be validated to determine if they can detect the errors. Test cases are executed against the mutant code to determine if one fails, detects the error and ensures the program is correct. One major issue with this type of testing was it became intensive computationally to generate and test all possible mutations for complex programs. This paper used reinforcement learning and parallel processing within the context of mutation testing for the selection of mutation operators and test cases that reduced the computational cost of testing and improved test suite effectiveness. Experiments were conducted using sample programs to determine how well the reinforcement learning-based algorithm performed with one live mutation, multiple live mutations and no live mutations. The experiments, measured by mutation score, were used to update the algorithm and improved accuracy for predictions. The performance was then evaluated on multiple processor computers. With reinforcement learning, the mutation operators utilized were reduced by 50 – 100%.

Keywords: automated-testing, machine learning, mutation testing, parallel processing, reinforcement learning, software engineering, software testing

Procedia PDF Downloads 174
4594 A Parallel Algorithm for Solving the PFSP on the Grid

Authors: Samia Kouki

Abstract:

Solving NP-hard combinatorial optimization problems by exact search methods, such as Branch-and-Bound, may degenerate to complete enumeration. For that reason, exact approaches limit us to solve only small or moderate size problem instances, due to the exponential increase in CPU time when problem size increases. One of the most promising ways to reduce significantly the computational burden of sequential versions of Branch-and-Bound is to design parallel versions of these algorithms which employ several processors. This paper describes a parallel Branch-and-Bound algorithm called GALB for solving the classical permutation flowshop scheduling problem as well as its implementation on a Grid computing infrastructure. The experimental study of our distributed parallel algorithm gives promising results and shows clearly the benefit of the parallel paradigm to solve large-scale instances in moderate CPU time.

Keywords: grid computing, permutation flow shop problem, branch and bound, load balancing

Procedia PDF Downloads 260
4593 Autism Disease Detection Using Transfer Learning Techniques: Performance Comparison between Central Processing Unit vs. Graphics Processing Unit Functions for Neural Networks

Authors: Mst Shapna Akter, Hossain Shahriar

Abstract:

Neural network approaches are machine learning methods used in many domains, such as healthcare and cyber security. Neural networks are mostly known for dealing with image datasets. While training with the images, several fundamental mathematical operations are carried out in the Neural Network. The operation includes a number of algebraic and mathematical functions, including derivative, convolution, and matrix inversion and transposition. Such operations require higher processing power than is typically needed for computer usage. Central Processing Unit (CPU) is not appropriate for a large image size of the dataset as it is built with serial processing. While Graphics Processing Unit (GPU) has parallel processing capabilities and, therefore, has higher speed. This paper uses advanced Neural Network techniques such as VGG16, Resnet50, Densenet, Inceptionv3, Xception, Mobilenet, XGBOOST-VGG16, and our proposed models to compare CPU and GPU resources. A system for classifying autism disease using face images of an autistic and non-autistic child was used to compare performance during testing. We used evaluation matrices such as Accuracy, F1 score, Precision, Recall, and Execution time. It has been observed that GPU runs faster than the CPU in all tests performed. Moreover, the performance of the Neural Network models in terms of accuracy increases on GPU compared to CPU.

Keywords: autism disease, neural network, CPU, GPU, transfer learning

Procedia PDF Downloads 91
4592 The Parallelization of Algorithm Based on Partition Principle for Association Rules Discovery

Authors: Khadidja Belbachir, Hafida Belbachir

Abstract:

subsequently the expansion of the physical supports storage and the needs ceaseless to accumulate several data, the sequential algorithms of associations’ rules research proved to be ineffective. Thus the introduction of the new parallel versions is imperative. We propose in this paper, a parallel version of a sequential algorithm “Partition”. This last is fundamentally different from the other sequential algorithms, because it scans the data base only twice to generate the significant association rules. By consequence, the parallel approach does not require much communication between the sites. The proposed approach was implemented for an experimental study. The obtained results, shows a great reduction in execution time compared to the sequential version and Count Distributed algorithm.

Keywords: association rules, distributed data mining, partition, parallel algorithms

Procedia PDF Downloads 379
4591 Spatial Audio Player Using Musical Genre Classification

Authors: Jun-Yong Lee, Hyoung-Gook Kim

Abstract:

In this paper, we propose a smart music player that combines the musical genre classification and the spatial audio processing. The musical genre is classified based on content analysis of the musical segment detected from the audio stream. In parallel with the classification, the spatial audio quality is achieved by adding an artificial reverberation in a virtual acoustic space to the input mono sound. Thereafter, the spatial sound is boosted with the given frequency gains based on the musical genre when played back. Experiments measured the accuracy of detecting the musical segment from the audio stream and its musical genre classification. A listening test was performed based on the virtual acoustic space based spatial audio processing.

Keywords: automatic equalization, genre classification, music segment detection, spatial audio processing

Procedia PDF Downloads 407
4590 Study of Temperature Difference and Current Distribution in Parallel-Connected Cells at Low Temperature

Authors: Sara Kamalisiahroudi, Jun Huang, Zhe Li, Jianbo Zhang

Abstract:

Two types of commercial cylindrical lithium ion batteries (Panasonic 3.4 Ah NCR-18650B and Samsung 2.9 Ah INR-18650), were investigated experimentally. The capacities of these samples were individually measured using constant current-constant voltage (CC-CV) method at different ambient temperatures (-10 ℃, 0 ℃, 25 ℃). Their internal resistance was determined by electrochemical impedance spectroscopy (EIS) and pulse discharge methods. The cells with different configurations of parallel connection NCR-NCR, INR-INR and NCR-INR were charged/discharged at the aforementioned ambient temperatures. The results showed that the difference of internal resistance between cells much more evident at low temperatures. Furthermore, the parallel connection of NCR-NCR exhibits the most uniform temperature distribution in cells at -10 ℃, this feature is quite favorable for the safety of the battery pack.

Keywords: batteries in parallel connection, internal resistance, low temperature, temperature difference, current distribution

Procedia PDF Downloads 454
4589 4-DOFs Parallel Mechanism for Minimally Invasive Robotic Surgery

Authors: Khalil Ibrahim, Ahmed Ramadan, Mohamed Fanni, Yo Kobayashi, Ahmed Abo-Ismail, Masakatus G. Fujie

Abstract:

This paper deals with the design process and the dynamic control simulation of a new type of 4-DOFs parallel mechanism that can be used as an endoscopic surgical manipulator. The proposed mechanism, 2-PUU_2-PUS, is designed based on the screw theory and the parallel virtual chain type synthesis method. Based on the structure analysis of the 4-DOF parallel mechanism, the inverse position equation is studied using the inverse analysis theory of kinematics. The design and the stress analysis of the mechanism are investigated using SolidWorks software. The virtual prototype of the parallel mechanism is constructed, and the dynamic simulation is performed using ADAMS TM software. The system model utilizing PID and PI controllers has been built using MATLAB software. A more realistic simulation in accordance with a given bending angle and point to point control is implemented by the use of both ADAMS/MATLAB software. The simulation results showed that this control method has solved the coordinate control for the 4-DOF parallel manipulator so that each output is feedback to the four driving rods. From the results, the tracking performance is achieved. Other control techniques, such as intelligent ones, are recommended to improve the tracking performance and reduce the numerical truncation error.

Keywords: parallel mechanisms, medical robotics, tracjectory control, virtual chain type synthesis method

Procedia PDF Downloads 443
4588 Exploring MPI-Based Parallel Computing in Analyzing Very Large Sequences

Authors: Bilal Wajid, Erchin Serpedin

Abstract:

The health industry is aiming towards personalized medicine. If the patient’s genome needs to be sequenced it is important that the entire analysis be completed quickly. This paper explores use of parallel computing to analyze very large sequences. Two cases have been considered. In the first case, the sequence is kept constant and the effect of increasing the number of MPI-based processes is evaluated in terms of execution time, speed and efficiency. In the second case the number of MPI-based processes have been kept constant whereas, the length of the sequence was increased.

Keywords: parallel computing, alignment, genome assembly, alignment

Procedia PDF Downloads 249
4587 Security Over OFDM Fading Channels with Friendly Jammer

Authors: Munnujahan Ara

Abstract:

In this paper, we investigate the effect of friendly jamming power allocation strategies on the achievable average secrecy rate over a bank of parallel fading wiretap channels. We investigate the achievable average secrecy rate in parallel fading wiretap channels subject to Rayleigh and Rician fading. The achievable average secrecy rate, due to the presence of a line-of-sight component in the jammer channel is also evaluated. Moreover, we study the detrimental effect of correlation across the parallel sub-channels, and evaluate the corresponding decrease in the achievable average secrecy rate for the various fading configurations. We also investigate the tradeoff between the transmission power and the jamming power for a fixed total power budget. Our results, which are applicable to current orthogonal frequency division multiplexing (OFDM) communications systems, shed further light on the achievable average secrecy rates over a bank of parallel fading channels in the presence of friendly jammers.

Keywords: fading parallel channels, wire-tap channel, OFDM, secrecy capacity, power allocation

Procedia PDF Downloads 485
4586 Numerical Studies for Standard Bi-Conjugate Gradient Stabilized Method and the Parallel Variants for Solving Linear Equations

Authors: Kuniyoshi Abe

Abstract:

Bi-conjugate gradient (Bi-CG) is a well-known method for solving linear equations Ax = b, for x, where A is a given n-by-n matrix, and b is a given n-vector. Typically, the dimension of the linear equation is high and the matrix is sparse. A number of hybrid Bi-CG methods such as conjugate gradient squared (CGS), Bi-CG stabilized (Bi-CGSTAB), BiCGStab2, and BiCGstab(l) have been developed to improve the convergence of Bi-CG. Bi-CGSTAB has been most often used for efficiently solving the linear equation, but we have seen the convergence behavior with a long stagnation phase. In such cases, it is important to have Bi-CG coefficients that are as accurate as possible, and the stabilization strategy, which stabilizes the computation of the Bi-CG coefficients, has been proposed. It may avoid stagnation and lead to faster computation. Motivated by a large number of processors in present petascale high-performance computing hardware, the scalability of Krylov subspace methods on parallel computers has recently become increasingly prominent. The main bottleneck for efficient parallelization is the inner products which require a global reduction. The resulting global synchronization phases cause communication overhead on parallel computers. The parallel variants of Krylov subspace methods reducing the number of global communication phases and hiding the communication latency have been proposed. However, the numerical stability, specifically, the convergence speed of the parallel variants of Bi-CGSTAB may become worse than that of the standard Bi-CGSTAB. In this paper, therefore, we compare the convergence speed between the standard Bi-CGSTAB and the parallel variants by numerical experiments and show that the convergence speed of the standard Bi-CGSTAB is faster than the parallel variants. Moreover, we propose the stabilization strategy for the parallel variants.

Keywords: bi-conjugate gradient stabilized method, convergence speed, Krylov subspace methods, linear equations, parallel variant

Procedia PDF Downloads 143
4585 Optoelectronic Hardware Architecture for Recurrent Learning Algorithm in Image Processing

Authors: Abdullah Bal, Sevdenur Bal

Abstract:

This paper purposes a new type of hardware application for training of cellular neural networks (CNN) using optical joint transform correlation (JTC) architecture for image feature extraction. CNNs require much more computation during the training stage compare to test process. Since optoelectronic hardware applications offer possibility of parallel high speed processing capability for 2D data processing applications, CNN training algorithm can be realized using Fourier optics technique. JTC employs lens and CCD cameras with laser beam that realize 2D matrix multiplication and summation in the light speed. Therefore, in the each iteration of training, JTC carries more computation burden inherently and the rest of mathematical computation realized digitally. The bipolar data is encoded by phase and summation of correlation operations is realized using multi-object input joint images. Overlapping properties of JTC are then utilized for summation of two cross-correlations which provide less computation possibility for training stage. Phase-only JTC does not require data rearrangement, electronic pre-calculation and strict system alignment. The proposed system can be incorporated simultaneously with various optical image processing or optical pattern recognition techniques just in the same optical system.

Keywords: CNN training, image processing, joint transform correlation, optoelectronic hardware

Procedia PDF Downloads 483
4584 Massively-Parallel Bit-Serial Neural Networks for Fast Epilepsy Diagnosis: A Feasibility Study

Authors: Si Mon Kueh, Tom J. Kazmierski

Abstract:

There are about 1% of the world population suffering from the hidden disability known as epilepsy and major developing countries are not fully equipped to counter this problem. In order to reduce the inconvenience and danger of epilepsy, different methods have been researched by using a artificial neural network (ANN) classification to distinguish epileptic waveforms from normal brain waveforms. This paper outlines the aim of achieving massive ANN parallelization through a dedicated hardware using bit-serial processing. The design of this bit-serial Neural Processing Element (NPE) is presented which implements the functionality of a complete neuron using variable accuracy. The proposed design has been tested taking into consideration non-idealities of a hardware ANN. The NPE consists of a bit-serial multiplier which uses only 16 logic elements on an Altera Cyclone IV FPGA and a bit-serial ALU as well as a look-up table. Arrays of NPEs can be driven by a single controller which executes the neural processing algorithm. In conclusion, the proposed compact NPE design allows the construction of complex hardware ANNs that can be implemented in a portable equipment that suits the needs of a single epileptic patient in his or her daily activities to predict the occurrences of impending tonic conic seizures.

Keywords: Artificial Neural Networks (ANN), bit-serial neural processor, FPGA, Neural Processing Element (NPE)

Procedia PDF Downloads 298
4583 Parallel Evaluation of Sommerfeld Integrals for Multilayer Dyadic Green's Function

Authors: Duygu Kan, Mehmet Cayoren

Abstract:

Sommerfeld-integrals (SIs) are commonly encountered in electromagnetics problems involving analysis of antennas and scatterers embedded in planar multilayered media. Generally speaking, the analytical solution of SIs is unavailable, and it is well known that numerical evaluation of SIs is very time consuming and computationally expensive due to the highly oscillating and slowly decaying nature of the integrands. Therefore, fast computation of SIs has a paramount importance. In this paper, a parallel code has been developed to speed up the computation of SI in the framework of calculation of dyadic Green’s function in multilayered media. OpenMP shared memory approach is used to parallelize the SI algorithm and resulted in significant time savings. Moreover accelerating the computation of dyadic Green’s function is discussed based on the parallel SI algorithm developed.

Keywords: Sommerfeld-integrals, multilayer dyadic Green’s function, OpenMP, shared memory parallel programming

Procedia PDF Downloads 222
4582 GPU-Accelerated Triangle Mesh Simplification Using Parallel Vertex Removal

Authors: Thomas Odaker, Dieter Kranzlmueller, Jens Volkert

Abstract:

We present an approach to triangle mesh simplification designed to be executed on the GPU. We use a quadric error metric to calculate an error value for each vertex of the mesh and order all vertices based on this value. This step is followed by the parallel removal of a number of vertices with the lowest calculated error values. To allow for the parallel removal of multiple vertices we use a set of per-vertex boundaries that prevent mesh foldovers even when simplification operations are performed on neighbouring vertices. We execute multiple iterations of the calculation of the vertex errors, ordering of the error values and removal of vertices until either a desired number of vertices remains in the mesh or a minimum error value is reached. This parallel approach is used to speed up the simplification process while maintaining mesh topology and avoiding foldovers at every step of the simplification.

Keywords: computer graphics, half edge collapse, mesh simplification, precomputed simplification, topology preserving

Procedia PDF Downloads 341
4581 Portable and Parallel Accelerated Development Method for Field-Programmable Gate Array (FPGA)-Central Processing Unit (CPU)- Graphics Processing Unit (GPU) Heterogeneous Computing

Authors: Nan Hu, Chao Wang, Xi Li, Xuehai Zhou

Abstract:

The field-programmable gate array (FPGA) has been widely adopted in the high-performance computing domain. In recent years, the embedded system-on-a-chip (SoC) contains coarse granularity multi-core CPU (central processing unit) and mobile GPU (graphics processing unit) that can be used as general-purpose accelerators. The motivation is that algorithms of various parallel characteristics can be efficiently mapped to the heterogeneous architecture coupled with these three processors. The CPU and GPU offload partial computationally intensive tasks from the FPGA to reduce the resource consumption and lower the overall cost of the system. However, in present common scenarios, the applications always utilize only one type of accelerator because the development approach supporting the collaboration of the heterogeneous processors faces challenges. Therefore, a systematic approach takes advantage of write-once-run-anywhere portability, high execution performance of the modules mapped to various architectures and facilitates the exploration of design space. In this paper, A servant-execution-flow model is proposed for the abstraction of the cooperation of the heterogeneous processors, which supports task partition, communication and synchronization. At its first run, the intermediate language represented by the data flow diagram can generate the executable code of the target processor or can be converted into high-level programming languages. The instantiation parameters efficiently control the relationship between the modules and computational units, including two hierarchical processing units mapping and adjustment of data-level parallelism. An embedded system of a three-dimensional waveform oscilloscope is selected as a case study. The performance of algorithms such as contrast stretching, etc., are analyzed with implementations on various combinations of these processors. The experimental results show that the heterogeneous computing system with less than 35% resources achieves similar performance to the pure FPGA and approximate energy efficiency.

Keywords: FPGA-CPU-GPU collaboration, design space exploration, heterogeneous computing, intermediate language, parameterized instantiation

Procedia PDF Downloads 91
4580 Parallel Asynchronous Multi-Splitting Methods for Differential Algebraic Systems

Authors: Malika Elkyal

Abstract:

We consider an iterative parallel multi-splitting method for differential algebraic equations. The main feature of the proposed idea is to use the asynchronous form. We prove that the multi-splitting technique can effectively accelerate the convergent performance of the iterative process. The main characteristic of an asynchronous mode is that the local algorithm does not have to wait at predetermined messages to become available. We allow some processors to communicate more frequently than others, and we allow the communication delays to be substantial and unpredictable. Accordingly, we note that synchronous algorithms in the computer science sense are particular cases of our formulation of asynchronous one.

Keywords: parallel methods, asynchronous mode, multisplitting, differential algebraic equations

Procedia PDF Downloads 534
4579 Natural Convection between Two Parallel Wavy Plates

Authors: Si Abdallah Mayouf

Abstract:

In this work, the effects of the wavy surface on free convection heat transfer boundary layer flow between two parallel wavy plates have been studied numerically. The two plates are considered at a constant temperature. The equations and the boundary conditions are discretized by the finite difference scheme and solved numerically using the Gauss-Seidel algorithm. The important parameters in this problem are the amplitude of the wavy surfaces and the distance between the two wavy plates. Results are presented as velocity profiles, temperature profiles and local Nusselt number according to the important parameters.

Keywords: free convection, wavy surface, parallel plates, fluid dynamics

Procedia PDF Downloads 284
4578 Parallelizing the Hybrid Pseudo-Spectral Time Domain/Finite Difference Time Domain Algorithms for the Large-Scale Electromagnetic Simulations Using Massage Passing Interface Library

Authors: Donggun Lee, Q-Han Park

Abstract:

Due to its coarse grid, the Pseudo-Spectral Time Domain (PSTD) method has advantages against the Finite Difference Time Domain (FDTD) method in terms of memory requirement and operation time. However, since the efficiency of parallelization is much lower than that of FDTD, PSTD is not a useful method for a large-scale electromagnetic simulation in a parallel platform. In this paper, we propose the parallelization technique of the hybrid PSTD-FDTD (HPF) method which simultaneously possesses the efficient parallelizability of FDTD and the quick speed and low memory requirement of PSTD. Parallelization cost of the HPF method is exactly the same as the parallel FDTD, but still, it occupies much less memory space and has faster operation speed than the parallel FDTD. Experiments in distributed memory systems have shown that the parallel HPF method saves up to 96% of the operation time and reduces 84% of the memory requirement. Also, by combining the OpenMP library to the MPI library, we further reduced the operation time of the parallel HPF method by 50%.

Keywords: FDTD, hybrid, MPI, OpenMP, PSTD, parallelization

Procedia PDF Downloads 120
4577 Detecting the Edge of Multiple Images in Parallel

Authors: Prakash K. Aithal, U. Dinesh Acharya, Rajesh Gopakumar

Abstract:

Edge is variation of brightness in an image. Edge detection is useful in many application areas such as finding forests, rivers from a satellite image, detecting broken bone in a medical image etc. The paper discusses about finding edge of multiple aerial images in parallel .The proposed work tested on 38 images 37 colored and one monochrome image. The time taken to process N images in parallel is equivalent to time taken to process 1 image in sequential. The proposed method achieves pixel level parallelism as well as image level parallelism.

Keywords: edge detection, multicore, gpu, opencl, mpi

Procedia PDF Downloads 457
4576 Comparison of Parallel CUDA and OpenMP Implementations of Memetic Algorithms for Solving Optimization Problems

Authors: Jason Digalakis, John Cotronis

Abstract:

Memetic algorithms (MAs) are useful for solving optimization problems. It is quite difficult to search the search space of the optimization problem with large dimensions. There is a challenge to use all the cores of the system. In this study, a sequential implementation of the memetic algorithm is converted into a concurrent version, which is executed on the cores of both CPU and GPU. For this reason, CUDA and OpenMP libraries are operated on the parallel algorithm to make a concurrent execution on CPU and GPU, respectively. The aim of this study is to compare CPU and GPU implementation of the memetic algorithm. For this purpose, fourteen benchmark functions are selected as test problems. The obtained results indicate that our approach leads to speedups up to five thousand times higher compared to one CPU thread while maintaining a reasonable results quality. This clearly shows that GPUs have the potential to acceleration of MAs and allow them to solve much more complex tasks.

Keywords: memetic algorithm, CUDA, GPU-based memetic algorithm, open multi processing, multimodal functions, unimodal functions, non-linear optimization problems

Procedia PDF Downloads 70
4575 A Practical Protection Method for Parallel Transmission-Lines Based on the Fault Travelling-Waves

Authors: Mohammad Reza Ebrahimi

Abstract:

In new restructured power systems, swift fault detection is very important. The parallel transmission-lines are vastly used in this kind of power systems because of high amount of energy transferring. In this paper, a method based on the comparison of two schemes, i.e., i) maximum magnitude of travelling-wave (TW) energy ii) the instants of maximum energy occurrence at the circuits of parallel transmission-line is proposed. Using the travelling-wave of fault in order to faulted line identification this method has noticeable operation time. Moreover, the algorithm can cover for identification of faults as external or internal faults. For an internal fault, the exact location of the fault can be estimated confidently. A lot of simulations have been done with PSCAD/EMTDC to verify the performance of the proposed algorithm.

Keywords: travelling-wave, maximum energy, parallel transmission-line, fault location

Procedia PDF Downloads 164
4574 Large-Scale Simulations of Turbulence Using Discontinuous Spectral Element Method

Authors: A. Peyvan, D. Li, J. Komperda, F. Mashayek

Abstract:

Turbulence can be observed in a variety fluid motions in nature and industrial applications. Recent investment in high-speed aircraft and propulsion systems has revitalized fundamental research on turbulent flows. In these systems, capturing chaotic fluid structures with different length and time scales is accomplished through the Direct Numerical Simulation (DNS) approach since it accurately simulates flows down to smallest dissipative scales, i.e., Kolmogorov’s scales. The discontinuous spectral element method (DSEM) is a high-order technique that uses spectral functions for approximating the solution. The DSEM code has been developed by our research group over the course of more than two decades. Recently, the code has been improved to run large cases in the order of billions of solution points. Running big simulations requires a considerable amount of RAM. Therefore, the DSEM code must be highly parallelized and able to start on multiple computational nodes on an HPC cluster with distributed memory. However, some pre-processing procedures, such as determining global element information, creating a global face list, and assigning global partitioning and element connection information of the domain for communication, must be done sequentially with a single processing core. A separate code has been written to perform the pre-processing procedures on a local machine. It stores the minimum amount of information that is required for the DSEM code to start in parallel, extracted from the mesh file, into text files (pre-files). It packs integer type information with a Stream Binary format in pre-files that are portable between machines. The files are generated to ensure fast read performance on different file-systems, such as Lustre and General Parallel File System (GPFS). A new subroutine has been added to the DSEM code to read the startup files using parallel MPI I/O, for Lustre, in a way that each MPI rank acquires its information from the file in parallel. In case of GPFS, in each computational node, a single MPI rank reads data from the file, which is specifically generated for the computational node, and send them to other ranks on the node using point to point non-blocking MPI communication. This way, communication takes place locally on each node and signals do not cross the switches of the cluster. The read subroutine has been tested on Argonne National Laboratory’s Mira (GPFS), National Center for Supercomputing Application’s Blue Waters (Lustre), San Diego Supercomputer Center’s Comet (Lustre), and UIC’s Extreme (Lustre). The tests showed that one file per node is suited for GPFS and parallel MPI I/O is the best choice for Lustre file system. The DSEM code relies on heavily optimized linear algebra operation such as matrix-matrix and matrix-vector products for calculation of the solution in every time-step. For this, the code can either make use of its matrix math library, BLAS, Intel MKL, or ATLAS. This fact and the discontinuous nature of the method makes the DSEM code run efficiently in parallel. The results of weak scaling tests performed on Blue Waters showed a scalable and efficient performance of the code in parallel computing.

Keywords: computational fluid dynamics, direct numerical simulation, spectral element, turbulent flow

Procedia PDF Downloads 118
4573 A Parallel Computation Based on GPU Programming for a 3D Compressible Fluid Flow Simulation

Authors: Sugeng Rianto, P.W. Arinto Yudi, Soemarno Muhammad Nurhuda

Abstract:

A computation of a 3D compressible fluid flow for virtual environment with haptic interaction can be a non-trivial issue. This is especially how to reach good performances and balancing between visualization, tactile feedback interaction, and computations. In this paper, we describe our approach of computation methods based on parallel programming on a GPU. The 3D fluid flow solvers have been developed for smoke dispersion simulation by using combinations of the cubic interpolated propagation (CIP) based fluid flow solvers and the advantages of the parallelism and programmability of the GPU. The fluid flow solver is generated in the GPU-CPU message passing scheme to get rapid development of haptic feedback modes for fluid dynamic data. A rapid solution in fluid flow solvers is developed by applying cubic interpolated propagation (CIP) fluid flow solvers. From this scheme, multiphase fluid flow equations can be solved simultaneously. To get more acceleration in the computation, the Navier-Stoke Equations (NSEs) is packed into channels of texel, where computation models are performed on pixels that can be considered to be a grid of cells. Therefore, despite of the complexity of the obstacle geometry, processing on multiple vertices and pixels can be done simultaneously in parallel. The data are also shared in global memory for CPU to control the haptic in providing kinaesthetic interaction and felling. The results show that GPU based parallel computation approaches provide effective simulation of compressible fluid flow model for real-time interaction in 3D computer graphic for PC platform. This report has shown the feasibility of a new approach of solving the compressible fluid flow equations on the GPU. The experimental tests proved that the compressible fluid flowing on various obstacles with haptic interactions on the few model obstacles can be effectively and efficiently simulated on the reasonable frame rate with a realistic visualization. These results confirm that good performances and balancing between visualization, tactile feedback interaction, and computations can be applied successfully.

Keywords: CIP, compressible fluid, GPU programming, parallel computation, real-time visualisation

Procedia PDF Downloads 410
4572 Identification of Vehicle Dynamic Parameters by Using Optimized Exciting Trajectory on 3- DOF Parallel Manipulator

Authors: Di Yao, Gunther Prokop, Kay Buttner

Abstract:

Dynamic parameters, including the center of gravity, mass and inertia moments of vehicle, play an essential role in vehicle simulation, collision test and real-time control of vehicle active systems. To identify the important vehicle dynamic parameters, a systematic parameter identification procedure is studied in this work. In the first step of the procedure, a conceptual parallel manipulator (virtual test rig), which possesses three rotational degrees-of-freedom, is firstly proposed. To realize kinematic characteristics of the conceptual parallel manipulator, the kinematic analysis consists of inverse kinematic and singularity architecture is carried out. Based on the Euler's rotation equations for rigid body dynamics, the dynamic model of parallel manipulator and derivation of measurement matrix for parameter identification are presented subsequently. In order to reduce the sensitivity of parameter identification to measurement noise and other unexpected disturbances, a parameter optimization process of searching for optimal exciting trajectory of parallel manipulator is conducted in the following section. For this purpose, the 321-Euler-angles defined by parameterized finite-Fourier-series are primarily used to describe the general exciting trajectory of parallel manipulator. To minimize the condition number of measurement matrix for achieving better parameter identification accuracy, the unknown coefficients of parameterized finite-Fourier-series are estimated by employing an iterative algorithm based on MATLAB®. Meanwhile, the iterative algorithm will ensure the parallel manipulator still keeps in an achievable working status during the execution of optimal exciting trajectory. It is showed that the proposed procedure and methods in this work can effectively identify the vehicle dynamic parameters and could be an important application of parallel manipulator in the fields of parameter identification and test rig development.

Keywords: parameter identification, parallel manipulator, singularity architecture, dynamic modelling, exciting trajectory

Procedia PDF Downloads 243
4571 Analyzing the Factors that Cause Parallel Performance Degradation in Parallel Graph-Based Computations Using Graph500

Authors: Mustafa Elfituri, Jonathan Cook

Abstract:

Recently, graph-based computations have become more important in large-scale scientific computing as they can provide a methodology to model many types of relations between independent objects. They are being actively used in fields as varied as biology, social networks, cybersecurity, and computer networks. At the same time, graph problems have some properties such as irregularity and poor locality that make their performance different than regular applications performance. Therefore, parallelizing graph algorithms is a hard and challenging task. Initial evidence is that standard computer architectures do not perform very well on graph algorithms. Little is known exactly what causes this. The Graph500 benchmark is a representative application for parallel graph-based computations, which have highly irregular data access and are driven more by traversing connected data than by computation. In this paper, we present results from analyzing the performance of various example implementations of Graph500, including a shared memory (OpenMP) version, a distributed (MPI) version, and a hybrid version. We measured and analyzed all the factors that affect its performance in order to identify possible changes that would improve its performance. Results are discussed in relation to what factors contribute to performance degradation.

Keywords: graph computation, graph500 benchmark, parallel architectures, parallel programming, workload characterization.

Procedia PDF Downloads 123