Search results for: parallel processing
4727 Fractional Residue Number System
Authors: Parisa Khoshvaght, Mehdi Hosseinzadeh
Abstract:
During the past few years, the Residue Number System (RNS) has been receiving considerable interest due to its parallel and fault-tolerant properties. This system is a useful tool for Digital Signal Processing (DSP) since it can support parallel, carry-free, high-speed and low-power arithmetic. One drawback of the Residue Number System is its handling of fractional numbers: the corresponding circuit is very hard to realize in conventional CMOS technology. In this paper, we propose a method in which the number of transistors is significantly reduced and the related delay is greatly diminished. We first apply this method to the problem of one-decimal fractional numbers; the proposal can then be extended to generalize the idea. Another advantage of this method is its independence from the kind of moduli.
Keywords: computer arithmetic, residue number system, number system, one-Hot, VLSI
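As a minimal illustration of the carry-free, channel-parallel arithmetic mentioned in the abstract (this is not the authors' circuit or their fractional method), the sketch below represents integers by their residues and adds or multiplies each channel independently. The moduli set (3, 5, 7) and the helper names `to_rns`, `rns_add`, `rns_mul` and `from_rns` are assumptions chosen for the example; reconstruction uses the Chinese Remainder Theorem.

```python
from math import prod

# Assumed pairwise-coprime moduli; the dynamic range is 3 * 5 * 7 = 105.
MODULI = (3, 5, 7)

def to_rns(x, moduli=MODULI):
    """Encode an integer as a tuple of residues, one per modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Add channel by channel; no carry crosses between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; each channel could run in parallel."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Decode with the Chinese Remainder Theorem (result is modulo the range)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M
```

For instance, `from_rns(rns_add(to_rns(17), to_rns(23)))` recovers 40, as long as the true result stays below the dynamic range of 105.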
Procedia PDF Downloads 495
4726 Parallel Random Number Generation for the Modern Supercomputer Architectures
Authors: Roman Snytsar
Abstract:
Pseudo-random numbers are often used in scientific computing, for example in Monte Carlo simulations or quantum-inspired optimization. Requirements for a parallel random number generator running in the modern multi-core vector environment are more stringent than those for sequential random number generators. As well as passing the usual quality tests, the output of the parallel random number generator must be verifiable and reproducible throughout the concurrent execution. We propose a family of vectorized Permuted Congruential Generators. Implementations are available for multiple modern vector computer architectures. Besides demonstrating good single-core performance, the generators scale easily across many processor cores and multiple distributed nodes. We provide performance and parallel speedup analysis and comparisons between the implementations.
Keywords: pseudo-random numbers, quantum optimization, SIMD, parallel computing
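The reproducibility requirement can be sketched with a scalar PCG32 generator (the XSH-RR variant) in which every worker owns its own stream, so its output depends only on the seed and the stream number and not on scheduling. This is an illustrative sketch, not the vectorized family the authors propose; the class `PCG32` and the helpers `worker_sample` and `parallel_samples` are hypothetical names, and the thread pool is used only to show independence from execution order.

```python
from concurrent.futures import ThreadPoolExecutor

MASK64 = (1 << 64) - 1
MULT = 6364136223846793005  # multiplier of the 64-bit LCG underlying PCG32

class PCG32:
    """Minimal scalar PCG32 (XSH-RR output) with a selectable stream."""

    def __init__(self, seed, stream):
        self.state = 0
        self.inc = ((stream << 1) | 1) & MASK64  # the increment must be odd
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self):
        old = self.state
        # Advance the linear congruential state modulo 2**64.
        self.state = (old * MULT + self.inc) & MASK64
        # Output permutation: xorshift the high bits, then a random rotation.
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF

def worker_sample(args):
    """Draw n numbers from the stream assigned to one worker."""
    seed, stream, n = args
    rng = PCG32(seed, stream)
    return [rng.next_u32() for _ in range(n)]

def parallel_samples(seed, n_workers, n):
    """Run one stream per worker; results come back in worker order."""
    jobs = [(seed, w, n) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker_sample, jobs))
```

Because each stream is a pure function of `(seed, stream)`, a concurrent run can be checked against a plain sequential loop over the same stream numbers.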
Procedia PDF Downloads 120
4725 Classification Rule Discovery by Using Parallel Ant Colony Optimization
Authors: Waseem Shahzad, Ayesha Tahir Khan, Hamid Hussain Awan
Abstract:
The Ant-Miner algorithm, which belongs to the family of ACO algorithms, is used to extract knowledge from data in the form of rules. A variant of Ant-Miner named cAnt-MinerPB generates a list of rules using the Pittsburgh approach in order to maintain the rule interaction among the rules that are generated. In this paper, we propose a parallel Ant-MinerPB in which the ant colony optimization algorithm runs in parallel. In this technique, a data set is divided vertically (i.e., by attributes) into different subsets. These subsets are created based on the correlation among attributes using Mutual Information (MI). Rules are generated in parallel and then merged to form a final list of rules. The results show that the proposed technique achieves higher accuracy than the original cAnt-MinerPB and that the execution time is also reduced.
Keywords: ant colony optimization, parallel Ant-MinerPB, vertical partitioning, classification rule discovery
Procedia PDF Downloads 295
4724 Quantitative Analysis of Multiprocessor Architectures for Radar Signal Processing
Authors: Deepak Kumar, Debasish Deb, Reena Mamgain
Abstract:
Radar signal processing requires high number-crunching capability. Most often this is achieved using a multiprocessor platform. Though a multiprocessor platform provides the capability of meeting real-time computational challenges, its architecture, along with the mapping of the algorithm onto the architecture, plays a vital role in using the platform efficiently. To this end, along with standard performance metrics, a few additional metrics are defined which help in evaluating the multiprocessor platform together with the algorithm mapping. A generic multiprocessor architecture cannot suit all processing requirements. Depending on the system requirements and the type of algorithms used, the most suitable architecture for the given problem is decided. In the paper, we study different architectures and quantify the different performance metrics, which enables comparison of the architectures on their merits. We also carried out a case study of different architectures and their efficiency depending on whether parallelism is exploited on the algorithm, the data, or both.
Keywords: radar signal processing, multiprocessor architecture, efficiency, load imbalance, buffer requirement, pipeline, parallel, hybrid, cluster of processors (COPs)
Procedia PDF Downloads 412
4723 Pushing the Boundary of Parallel Tractability for Ontology Materialization via Boolean Circuits
Authors: Zhangquan Zhou, Guilin Qi
Abstract:
Materialization is an important reasoning service for applications built on the Web Ontology Language (OWL). To make materialization efficient in practice, current research focuses on deciding tractability of an ontology language and designing parallel reasoning algorithms. However, some well-known large-scale ontologies, such as YAGO, have been shown to perform well under parallel reasoning even though they are expressed in ontology languages that are not parallelly tractable, i.e., the reasoning is inherently sequential in the worst case. This motivates us to study the problem of parallel tractability of ontology materialization from a theoretical perspective. That is, we aim to identify the ontologies for which materialization is parallelly tractable, i.e., in the complexity class NC. Since NC is defined in terms of Boolean circuits, which are widely used to investigate parallel computing problems, we first transform the problem of materialization to the evaluation of Boolean circuits, and then study the problem of parallel tractability based on circuits. In this work, we focus on datalog rewritable ontology languages. We use Boolean circuits to identify two classes of datalog rewritable ontologies (called parallelly tractable classes) such that materialization over them is parallelly tractable. We further investigate the parallel tractability of materialization of a datalog rewritable OWL fragment DHL (Description Horn Logic). Based on the above results, we analyze real-world datasets and show that many ontologies expressed in DHL belong to the parallelly tractable classes.
Keywords: ontology materialization, parallel reasoning, datalog, Boolean circuit
Procedia PDF Downloads 271
4722 Researching Apache Hama: A Pure BSP Computing Framework
Authors: Kamran Siddique, Yangwoo Kim, Zahid Akhtar
Abstract:
In recent years, technological advancements have led to a deluge of data from distinct domains, and the development of solutions based on parallel and distributed computing still has a long way to go. That is why the research and development of massive computing frameworks is continuously growing. At this particular stage, highlighting a potential research area along with key insights could be an asset for researchers in the field. Therefore, this paper explores one of the emerging distributed computing frameworks, Apache Hama. It is a Top Level Project under the Apache Software Foundation, based on the Bulk Synchronous Parallel (BSP) model. We present an unbiased and critical interrogation session about Apache Hama and conclude with research directions in order to assist interested researchers.
Keywords: Apache Hama, bulk synchronous parallel, BSP, distributed computing
Procedia PDF Downloads 250
4721 Comparative Analysis of Classical and Parallel Inpainting Algorithms Based on Affine Combinations of Projections on Convex Sets
Authors: Irina Maria Artinescu, Costin Radu Boldea, Eduard-Ionut Matei
Abstract:
The paper is a comparative study of two classical variants of parallel projection methods for solving the convex feasibility problem with their equivalents that involve variable weights in the construction of the solutions. We used a graphical representation of these methods for inpainting a convex area of an image in order to investigate their effectiveness in image reconstruction applications. We also presented a numerical analysis of the convergence of these four algorithms in terms of the average number of steps and execution time in a classical CPU implementation and, alternatively, in a parallel GPU implementation.
Keywords: convex feasibility problem, convergence analysis, inpainting, parallel projection methods
Procedia PDF Downloads 174
4720 Achievable Average Secrecy Rates over Bank of Parallel Independent Fading Channels with Friendly Jamming
Authors: Munnujahan Ara
Abstract:
In this paper, we investigate the effect of friendly jamming power allocation strategies on the achievable average secrecy rate over a bank of parallel fading wiretap channels. We investigate the achievable average secrecy rate in parallel fading wiretap channels subject to Rayleigh and Rician fading. The achievable average secrecy rate due to the presence of a line-of-sight component in the jammer channel is also evaluated. Moreover, we study the detrimental effect of correlation across the parallel sub-channels, and evaluate the corresponding decrease in the achievable average secrecy rate for the various fading configurations. We also investigate the tradeoff between the transmission power and the jamming power for a fixed total power budget. Our results, which are applicable to current orthogonal frequency division multiplexing (OFDM) communications systems, shed further light on the achievable average secrecy rates over a bank of parallel fading channels in the presence of friendly jammers.
Keywords: fading parallel channels, wire-tap channel, OFDM, secrecy capacity, power allocation
Procedia PDF Downloads 512
4719 Machine Learning Approach for Mutation Testing
Authors: Michael Stewart
Abstract:
Mutation testing is a type of software testing proposed in the 1970s where program statements are deliberately changed to introduce simple errors so that test cases can be validated to determine if they can detect the errors. Test cases are executed against the mutant code to determine if one fails, detects the error and ensures the program is correct. One major issue with this type of testing is that it becomes computationally intensive to generate and test all possible mutations for complex programs. This paper used reinforcement learning and parallel processing within the context of mutation testing for the selection of mutation operators and test cases that reduced the computational cost of testing and improved test suite effectiveness. Experiments were conducted using sample programs to determine how well the reinforcement learning-based algorithm performed with one live mutation, multiple live mutations and no live mutations. The experiments, measured by mutation score, were used to update the algorithm and improve prediction accuracy. The performance was then evaluated on multiple-processor computers. With reinforcement learning, the mutation operators utilized were reduced by 50 – 100%.
Keywords: automated-testing, machine learning, mutation testing, parallel processing, reinforcement learning, software engineering, software testing
Procedia PDF Downloads 198
4718 Autism Disease Detection Using Transfer Learning Techniques: Performance Comparison between Central Processing Unit vs. Graphics Processing Unit Functions for Neural Networks
Authors: Mst Shapna Akter, Hossain Shahriar
Abstract:
Neural network approaches are machine learning methods used in many domains, such as healthcare and cyber security. Neural networks are mostly known for dealing with image datasets. While training with the images, several fundamental mathematical operations are carried out in the neural network. These operations include a number of algebraic and mathematical functions, including derivative, convolution, and matrix inversion and transposition. Such operations require higher processing power than is typically needed for ordinary computer usage. A Central Processing Unit (CPU) is not well suited to datasets of large images because it is built for serial processing, whereas a Graphics Processing Unit (GPU) has parallel processing capabilities and therefore higher speed. This paper uses advanced neural network techniques such as VGG16, Resnet50, Densenet, Inceptionv3, Xception, Mobilenet, XGBOOST-VGG16, and our proposed models to compare CPU and GPU resources. A system for classifying autism disease using face images of an autistic and non-autistic child was used to compare performance during testing. We used evaluation metrics such as Accuracy, F1 score, Precision, Recall, and Execution time. It has been observed that the GPU runs faster than the CPU in all tests performed. Moreover, the performance of the neural network models in terms of accuracy increases on GPU compared to CPU.
Keywords: autism disease, neural network, CPU, GPU, transfer learning
Procedia PDF Downloads 118
4717 A Parallel Algorithm for Solving the PFSP on the Grid
Authors: Samia Kouki
Abstract:
Solving NP-hard combinatorial optimization problems by exact search methods, such as Branch-and-Bound, may degenerate to complete enumeration. For that reason, exact approaches limit us to solving only small or moderate-size problem instances, due to the exponential increase in CPU time when problem size increases. One of the most promising ways to significantly reduce the computational burden of sequential versions of Branch-and-Bound is to design parallel versions of these algorithms which employ several processors. This paper describes a parallel Branch-and-Bound algorithm called GALB for solving the classical permutation flowshop scheduling problem as well as its implementation on a Grid computing infrastructure. The experimental study of our distributed parallel algorithm gives promising results and clearly shows the benefit of the parallel paradigm for solving large-scale instances in moderate CPU time.
Keywords: grid computing, permutation flow shop problem, branch and bound, load balancing
Procedia PDF Downloads 283
4716 The Parallelization of Algorithm Based on Partition Principle for Association Rules Discovery
Authors: Khadidja Belbachir, Hafida Belbachir
Abstract:
Following the expansion of physical storage media and the ceaseless need to accumulate data, sequential algorithms for association rule mining have proved to be ineffective. The introduction of new parallel versions is therefore imperative. In this paper, we propose a parallel version of the sequential algorithm “Partition”. This algorithm is fundamentally different from the other sequential algorithms because it scans the database only twice to generate the significant association rules. Consequently, the parallel approach does not require much communication between the sites. The proposed approach was implemented for an experimental study. The results obtained show a great reduction in execution time compared to the sequential version and the Count Distributed algorithm.
Keywords: association rules, distributed data mining, partition, parallel algorithms
Procedia PDF Downloads 415
4715 Spatial Audio Player Using Musical Genre Classification
Authors: Jun-Yong Lee, Hyoung-Gook Kim
Abstract:
In this paper, we propose a smart music player that combines musical genre classification and spatial audio processing. The musical genre is classified based on content analysis of the musical segment detected from the audio stream. In parallel with the classification, the spatial audio quality is achieved by adding an artificial reverberation in a virtual acoustic space to the input mono sound. Thereafter, the spatial sound is boosted with the given frequency gains based on the musical genre when played back. Experiments measured the accuracy of detecting the musical segment from the audio stream and of its musical genre classification. A listening test was performed to evaluate the spatial audio processing based on the virtual acoustic space.
Keywords: automatic equalization, genre classification, music segment detection, spatial audio processing
Procedia PDF Downloads 429
4714 Study of Temperature Difference and Current Distribution in Parallel-Connected Cells at Low Temperature
Authors: Sara Kamalisiahroudi, Jun Huang, Zhe Li, Jianbo Zhang
Abstract:
Two types of commercial cylindrical lithium-ion batteries (Panasonic 3.4 Ah NCR-18650B and Samsung 2.9 Ah INR-18650) were investigated experimentally. The capacities of these samples were individually measured using the constant current-constant voltage (CC-CV) method at different ambient temperatures (-10 ℃, 0 ℃, 25 ℃). Their internal resistance was determined by electrochemical impedance spectroscopy (EIS) and pulse discharge methods. The cells with different configurations of parallel connection (NCR-NCR, INR-INR and NCR-INR) were charged/discharged at the aforementioned ambient temperatures. The results showed that the difference in internal resistance between cells is much more evident at low temperatures. Furthermore, the parallel connection of NCR-NCR exhibits the most uniform temperature distribution in cells at -10 ℃; this feature is quite favorable for the safety of the battery pack.
Keywords: batteries in parallel connection, internal resistance, low temperature, temperature difference, current distribution
Procedia PDF Downloads 478
4713 4-DOFs Parallel Mechanism for Minimally Invasive Robotic Surgery
Authors: Khalil Ibrahim, Ahmed Ramadan, Mohamed Fanni, Yo Kobayashi, Ahmed Abo-Ismail, Masakatus G. Fujie
Abstract:
This paper deals with the design process and the dynamic control simulation of a new type of 4-DOF parallel mechanism that can be used as an endoscopic surgical manipulator. The proposed mechanism, 2-PUU_2-PUS, is designed based on screw theory and the parallel virtual chain type synthesis method. Based on the structure analysis of the 4-DOF parallel mechanism, the inverse position equation is studied using the inverse analysis theory of kinematics. The design and the stress analysis of the mechanism are investigated using SolidWorks software. The virtual prototype of the parallel mechanism is constructed, and the dynamic simulation is performed using ADAMS™ software. The system model utilizing PID and PI controllers has been built using MATLAB software. A more realistic simulation in accordance with a given bending angle and point-to-point control is implemented using ADAMS and MATLAB together. The simulation results showed that this control method solves the coordinate control for the 4-DOF parallel manipulator so that each output is fed back to the four driving rods. The results show that the tracking performance is achieved. Other control techniques, such as intelligent ones, are recommended to improve the tracking performance and reduce the numerical truncation error.
Keywords: parallel mechanisms, medical robotics, trajectory control, virtual chain type synthesis method
Procedia PDF Downloads 468
4712 Exploring MPI-Based Parallel Computing in Analyzing Very Large Sequences
Authors: Bilal Wajid, Erchin Serpedin
Abstract:
The health industry is aiming towards personalized medicine. If the patient’s genome needs to be sequenced, it is important that the entire analysis be completed quickly. This paper explores the use of parallel computing to analyze very large sequences. Two cases have been considered. In the first case, the sequence is kept constant and the effect of increasing the number of MPI-based processes is evaluated in terms of execution time, speed and efficiency. In the second case, the number of MPI-based processes has been kept constant, whereas the length of the sequence was increased.
Keywords: parallel computing, alignment, genome assembly
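The first case (fixed sequence, growing number of processes) is usually summarized with two derived quantities. The sketch below assumes that "speed" refers to the standard parallel speedup S(p) = T(1) / T(p) and that efficiency is E(p) = S(p) / p; the timings in the dictionary are hypothetical placeholders, not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E(p) = S(p) / p."""
    return speedup(t_serial, t_parallel) / n_procs

# Hypothetical wall-clock times in seconds, keyed by number of processes.
timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}

t1 = timings[1]
# One row per process count: (processes, speedup, efficiency).
table = [(p, speedup(t1, t), efficiency(t1, t, p))
         for p, t in sorted(timings.items())]

for p, s, e in table:
    print(f"p={p}  speedup={s:.2f}  efficiency={e:.2f}")
```

An efficiency that falls as the process count grows, as in these placeholder numbers, is the typical sign of communication or serial overhead.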
Procedia PDF Downloads 274
4711 Security Over OFDM Fading Channels with Friendly Jammer
Authors: Munnujahan Ara
Abstract:
In this paper, we investigate the effect of friendly jamming power allocation strategies on the achievable average secrecy rate over a bank of parallel fading wiretap channels. We investigate the achievable average secrecy rate in parallel fading wiretap channels subject to Rayleigh and Rician fading. The achievable average secrecy rate due to the presence of a line-of-sight component in the jammer channel is also evaluated. Moreover, we study the detrimental effect of correlation across the parallel sub-channels, and evaluate the corresponding decrease in the achievable average secrecy rate for the various fading configurations. We also investigate the tradeoff between the transmission power and the jamming power for a fixed total power budget. Our results, which are applicable to current orthogonal frequency division multiplexing (OFDM) communications systems, shed further light on the achievable average secrecy rates over a bank of parallel fading channels in the presence of friendly jammers.
Keywords: fading parallel channels, wire-tap channel, OFDM, secrecy capacity, power allocation
Procedia PDF Downloads 503
4710 Optoelectronic Hardware Architecture for Recurrent Learning Algorithm in Image Processing
Authors: Abdullah Bal, Sevdenur Bal
Abstract:
This paper proposes a new type of hardware application for training cellular neural networks (CNN) using an optical joint transform correlation (JTC) architecture for image feature extraction. CNNs require much more computation during the training stage compared to the test process. Since optoelectronic hardware applications offer parallel high-speed processing capability for 2D data processing applications, the CNN training algorithm can be realized using Fourier optics techniques. JTC employs lenses and CCD cameras with a laser beam to realize 2D matrix multiplication and summation at the speed of light. Therefore, in each iteration of training, JTC inherently carries more of the computational burden, and the rest of the mathematical computation is realized digitally. The bipolar data is encoded by phase, and summation of correlation operations is realized using multi-object input joint images. Overlapping properties of JTC are then utilized for summation of two cross-correlations, which reduces the computation required in the training stage. Phase-only JTC does not require data rearrangement, electronic pre-calculation and strict system alignment. The proposed system can be incorporated simultaneously with various optical image processing or optical pattern recognition techniques within the same optical system.
Keywords: CNN training, image processing, joint transform correlation, optoelectronic hardware
Procedia PDF Downloads 506
4709 Numerical Studies for Standard Bi-Conjugate Gradient Stabilized Method and the Parallel Variants for Solving Linear Equations
Authors: Kuniyoshi Abe
Abstract:
Bi-conjugate gradient (Bi-CG) is a well-known method for solving linear equations Ax = b, for x, where A is a given n-by-n matrix, and b is a given n-vector. Typically, the dimension of the linear equation is high and the matrix is sparse. A number of hybrid Bi-CG methods such as conjugate gradient squared (CGS), Bi-CG stabilized (Bi-CGSTAB), BiCGStab2, and BiCGstab(l) have been developed to improve the convergence of Bi-CG. Bi-CGSTAB has been most often used for efficiently solving the linear equation, but we have observed convergence behavior with a long stagnation phase. In such cases, it is important to have Bi-CG coefficients that are as accurate as possible, and a stabilization strategy, which stabilizes the computation of the Bi-CG coefficients, has been proposed. It may avoid stagnation and lead to faster computation. Motivated by the large number of processors in present petascale high-performance computing hardware, the scalability of Krylov subspace methods on parallel computers has recently become increasingly prominent. The main bottleneck for efficient parallelization is the inner products, which require a global reduction. The resulting global synchronization phases cause communication overhead on parallel computers. Parallel variants of Krylov subspace methods that reduce the number of global communication phases and hide the communication latency have been proposed. However, the numerical stability, specifically the convergence speed, of the parallel variants of Bi-CGSTAB may become worse than that of the standard Bi-CGSTAB. In this paper, therefore, we compare the convergence speed between the standard Bi-CGSTAB and the parallel variants by numerical experiments and show that the convergence speed of the standard Bi-CGSTAB is faster than that of the parallel variants. Moreover, we propose the stabilization strategy for the parallel variants.
Keywords: bi-conjugate gradient stabilized method, convergence speed, Krylov subspace methods, linear equations, parallel variant
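To make the role of the inner products concrete, here is a minimal dense, unpreconditioned sketch of the standard Bi-CGSTAB iteration in plain Python. It is the textbook formulation, not the parallel variants or the stabilization strategy studied in the paper; the helper names `dot`, `matvec` and `axpy` and the zero initial guess are assumptions of the example. Every call to `dot` marks a point where a distributed implementation would need a global reduction.

```python
def dot(u, v):
    """Inner product; on a distributed machine this is a global reduction."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [dot(row, x) for row in A]

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]

def bicgstab(A, b, tol=1e-12, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = list(b)        # r0 = b - A x0 with the assumed guess x0 = 0
    r_hat = list(r)    # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)   # p = r + beta * (p - omega * v)
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = axpy(-alpha, v, r)                  # s = r - alpha * v
        if dot(s, s) ** 0.5 < tol:              # converged after the Bi-CG half step
            return axpy(alpha, p, x)
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = axpy(omega, s, axpy(alpha, p, x))   # x = x + alpha * p + omega * s
        r = axpy(-omega, t, s)                  # r = s - omega * t
        rho = rho_new
        if dot(r, r) ** 0.5 < tol:
            break
    return x

# Small nonsymmetric test system whose exact solution is (0.1, 0.6).
solution = bicgstab([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
```

The sketch needs several separate reductions per iteration, which is the synchronization cost that communication-reducing variants try to lower by restructuring the recurrences.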
Procedia PDF Downloads 164
4708 The Characteristics of Settlement Owing to the Construction of Several Parallel Tunnels with Short Distances
Authors: Lojain Suliman, Xinrong Liu, Xiaohan Zhou
Abstract:
Since most tunnels are built in crowded metropolitan settings, the excavation process must take place in highly condensed locations, including high-density cities. As a result, tunnels are typically located close together, which leads to more interaction between parallel existing tunnels and, in turn, to more settlement. This research examines the impact of a large-scale tunnel excavation on two forms of settlement: surface settlement and settlement surrounding the tunnel. The properties of interactions between two and three parallel tunnels have also been studied. The settlement has been evaluated using three primary techniques: theoretical modeling, numerical simulation, and data monitoring. A parametric investigation of how distance affects the settlement characteristics of closely spaced parallel tunnels has been completed, and it has been observed that the sequence of excavation has an impact on the behavior of settlements. A comparison of the model test and numerical simulation yields significant agreement in terms of settlement trend and value. When compared to the FEM study, the suggested analytical solution exhibits reduced sensitivity in the settlement prediction. For example, the settlement of the small-diameter tunnel does not appear clearly on the settlement curve, while it is notable in the FEM analysis. It is advised, however, that additional studies be conducted in the future employing analytical solutions for settlement prediction for parallel tunnels.
Keywords: settlement, FEM, analytical solution, parallel tunnels
Procedia PDF Downloads 36
4707 Massively-Parallel Bit-Serial Neural Networks for Fast Epilepsy Diagnosis: A Feasibility Study
Authors: Si Mon Kueh, Tom J. Kazmierski
Abstract:
About 1% of the world population suffers from the hidden disability known as epilepsy, and major developing countries are not fully equipped to counter this problem. In order to reduce the inconvenience and danger of epilepsy, different methods have been researched that use artificial neural network (ANN) classification to distinguish epileptic waveforms from normal brain waveforms. This paper outlines the aim of achieving massive ANN parallelization through dedicated hardware using bit-serial processing. The design of this bit-serial Neural Processing Element (NPE) is presented, which implements the functionality of a complete neuron using variable accuracy. The proposed design has been tested taking into consideration non-idealities of a hardware ANN. The NPE consists of a bit-serial multiplier which uses only 16 logic elements on an Altera Cyclone IV FPGA and a bit-serial ALU as well as a look-up table. Arrays of NPEs can be driven by a single controller which executes the neural processing algorithm. In conclusion, the proposed compact NPE design allows the construction of complex hardware ANNs that can be implemented in portable equipment that suits the needs of a single epileptic patient in his or her daily activities to predict the occurrences of impending tonic-clonic seizures.
Keywords: Artificial Neural Networks (ANN), bit-serial neural processor, FPGA, Neural Processing Element (NPE)
Procedia PDF Downloads 321
4706 Portable and Parallel Accelerated Development Method for Field-Programmable Gate Array (FPGA)-Central Processing Unit (CPU)- Graphics Processing Unit (GPU) Heterogeneous Computing
Authors: Nan Hu, Chao Wang, Xi Li, Xuehai Zhou
Abstract:
The field-programmable gate array (FPGA) has been widely adopted in the high-performance computing domain. In recent years, the embedded system-on-a-chip (SoC) contains a coarse-granularity multi-core CPU (central processing unit) and a mobile GPU (graphics processing unit) that can be used as general-purpose accelerators. The motivation is that algorithms of various parallel characteristics can be efficiently mapped to the heterogeneous architecture coupled with these three processors. The CPU and GPU offload part of the computationally intensive tasks from the FPGA to reduce the resource consumption and lower the overall cost of the system. However, in present common scenarios, applications always utilize only one type of accelerator because the development approach supporting the collaboration of the heterogeneous processors faces challenges. Therefore, a systematic approach is needed that offers write-once-run-anywhere portability and high execution performance of the modules mapped to various architectures, and that facilitates the exploration of the design space. In this paper, a servant-execution-flow model is proposed for the abstraction of the cooperation of the heterogeneous processors, which supports task partition, communication and synchronization. At its first run, the intermediate language represented by the data flow diagram can generate the executable code of the target processor or can be converted into high-level programming languages. The instantiation parameters efficiently control the relationship between the modules and computational units, including two hierarchical processing units mapping and adjustment of data-level parallelism. An embedded system of a three-dimensional waveform oscilloscope is selected as a case study. The performance of algorithms such as contrast stretching is analyzed with implementations on various combinations of these processors. The experimental results show that the heterogeneous computing system with less than 35% resources achieves similar performance to the pure FPGA and comparable energy efficiency.
Keywords: FPGA-CPU-GPU collaboration, design space exploration, heterogeneous computing, intermediate language, parameterized instantiation
Procedia PDF Downloads 118
4705 Parallel Evaluation of Sommerfeld Integrals for Multilayer Dyadic Green's Function
Authors: Duygu Kan, Mehmet Cayoren
Abstract:
Sommerfeld-integrals (SIs) are commonly encountered in electromagnetics problems involving analysis of antennas and scatterers embedded in planar multilayered media. Generally speaking, the analytical solution of SIs is unavailable, and it is well known that numerical evaluation of SIs is very time-consuming and computationally expensive due to the highly oscillating and slowly decaying nature of the integrands. Therefore, fast computation of SIs is of paramount importance. In this paper, a parallel code has been developed to speed up the computation of SIs in the framework of calculating the dyadic Green’s function in multilayered media. The OpenMP shared-memory approach is used to parallelize the SI algorithm, resulting in significant time savings. Moreover, accelerating the computation of the dyadic Green’s function is discussed based on the parallel SI algorithm developed.
Keywords: Sommerfeld-integrals, multilayer dyadic Green’s function, OpenMP, shared memory parallel programming
Procedia PDF Downloads 247
4704 GPU-Accelerated Triangle Mesh Simplification Using Parallel Vertex Removal
Authors: Thomas Odaker, Dieter Kranzlmueller, Jens Volkert
Abstract:
We present an approach to triangle mesh simplification designed to be executed on the GPU. We use a quadric error metric to calculate an error value for each vertex of the mesh and order all vertices by this value. This step is followed by the parallel removal of a number of vertices with the lowest calculated error values. To allow for the parallel removal of multiple vertices, we use a set of per-vertex boundaries that prevent mesh foldovers even when simplification operations are performed on neighbouring vertices. We repeat the calculation of the vertex errors, the ordering of the error values and the removal of vertices until either a desired number of vertices remains in the mesh or a minimum error value is reached. This parallel approach speeds up the simplification process while maintaining mesh topology and avoiding foldovers at every step of the simplification.
Keywords: computer graphics, half edge collapse, mesh simplification, precomputed simplification, topology preserving
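A minimal sketch of a quadric-style vertex error follows, assuming the error of a vertex is the cheapest half-edge collapse onto one of its neighbours, measured as the sum of squared distances to the planes of its adjacent triangles. That definition, and the helper names `plane`, `collapse_cost` and `vertex_error`, are assumptions for illustration; the abstract does not give the authors' exact formulation.

```python
import math

def plane(p0, p1, p2):
    # Unit-normal plane (a, b, c, d) through a triangle, with ax + by + cz + d = 0.
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    a, b, c = nx / norm, ny / norm, nz / norm
    return (a, b, c, -(a * p0[0] + b * p0[1] + c * p0[2]))

def collapse_cost(vertices, triangles, src, dst):
    # Sum of squared distances from dst's position to the planes of
    # every triangle that touches src.
    x, y, z = vertices[dst]
    cost = 0.0
    for tri in triangles:
        if src in tri:
            a, b, c, d = plane(*(vertices[i] for i in tri))
            cost += (a * x + b * y + c * z + d) ** 2
    return cost

def vertex_error(vertices, triangles, src):
    # Cheapest half-edge collapse of src onto any vertex sharing a triangle with it.
    neighbours = {i for tri in triangles if src in tri for i in tri if i != src}
    return min(collapse_cost(vertices, triangles, src, dst) for dst in neighbours)

# Toy mesh: a square fan around centre vertex 4, once flat and once raised.
corners = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
fan = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
flat_error = vertex_error(corners + [(0.5, 0.5, 0.0)], fan, 4)
peak_error = vertex_error(corners + [(0.5, 0.5, 1.0)], fan, 4)
```

Each vertex error depends only on that vertex's adjacent triangles, so the per-vertex evaluation is the kind of step that maps to one GPU thread per vertex. Sorting the errors and the foldover-preventing boundaries are not shown here.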
Procedia PDF Downloads 367
4703 Parallel Asynchronous Multi-Splitting Methods for Differential Algebraic Systems
Authors: Malika Elkyal
Abstract:
We consider an iterative parallel multi-splitting method for differential algebraic equations. The main feature of the proposed idea is the use of the asynchronous form. We prove that the multi-splitting technique can effectively accelerate the convergence of the iterative process. The main characteristic of the asynchronous mode is that the local algorithm does not have to wait for predetermined messages to become available. We allow some processors to communicate more frequently than others, and we allow the communication delays to be substantial and unpredictable. Accordingly, we note that synchronous algorithms, in the computer science sense, are particular cases of our asynchronous formulation.
Keywords: parallel methods, asynchronous mode, multisplitting, differential algebraic equations
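To make the asynchronous idea concrete, here is a toy sketch on a linear system rather than a differential algebraic system. It assumes a block-Jacobi splitting and simulates two "processors" that update at different rates and read whatever values are currently visible. The function name `async_block_jacobi` and the `periods` parameter are hypothetical, and the example is a deterministic simulation, not a real distributed run.

```python
def async_block_jacobi(A, b, blocks, periods, sweeps=200):
    # Each block plays the role of one processor. A block with period p only
    # finishes an update every p-th sweep, so the others keep reading its
    # stale values in between instead of waiting for it.
    n = len(b)
    x = [0.0] * n                      # shared iterate, read by every block
    for sweep in range(sweeps):
        for block, period in zip(blocks, periods):
            if sweep % period:         # this "processor" is still busy: skip
                continue
            snapshot = list(x)         # whatever values are currently visible
            for i in block:
                s = sum(A[i][j] * snapshot[j] for j in range(n) if j != i)
                x[i] = (b[i] - s) / A[i][i]
    return x

# Strictly diagonally dominant test matrix; b is built so that the
# exact solution is [1, 2, 3, 4].
A = [[10.0, 1.0, 2.0, 0.0],
     [1.0, 12.0, 0.0, 3.0],
     [2.0, 0.0, 9.0, 1.0],
     [0.0, 3.0, 1.0, 11.0]]
b = [18.0, 37.0, 33.0, 53.0]
solution = async_block_jacobi(A, b, blocks=[[0, 1], [2, 3]], periods=[1, 3])
```

Setting every period to 1 recovers an ordinary synchronous sweep, which mirrors the remark that synchronous algorithms are a particular case of the asynchronous formulation.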
Procedia PDF Downloads 558
4702 Natural Convection between Two Parallel Wavy Plates
Authors: Si Abdallah Mayouf
Abstract:
In this work, the effects of the wavy surface on free convection heat transfer boundary layer flow between two parallel wavy plates are studied numerically. The two plates are held at a constant temperature. The equations and the boundary conditions are discretized by a finite difference scheme and solved numerically using the Gauss-Seidel algorithm. The important parameters in this problem are the amplitude of the wavy surfaces and the distance between the two wavy plates. Results are presented as velocity profiles, temperature profiles and the local Nusselt number as functions of these parameters.
Keywords: free convection, wavy surface, parallel plates, fluid dynamics
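The Gauss-Seidel step can be illustrated on a much simpler problem than the one studied: steady heat conduction between two flat plates, discretized with the five-point finite difference stencil. The flat geometry, the linear side-wall profile and the function name `gauss_seidel_laplace` are all simplifying assumptions of this sketch; the wavy boundary and the flow equations are not modelled.

```python
def gauss_seidel_laplace(ny=11, nx=11, t_bottom=0.0, t_top=1.0,
                         tol=1e-10, max_sweeps=10000):
    # Boundary values: fixed plate temperatures at j = 0 and j = ny - 1,
    # and a linear profile on the two side walls. Interior starts at zero.
    def wall(j):
        return t_bottom + (t_top - t_bottom) * j / (ny - 1)

    T = [[wall(j) if i in (0, nx - 1) or j in (0, ny - 1) else 0.0
          for i in range(nx)] for j in range(ny)]

    for sweep in range(max_sweeps):
        change = 0.0
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                new = 0.25 * (T[j - 1][i] + T[j + 1][i] + T[j][i - 1] + T[j][i + 1])
                change = max(change, abs(new - T[j][i]))
                T[j][i] = new          # in-place: later points see it immediately
        if change < tol:
            return T, sweep + 1
    return T, max_sweeps

temperature, sweeps_used = gauss_seidel_laplace()
```

The in-place assignment is what distinguishes Gauss-Seidel from Jacobi: each point uses the newest values of the neighbours already visited in the current sweep.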
Procedia PDF Downloads 307
4701 Parallelizing the Hybrid Pseudo-Spectral Time Domain/Finite Difference Time Domain Algorithms for the Large-Scale Electromagnetic Simulations Using Massage Passing Interface Library
Authors: Donggun Lee, Q-Han Park
Abstract:
Due to its coarse grid, the Pseudo-Spectral Time Domain (PSTD) method has advantages over the Finite Difference Time Domain (FDTD) method in terms of memory requirement and operation time. However, since its parallelization efficiency is much lower than that of FDTD, PSTD is not a useful method for large-scale electromagnetic simulation on a parallel platform. In this paper, we propose a parallelization technique for the hybrid PSTD-FDTD (HPF) method, which combines the efficient parallelizability of FDTD with the speed and low memory requirement of PSTD. The parallelization cost of the HPF method is exactly the same as that of parallel FDTD, yet it occupies much less memory and runs faster than parallel FDTD. Experiments on distributed memory systems have shown that the parallel HPF method saves up to 96% of the operation time and reduces the memory requirement by 84%. In addition, by combining the OpenMP library with the MPI library, we further reduced the operation time of the parallel HPF method by 50%.
Keywords: FDTD, hybrid, MPI, OpenMP, PSTD, parallelization
Procedia PDF Downloads 148
4700 Detecting the Edge of Multiple Images in Parallel
Authors: Prakash K. Aithal, U. Dinesh Acharya, Rajesh Gopakumar
Abstract:
An edge is a variation of brightness in an image. Edge detection is useful in many application areas, such as finding forests and rivers in a satellite image or detecting a broken bone in a medical image. The paper discusses finding the edges of multiple aerial images in parallel. The proposed work was tested on 38 images: 37 color images and one monochrome image. The time taken to process N images in parallel is equivalent to the time taken to process one image sequentially. The proposed method achieves pixel-level parallelism as well as image-level parallelism.
Keywords: edge detection, multicore, gpu, opencl, mpi
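Image-level parallelism can be sketched as one task per image. The example below assumes a Sobel operator and a Python thread pool purely for illustration; the abstract does not say which edge operator is used, and its keywords point to OpenCL and MPI rather than threads. The names `sobel` and `detect_edges` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def sobel(img):
    # Gradient magnitude (|gx| + |gy|) of a grayscale image given as a list
    # of rows. Border pixels are left at zero.
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out

def detect_edges(images, workers=4):
    # Image-level parallelism: every image is an independent task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sobel, images))

step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]   # vertical brightness step
flat = [[7] * 6 for _ in range(5)]                    # no edges at all
edges_step, edges_flat = detect_edges([step, flat])
```

Each output pixel depends only on its 3x3 neighbourhood in the input, so the two inner loops are also independent per pixel. That is the pixel-level parallelism a GPU kernel would exploit.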
Procedia PDF Downloads 477
4699 Comparison of Parallel CUDA and OpenMP Implementations of Memetic Algorithms for Solving Optimization Problems
Authors: Jason Digalakis, John Cotronis
Abstract:
Memetic algorithms (MAs) are useful for solving optimization problems. Searching the space of a high-dimensional optimization problem is difficult, and using all the cores of a system is a challenge. In this study, a sequential implementation of the memetic algorithm is converted into a concurrent version that is executed on the cores of both the CPU and the GPU. To this end, OpenMP and CUDA are used to run the parallel algorithm concurrently on the CPU and the GPU, respectively. The aim of this study is to compare CPU and GPU implementations of the memetic algorithm. For this purpose, fourteen benchmark functions are selected as test problems. The results indicate that our approach achieves speedups of up to five thousand times over a single CPU thread while maintaining reasonable result quality. This shows that GPUs have the potential to accelerate MAs and allow them to solve much more complex tasks.
Keywords: memetic algorithm, CUDA, GPU-based memetic algorithm, open multi processing, multimodal functions, unimodal functions, non-linear optimization problems
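For readers unfamiliar with the technique, a memetic algorithm combines an evolutionary loop with a local search applied to individuals. The sequential sketch below assumes truncation selection, uniform crossover, Gaussian mutation and hill climbing on the sphere function. These operators, parameter values and function names are illustrative choices, not the configuration used in the study.

```python
import random

def sphere(x):
    # Simple unimodal benchmark: minimum 0 at the origin.
    return sum(v * v for v in x)

def local_search(x, f, rng, step=0.1, tries=20):
    # Hill climbing: keep a random perturbation only if it improves f.
    best, best_f = x, f(x)
    for _ in range(tries):
        cand = [v + rng.uniform(-step, step) for v in best]
        cand_f = f(cand)
        if cand_f < best_f:
            best, best_f = cand, cand_f
    return best

def memetic(f, dim=5, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    history = [min(f(p) for p in pop)]           # best fitness per generation
    for _ in range(generations):
        pop.sort(key=f)
        parents = pop[: pop_size // 2]           # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            child = [v + rng.gauss(0, 0.3) for v in child]     # Gaussian mutation
            children.append(local_search(child, f, rng))       # the memetic step
        pop = parents + children
        history.append(min(f(p) for p in pop))
    return min(pop, key=f), history

best, history = memetic(sphere)
```

The fitness evaluations and the per-child local searches are independent of one another, which is the part a CUDA or OpenMP version would typically distribute across threads.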
Procedia PDF Downloads 101
4698 Large-Scale Simulations of Turbulence Using Discontinuous Spectral Element Method
Authors: A. Peyvan, D. Li, J. Komperda, F. Mashayek
Abstract:
Turbulence can be observed in a variety of fluid motions in nature and industrial applications. Recent investment in high-speed aircraft and propulsion systems has revitalized fundamental research on turbulent flows. In these systems, capturing chaotic fluid structures with different length and time scales is accomplished through the Direct Numerical Simulation (DNS) approach, since it accurately simulates flows down to the smallest dissipative scales, i.e., Kolmogorov’s scales. The discontinuous spectral element method (DSEM) is a high-order technique that uses spectral functions to approximate the solution. The DSEM code has been developed by our research group over the course of more than two decades. Recently, the code has been improved to run large cases on the order of billions of solution points. Running big simulations requires a considerable amount of RAM. Therefore, the DSEM code must be highly parallelized and able to start on multiple computational nodes of an HPC cluster with distributed memory. However, some pre-processing procedures, such as determining global element information, creating a global face list, and assigning global partitioning and element connection information of the domain for communication, must be done sequentially on a single processing core. A separate code has been written to perform the pre-processing procedures on a local machine. It extracts from the mesh file the minimum amount of information required for the DSEM code to start in parallel and stores it in text files (pre-files). It packs integer-type information in a Stream Binary format in pre-files that are portable between machines. The files are generated to ensure fast read performance on different file systems, such as Lustre and the General Parallel File System (GPFS). A new subroutine has been added to the DSEM code to read the startup files. For Lustre, it uses parallel MPI I/O, so that each MPI rank acquires its information from the file in parallel. For GPFS, a single MPI rank on each computational node reads data from a file generated specifically for that node and sends it to the other ranks on the node using point-to-point non-blocking MPI communication. This way, communication takes place locally on each node and signals do not cross the switches of the cluster. The read subroutine has been tested on Argonne National Laboratory’s Mira (GPFS), the National Center for Supercomputing Applications’ Blue Waters (Lustre), San Diego Supercomputer Center’s Comet (Lustre), and UIC’s Extreme (Lustre). The tests showed that one file per node is suited to GPFS, while parallel MPI I/O is the best choice for the Lustre file system. The DSEM code relies on heavily optimized linear algebra operations, such as matrix-matrix and matrix-vector products, to calculate the solution in every time step. For this, the code can make use of its own matrix math library, BLAS, Intel MKL, or ATLAS. This fact and the discontinuous nature of the method make the DSEM code run efficiently in parallel. The results of weak scaling tests performed on Blue Waters showed scalable and efficient performance of the code in parallel computing.
Keywords: computational fluid dynamics, direct numerical simulation, spectral element, turbulent flow
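The idea of each rank acquiring only its own portion of a packed binary startup file can be sketched without MPI. The file layout below (a rank count, a table of per-rank counts, then the payload) is entirely hypothetical, as are the names `write_prefile` and `read_for_rank` and the 32-bit little-endian integer width. The abstract does not describe the actual pre-file format.

```python
import os
import struct
import tempfile

INT = struct.Struct("<i")            # little-endian 32-bit integer (assumed width)

def write_prefile(path, per_rank):
    # Hypothetical layout:
    # [n_ranks][count_0 .. count_{n-1}][rank 0 ints][rank 1 ints]...
    with open(path, "wb") as fh:
        fh.write(INT.pack(len(per_rank)))
        for values in per_rank:
            fh.write(INT.pack(len(values)))
        for values in per_rank:
            fh.write(struct.pack(f"<{len(values)}i", *values))

def read_for_rank(path, rank):
    # Each rank opens the file itself, reads the small header, then seeks
    # straight to its own slice without touching other ranks' data.
    with open(path, "rb") as fh:
        (n_ranks,) = INT.unpack(fh.read(INT.size))
        counts = struct.unpack(f"<{n_ranks}i", fh.read(INT.size * n_ranks))
        offset = INT.size * (1 + n_ranks + sum(counts[:rank]))
        fh.seek(offset)
        raw = fh.read(INT.size * counts[rank])
        return list(struct.unpack(f"<{counts[rank]}i", raw))

data = [[1, 2, 3], [], [40, 50], [6]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "startup.bin")
    write_prefile(path, data)
    recovered = [read_for_rank(path, r) for r in range(len(data))]
```

In an MPI program the seek-and-read would typically become a read at an explicit per-rank offset through MPI I/O. In the one-reader-per-node variant, only the reader rank would perform it and then forward slices to the other ranks on its node.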
Procedia PDF Downloads 133