Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3913

Search results for: parallel algorithm

3913 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm

Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan


This study presents a new parallel approach clustering of GPS data. Evaluation has been made by comparing execution time of various clustering algorithms on GPS data. This paper aims to propose a parallel based on neighborhood K-means algorithm to make it faster. The proposed parallelization approach assumes that each GPS data represents a vehicle and to communicate between vehicles close to each other after vehicles are clustered. This parallelization approach has been examined on different sized continuously changing GPS data and compared with serial K-means algorithm and other serial clustering algorithms. The results demonstrated that proposed parallel K-means algorithm has been shown to work much faster than other clustering algorithms.

Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data

Procedia PDF Downloads 114
3912 A Parallel Implementation of k-Means in MATLAB

Authors: Dimitris Varsamis, Christos Talagkozis, Alkiviadis Tsimpiris, Paris Mastorocostas


The aim of this work is the parallel implementation of k-means in MATLAB, in order to reduce the execution time. Specifically, a new function in MATLAB for serial k-means algorithm is developed, which meets all the requirements for the conversion to a function in MATLAB with parallel computations. Additionally, two different variants for the definition of initial values are presented. In the sequel, the parallel approach is presented. Finally, the performance tests for the computation times respect to the numbers of features and classes are illustrated.

Keywords: K-means algorithm, clustering, parallel computations, Matlab

Procedia PDF Downloads 295
3911 Constructing the Density of States from the Parallel Wang Landau Algorithm Overlapping Data

Authors: Arman S. Kussainov, Altynbek K. Beisekov


This work focuses on building an efficient universal procedure to construct a single density of states from the multiple pieces of data provided by the parallel implementation of the Wang Landau Monte Carlo based algorithm. The Ising and Pott models were used as the examples of the two-dimensional spin lattices to construct their densities of states. Sampled energy space was distributed between the individual walkers with certain overlaps. This was made to include the latest development of the algorithm as the density of states replica exchange technique. Several factors of immediate importance for the seamless stitching process have being considered. These include but not limited to the speed and universality of the initial parallel algorithm implementation as well as the data post-processing to produce the expected smooth density of states.

Keywords: density of states, Monte Carlo, parallel algorithm, Wang Landau algorithm

Procedia PDF Downloads 294
3910 An Improved Many Worlds Quantum Genetic Algorithm

Authors: Li Dan, Zhao Junsuo, Zhang Wenjun


Aiming at the shortcomings of the Quantum Genetic Algorithm such as the multimodal function optimization problems easily falling into the local optimum, and vulnerable to premature convergence due to no closely relationship between individuals, the paper presents an Improved Many Worlds Quantum Genetic Algorithm (IMWQGA). The paper using the concept of Many Worlds; using the derivative way of parallel worlds’ parallel evolution; putting forward the thought which updating the population according to the main body; adopting the transition methods such as parallel transition, backtracking, travel forth. In addition, the algorithm in the paper also proposes the quantum training operator and the combinatorial optimization operator as new operators of quantum genetic algorithm.

Keywords: quantum genetic algorithm, many worlds, quantum training operator, combinatorial optimization operator

Procedia PDF Downloads 638
3909 The Parallelization of Algorithm Based on Partition Principle for Association Rules Discovery

Authors: Khadidja Belbachir, Hafida Belbachir


subsequently the expansion of the physical supports storage and the needs ceaseless to accumulate several data, the sequential algorithms of associations’ rules research proved to be ineffective. Thus the introduction of the new parallel versions is imperative. We propose in this paper, a parallel version of a sequential algorithm “Partition”. This last is fundamentally different from the other sequential algorithms, because it scans the data base only twice to generate the significant association rules. By consequence, the parallel approach does not require much communication between the sites. The proposed approach was implemented for an experimental study. The obtained results, shows a great reduction in execution time compared to the sequential version and Count Distributed algorithm.

Keywords: association rules, distributed data mining, partition, parallel algorithms

Procedia PDF Downloads 318
3908 Parallel Version of Reinhard’s Color Transfer Algorithm

Authors: Abhishek Bhardwaj, Manish Kumar Bajpai


An image with its content and schema of colors presents an effective mode of information sharing and processing. By changing its color schema different visions and prospect are discovered by the users. This phenomenon of color transfer is being used by Social media and other channel of entertainment. Reinhard et al’s algorithm was the first one to solve this problem of color transfer. In this paper, we make this algorithm efficient by introducing domain parallelism among different processors. We also comment on the factors that affect the speedup of this problem. In the end by analyzing the experimental data we claim to propose a novel and efficient parallel Reinhard’s algorithm.

Keywords: Reinhard et al’s algorithm, color transferring, parallelism, speedup

Procedia PDF Downloads 516
3907 Parallel 2-Opt Local Search on GPU

Authors: Wen-Bao Qiao, Jean-Charles Créput


To accelerate the solution for large scale traveling salesman problems (TSP), a parallel 2-opt local search algorithm with simple implementation based on Graphics Processing Unit (GPU) is presented and tested in this paper. The parallel scheme is based on technique of data decomposition by dynamically assigning multiple K processors on the integral tour to treat K edges’ 2-opt local optimization simultaneously on independent sub-tours, where K can be user-defined or have a function relationship with input size N. We implement this algorithm with doubly linked list on GPU. The implementation only requires O(N) memory. We compare this parallel 2-opt local optimization against sequential exhaustive 2-opt search along integral tour on TSP instances from TSPLIB with more than 10000 cities.

Keywords: parallel 2-opt, double links, large scale TSP, GPU

Procedia PDF Downloads 266
3906 Classification Rule Discovery by Using Parallel Ant Colony Optimization

Authors: Waseem Shahzad, Ayesha Tahir Khan, Hamid Hussain Awan


Ant-Miner algorithm that lies under ACO algorithms is used to extract knowledge from data in the form of rules. A variant of Ant-Miner algorithm named as cAnt-MinerPB is used to generate list of rules using pittsburgh approach in order to maintain the rule interaction among the rules that are generated. In this paper, we propose a parallel Ant MinerPB in which Ant colony optimization algorithm runs parallel. In this technique, a data set is divided vertically (i-e attributes) into different subsets. These subsets are created based on the correlation among attributes using Mutual Information (MI). It generates rules in a parallel manner and then merged to form a final list of rules. The results have shown that the proposed technique achieved higher accuracy when compared with original cAnt-MinerPB and also the execution time has also reduced.

Keywords: ant colony optimization, parallel Ant-MinerPB, vertical partitioning, classification rule discovery

Procedia PDF Downloads 216
3905 A Genetic Algorithm for the Load Balance of Parallel Computational Fluid Dynamics Computation with Multi-Block Structured Mesh

Authors: Chunye Gong, Ming Tie, Jie Liu, Weimin Bao, Xinbiao Gan, Shengguo Li, Bo Yang, Xuguang Chen, Tiaojie Xiao, Yang Sun


Large-scale CFD simulation relies on high-performance parallel computing, and the load balance is the key role which affects the parallel efficiency. This paper focuses on the load-balancing problem of parallel CFD simulation with structured mesh. A mathematical model for this load-balancing problem is presented. The genetic algorithm, fitness computing, two-level code are designed. Optimal selector, robust operator, and local optimization operator are designed. The properties of the presented genetic algorithm are discussed in-depth. The effects of optimal selector, robust operator, and local optimization operator are proved by experiments. The experimental results of different test sets, DLR-F4, and aircraft design applications show the presented load-balancing algorithm is robust, quickly converged, and is useful in real engineering problems.

Keywords: genetic algorithm, load-balancing algorithm, optimal variation, local optimization

Procedia PDF Downloads 40
3904 A Parallel Algorithm for Solving the PFSP on the Grid

Authors: Samia Kouki


Solving NP-hard combinatorial optimization problems by exact search methods, such as Branch-and-Bound, may degenerate to complete enumeration. For that reason, exact approaches limit us to solve only small or moderate size problem instances, due to the exponential increase in CPU time when problem size increases. One of the most promising ways to reduce significantly the computational burden of sequential versions of Branch-and-Bound is to design parallel versions of these algorithms which employ several processors. This paper describes a parallel Branch-and-Bound algorithm called GALB for solving the classical permutation flowshop scheduling problem as well as its implementation on a Grid computing infrastructure. The experimental study of our distributed parallel algorithm gives promising results and shows clearly the benefit of the parallel paradigm to solve large-scale instances in moderate CPU time.

Keywords: grid computing, permutation flow shop problem, branch and bound, load balancing

Procedia PDF Downloads 208
3903 An Improved Parallel Algorithm of Decision Tree

Authors: Jiameng Wang, Yunfei Yin, Xiyu Deng


Parallel optimization is one of the important research topics of data mining at this stage. Taking Classification and Regression Tree (CART) parallelization as an example, this paper proposes a parallel data mining algorithm based on SSP-OGini-PCCP. Aiming at the problem of choosing the best CART segmentation point, this paper designs an S-SP model without data association; and in order to calculate the Gini index efficiently, a parallel OGini calculation method is designed. In addition, in order to improve the efficiency of the pruning algorithm, a synchronous PCCP pruning strategy is proposed in this paper. In this paper, the optimal segmentation calculation, Gini index calculation, and pruning algorithm are studied in depth. These are important components of parallel data mining. By constructing a distributed cluster simulation system based on SPARK, data mining methods based on SSP-OGini-PCCP are tested. Experimental results show that this method can increase the search efficiency of the best segmentation point by an average of 89%, increase the search efficiency of the Gini segmentation index by 3853%, and increase the pruning efficiency by 146% on average; and as the size of the data set increases, the performance of the algorithm remains stable, which meets the requirements of contemporary massive data processing.

Keywords: classification, Gini index, parallel data mining, pruning ahead

Procedia PDF Downloads 52
3902 Verification & Validation of Map Reduce Program Model for Parallel K-Mediod Algorithm on Hadoop Cluster

Authors: Trapti Sharma, Devesh Kumar Srivastava


This paper is basically a analysis study of above MapReduce implementation and also to verify and validate the MapReduce solution model for Parallel K-Mediod algorithm on Hadoop Cluster. MapReduce is a programming model which authorize the managing of huge amounts of data in parallel, on a large number of devices. It is specially well suited to constant or moderate changing set of data since the implementation point of a position is usually high. MapReduce has slowly become the framework of choice for “big data”. The MapReduce model authorizes for systematic and instant organizing of large scale data with a cluster of evaluate nodes. One of the primary affect in Hadoop is how to minimize the completion length (i.e. makespan) of a set of MapReduce duty. In this paper, we have verified and validated various MapReduce applications like wordcount, grep, terasort and parallel K-Mediod clustering algorithm. We have found that as the amount of nodes increases the completion time decreases.

Keywords: hadoop, mapreduce, k-mediod, validation, verification

Procedia PDF Downloads 278
3901 Series-Parallel Systems Reliability Optimization Using Genetic Algorithm and Statistical Analysis

Authors: Essa Abrahim Abdulgader Saleem, Thien-My Dao


The main objective of this paper is to optimize series-parallel system reliability using Genetic Algorithm (GA) and statistical analysis; considering system reliability constraints which involve the redundant numbers of selected components, total cost, and total weight. To perform this work, firstly the mathematical model which maximizes system reliability subject to maximum system cost and maximum system weight constraints is presented; secondly, a statistical analysis is used to optimize GA parameters, and thirdly GA is used to optimize series-parallel systems reliability. The objective is to determine the strategy choosing the redundancy level for each subsystem to maximize the overall system reliability subject to total cost and total weight constraints. Finally, the series-parallel system case study reliability optimization results are showed, and comparisons with the other previous results are presented to demonstrate the performance of our GA.

Keywords: reliability, optimization, meta-heuristic, genetic algorithm, redundancy

Procedia PDF Downloads 255
3900 Parallel Evaluation of Sommerfeld Integrals for Multilayer Dyadic Green's Function

Authors: Duygu Kan, Mehmet Cayoren


Sommerfeld-integrals (SIs) are commonly encountered in electromagnetics problems involving analysis of antennas and scatterers embedded in planar multilayered media. Generally speaking, the analytical solution of SIs is unavailable, and it is well known that numerical evaluation of SIs is very time consuming and computationally expensive due to the highly oscillating and slowly decaying nature of the integrands. Therefore, fast computation of SIs has a paramount importance. In this paper, a parallel code has been developed to speed up the computation of SI in the framework of calculation of dyadic Green’s function in multilayered media. OpenMP shared memory approach is used to parallelize the SI algorithm and resulted in significant time savings. Moreover accelerating the computation of dyadic Green’s function is discussed based on the parallel SI algorithm developed.

Keywords: Sommerfeld-integrals, multilayer dyadic Green’s function, OpenMP, shared memory parallel programming

Procedia PDF Downloads 169
3899 Fast Prediction Unit Partition Decision and Accelerating the Algorithm Using Cudafor Intra and Inter Prediction of HEVC

Authors: Qiang Zhang, Chun Yuan


Since the PU (Prediction Unit) decision process is the most time consuming part of the emerging HEVC (High Efficient Video Coding) standardin intra and inter frame coding, this paper proposes the fast PU decision algorithm and speed up the algorithm using CUDA (Compute Unified Device Architecture). In intra frame coding, the fast PU decision algorithm uses the texture features to skip intra-frame prediction or terminal the intra-frame prediction for smaller PU size. In inter frame coding of HEVC, the fast PU decision algorithm takes use of the similarity of its own two Nx2N size PU's motion vectors and the hierarchical structure of CU (Coding Unit) partition to skip some modes of PU partition, so as to reduce the motion estimation times. The accelerate algorithm using CUDA is based on the fast PU decision algorithm which uses the GPU to make the motion search and the gradient computation could be parallel computed. The proposed algorithm achieves up to 57% time saving compared to the HM 10.0 with little rate-distortion losses (0.043dB drop and 1.82% bitrate increase on average).

Keywords: HEVC, PU decision, inter prediction, intra prediction, CUDA, parallel

Procedia PDF Downloads 315
3898 The Vision Baed Parallel Robot Control

Authors: Sun Lim, Kyun Jung


In this paper, we describe the control strategy of high speed parallel robot system with EtherCAT network. This work deals the parallel robot system with centralized control on the real-time operating system such as window TwinCAT3. Most control scheme and algorithm is implemented master platform on the PC, the input and output interface is ported on the slave side. The data is transferred by maximum 20usecond with 1000byte. EtherCAT is very high speed and stable industrial network. The control strategy with EtherCAT is very useful and robust on Ethernet network environment. The developed parallel robot is controlled pre-design nonlinear controller for 6G/0.43 cycle time of pick and place motion tracking. The experiment shows the good design and validation of the controller.

Keywords: parallel robot control, etherCAT, nonlinear control, parallel robot inverse kinematic

Procedia PDF Downloads 488
3897 Collision Detection Algorithm Based on Data Parallelism

Authors: Zhen Peng, Baifeng Wu


Modern computing technology enters the era of parallel computing with the trend of sustainable and scalable parallelism. Single Instruction Multiple Data (SIMD) is an important way to go along with the trend. It is able to gather more and more computing ability by increasing the number of processor cores without the need of modifying the program. Meanwhile, in the field of scientific computing and engineering design, many computation intensive applications are facing the challenge of increasingly large amount of data. Data parallel computing will be an important way to further improve the performance of these applications. In this paper, we take the accurate collision detection in building information modeling as an example. We demonstrate a model for constructing a data parallel algorithm. According to the model, a complex object is decomposed into the sets of simple objects; collision detection among complex objects is converted into those among simple objects. The resulting algorithm is a typical SIMD algorithm, and its advantages in parallelism and scalability is unparalleled in respect to the traditional algorithms.

Keywords: data parallelism, collision detection, single instruction multiple data, building information modeling, continuous scalability

Procedia PDF Downloads 202
3896 A Practical Protection Method for Parallel Transmission-Lines Based on the Fault Travelling-Waves

Authors: Mohammad Reza Ebrahimi


In new restructured power systems, swift fault detection is very important. The parallel transmission-lines are vastly used in this kind of power systems because of high amount of energy transferring. In this paper, a method based on the comparison of two schemes, i.e., i) maximum magnitude of travelling-wave (TW) energy ii) the instants of maximum energy occurrence at the circuits of parallel transmission-line is proposed. Using the travelling-wave of fault in order to faulted line identification this method has noticeable operation time. Moreover, the algorithm can cover for identification of faults as external or internal faults. For an internal fault, the exact location of the fault can be estimated confidently. A lot of simulations have been done with PSCAD/EMTDC to verify the performance of the proposed algorithm.

Keywords: travelling-wave, maximum energy, parallel transmission-line, fault location

Procedia PDF Downloads 99
3895 A Fast Parallel and Distributed Type-2 Fuzzy Algorithm Based on Cooperative Mobile Agents Model for High Performance Image Processing

Authors: Fatéma Zahra Benchara, Mohamed Youssfi, Omar Bouattane, Hassan Ouajji, Mohamed Ouadi Bensalah


The aim of this paper is to present a distributed implementation of the Type-2 Fuzzy algorithm in a parallel and distributed computing environment based on mobile agents. The proposed algorithm is assigned to be implemented on a SPMD (Single Program Multiple Data) architecture which is based on cooperative mobile agents as AVPE (Agent Virtual Processing Element) model in order to improve the processing resources needed for performing the big data image segmentation. In this work we focused on the application of this algorithm in order to process the big data MRI (Magnetic Resonance Images) image of size (n x m). It is encapsulated on the Mobile agent team leader in order to be split into (m x n) pixels one per AVPE. Each AVPE perform and exchange the segmentation results and maintain asynchronous communication with their team leader until the convergence of this algorithm. Some interesting experimental results are obtained in terms of accuracy and efficiency analysis of the proposed implementation, thanks to the mobile agents several interesting skills introduced in this distributed computational model.

Keywords: distributed type-2 fuzzy algorithm, image processing, mobile agents, parallel and distributed computing

Procedia PDF Downloads 314
3894 Task Scheduling on Parallel System Using Genetic Algorithm

Authors: Jasbir Singh Gill, Baljit Singh


Scheduling and mapping the application task graph on multiprocessor parallel systems is considered as the most crucial and critical NP-complete problem. Many genetic algorithms have been proposed to solve such problems. In this paper, two genetic approach based algorithms have been designed and developed with or without task duplication. The proposed algorithms work on two fitness functions. The first fitness i.e. task fitness is used to minimize the total finish time of the schedule (schedule length) while the second fitness function i.e. process fitness is concerned with allocating the tasks to the available highly efficient processor from the list of available processors (load balance). Proposed genetic-based algorithms have been experimentally implemented and evaluated with other state-of-art popular and widely used algorithms.

Keywords: parallel computing, task scheduling, task duplication, genetic algorithm

Procedia PDF Downloads 206
3893 A Parallel Approach for 3D-Variational Data Assimilation on GPUs in Ocean Circulation Models

Authors: Rossella Arcucci, Luisa D'Amore, Simone Celestino, Giuseppe Scotti, Giuliano Laccetti


This work is the first dowel in a rather wide research activity in collaboration with Euro Mediterranean Center for Climate Changes, aimed at introducing scalable approaches in Ocean Circulation Models. We discuss designing and implementation of a parallel algorithm for solving the Variational Data Assimilation (DA) problem on Graphics Processing Units (GPUs). The algorithm is based on the fully scalable 3DVar DA model, previously proposed by the authors, which uses a Domain Decomposition approach (we refer to this model as the DD-DA model). We proceed with an incremental porting process consisting of 3 distinct stages: requirements and source code analysis, incremental development of CUDA kernels, testing and optimization. Experiments confirm the theoretic performance analysis based on the so-called scale up factor demonstrating that the DD-DA model can be suitably mapped on GPU architectures.

Keywords: data assimilation, GPU architectures, ocean models, parallel algorithm

Procedia PDF Downloads 313
3892 Improved Pattern Matching Applied to Surface Mounting Devices Components Localization on Automated Optical Inspection

Authors: Pedro M. A. Vitoriano, Tito. G. Amaral


Automated Optical Inspection (AOI) Systems are commonly used on Printed Circuit Boards (PCB) manufacturing. The use of this technology has been proven as highly efficient for process improvements and quality achievements. The correct extraction of the component for posterior analysis is a critical step of the AOI process. Nowadays, the Pattern Matching Algorithm is commonly used, although this algorithm requires extensive calculations and is time consuming. This paper will present an improved algorithm for the component localization process, with the capability of implementation in a parallel execution system.

Keywords: AOI, automated optical inspection, SMD, surface mounting devices, pattern matching, parallel execution

Procedia PDF Downloads 230
3891 Damage Strain Analysis of Parallel Fiber Eutectic

Authors: Jian Zheng, Xinhua Ni, Xiequan Liu


According to isotropy of parallel fiber eutectic, the no- damage strain field in parallel fiber eutectic is obtained from the flexibility tensor of parallel fiber eutectic. Considering the damage behavior of parallel fiber eutectic, damage variables are introduced to determine the strain field of parallel fiber eutectic. The damage strains in the matrix, interphase, and fiber of parallel fiber eutectic are quantitatively analyzed. Results show that damage strains are not only associated with the fiber volume fraction of parallel fiber eutectic, but also with the damage degree.

Keywords: damage strain, initial strain, fiber volume fraction, parallel fiber eutectic

Procedia PDF Downloads 322
3890 Trajectory Tracking of a Redundant Hybrid Manipulator Using a Switching Control Method

Authors: Atilla Bayram


This paper presents the trajectory tracking control of a spatial redundant hybrid manipulator. This manipulator consists of two parallel manipulators which are a variable geometry truss (VGT) module. In fact, each VGT module with 3-degress of freedom (DOF) is a planar parallel manipulator and their operational planes of these VGT modules are arranged to be orthogonal to each other. Also, the manipulator contains a twist motion part attached to the top of the second VGT module to supply the missing orientation of the endeffector. These three modules constitute totally 7-DOF hybrid (parallel-parallel) redundant spatial manipulator. The forward kinematics equations of this manipulator are obtained, then, according to these equations, the inverse kinematics is solved based on an optimization with the joint limit avoidance. The dynamic equations are formed by using virtual work method. In order to test the performance of the redundant manipulator and the controllers presented, two different desired trajectories are followed by using the computed force control method and a switching control method. The switching control method is combined with the computed force control method and genetic algorithm. In the switching control method, the genetic algorithm is only used for fine tuning in the compensation of the trajectory tracking errors.

Keywords: computed force method, genetic algorithm, hybrid manipulator, inverse kinematics of redundant manipulators, variable geometry truss

Procedia PDF Downloads 265
3889 Resistivity Tomography Optimization Based on Parallel Electrode Linear Back Projection Algorithm

Authors: Yiwei Huang, Chunyu Zhao, Jingjing Ding


Electrical Resistivity Tomography has been widely used in the medicine and the geology, such as the imaging of the lung impedance and the analysis of the soil impedance, etc. Linear Back Projection is the core algorithm of Electrical Resistivity Tomography, but the traditional Linear Back Projection can not make full use of the information of the electric field. In this paper, an imaging method of Parallel Electrode Linear Back Projection for Electrical Resistivity Tomography is proposed, which generates the electric field distribution that is not linearly related to the traditional Linear Back Projection, captures the new information and improves the imaging accuracy without increasing the number of electrodes by changing the connection mode of the electrodes. The simulation results show that the accuracy of the image obtained by the inverse operation obtained by the Parallel Electrode Linear Back Projection can be improved by about 20%.

Keywords: electrical resistivity tomography, finite element simulation, image optimization, parallel electrode linear back projection

Procedia PDF Downloads 57
3888 Parallel PRBS Generation and Parallel BER Tester for 8-Gbps On-chip Interconnection Testing

Authors: Zhao Bin, Yan Dan Lei


In this paper, a multi-pattern parallel PRBS generator and a dedicated parallel BER tester is proposed for the 8-Gbps On-chip interconnection testing. A unique full-parallel PRBS checker is also proposed. The proposed design, together with the custom-designed high-speed parallel-to-serial and the serial-to-parallel circuit, will be used to test different on-chip interconnection transceivers. The design is implemented in TSMC 28nm CMOS technology with working voltage at 1.0 V. The serial to parallel ratio is 8:1 so the parallel PRBS generation and BER Tester can be run at lower speed.

Keywords: PRBS, BER, high speed, generator

Procedia PDF Downloads 543
3887 Algorithm Optimization to Sort in Parallel by Decreasing the Number of the Processors in SIMD (Single Instruction Multiple Data) Systems

Authors: Ali Hosseini


Paralleling is a mechanism to decrease the time necessary to execute the programs. Sorting is one of the important operations to be used in different systems in a way that the proper function of many algorithms and operations depend on sorted data. CRCW_SORT algorithm executes ‘N’ elements sorting in O(1) time on SIMD (Single Instruction Multiple Data) computers with n^2/2-n/2 number of processors. In this article having presented a mechanism by dividing the input string by the hinge element into two less strings the number of the processors to be used in sorting ‘N’ elements in O(1) time has decreased to n^2/8-n/4 in the best state; by this mechanism the best state is when the hinge element is the middle one and the worst state is when it is minimum. The findings from assessing the proposed algorithm by other methods on data collection and number of the processors indicate that the proposed algorithm uses less processors to sort during execution than other methods.

Keywords: CRCW, SIMD (Single Instruction Multiple Data) computers, parallel computers, number of the processors

Procedia PDF Downloads 240
3886 Optimal Design of Redundant Hybrid Manipulator for Minimum Singularity

Authors: Arash Rahmani, Ahmad Ghanbari, Abbas Baghernezhad, Babak Safaei


In the design of parallel manipulators, usually mean value of a dexterity measure over the workspace volume is considered as the objective function to be used in optimization algorithms. The mentioned indexes in a hybrid parallel manipulator (HPM) are quite complicated to solve thanks to infinite solutions for every point within the workspace of the redundant manipulators. In this paper, spatial isotropic design axioms are extended as a well-known method for optimum design of manipulators. An upper limit for the isotropy measure of HPM is calculated and instead of computing and minimizing isotropy measure, minimizing the obtained limit is considered. To this end, two different objective functions are suggested which are obtained from objective functions of comprising modules. Finally, by using genetic algorithm (GA), the best geometric parameters for a specific hybrid parallel robot which is composed of two modified Gough-Stewart platforms (MGSP) are achieved.

Keywords: hybrid manipulator, spatial isotropy, genetic algorithm, optimum design

Procedia PDF Downloads 223
3885 Parallel Pipelined Conjugate Gradient Algorithm on Heterogeneous Platforms

Authors: Sergey Kopysov, Nikita Nedozhogin, Leonid Tonkov


The article presents a parallel iterative solver for large sparse linear systems which can be used on a heterogeneous platform. Traditionally, the problem of solving linear systems does not scale well on multi-CPU/multi-GPUs clusters. For example, most of the attempts to implement the classical conjugate gradient method were at best counted in the same amount of time as the problem was enlarged. The paper proposes the pipelined variant of the conjugate gradient method (PCG), a formulation that is potentially better suited for hybrid CPU/GPU computing since it requires only one synchronization point per one iteration instead of two for standard CG. The standard and pipelined CG methods need the vector entries generated by the current GPU and other GPUs for matrix-vector products. So the communication between GPUs becomes a major performance bottleneck on multi GPU cluster. The article presents an approach to minimize the communications between parallel parts of algorithms. Additionally, computation and communication can be overlapped to reduce the impact of data exchange. Using the pipelined version of the CG method with one synchronization point, the possibility of asynchronous calculations and communications, load balancing between the CPU and GPU for solving the large linear systems allows for scalability. The algorithm is implemented with the combined use of technologies: MPI, OpenMP, and CUDA. We show that almost optimum speed up on 8-CPU/2GPU may be reached (relatively to a one GPU execution). The parallelized solver achieves a speedup of up to 5.49 times on 16 NVIDIA Tesla GPUs, as compared to one GPU.

Keywords: conjugate gradient, GPU, parallel programming, pipelined algorithm

Procedia PDF Downloads 90
3884 GPU Accelerated Fractal Image Compression for Medical Imaging in Parallel Computing Platform

Authors: Md. Enamul Haque, Abdullah Al Kaisan, Mahmudur R. Saniat, Aminur Rahman


In this paper, we have implemented both sequential and parallel version of fractal image compression algorithms using CUDA (Compute Unified Device Architecture) programming model for parallelizing the program in Graphics Processing Unit for medical images, as they are highly similar within the image itself. There is several improvements in the implementation of the algorithm as well. Fractal image compression is based on the self similarity of an image, meaning an image having similarity in majority of the regions. We take this opportunity to implement the compression algorithm and monitor the effect of it using both parallel and sequential implementation. Fractal compression has the property of high compression rate and the dimensionless scheme. Compression scheme for fractal image is of two kinds, one is encoding and another is decoding. Encoding is very much computational expensive. On the other hand decoding is less computational. The application of fractal compression to medical images would allow obtaining much higher compression ratios. While the fractal magnification an inseparable feature of the fractal compression would be very useful in presenting the reconstructed image in a highly readable form. However, like all irreversible methods, the fractal compression is connected with the problem of information loss, which is especially troublesome in the medical imaging. A very time consuming encoding process, which can last even several hours, is another bothersome drawback of the fractal compression.

Keywords: accelerated GPU, CUDA, parallel computing, fractal image compression

Procedia PDF Downloads 243