Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 958

Search results for: Parallel computation.

958 A Parallel Implementation of k-Means in MATLAB

Authors: Dimitris Varsamis, Christos Talagkozis, Alkiviadis Tsimpiris, Paris Mastorocostas

Abstract:

The aim of this work is the parallel implementation of k-means in MATLAB, in order to reduce the execution time. Specifically, a new function in MATLAB for serial k-means algorithm is developed, which meets all the requirements for the conversion to a function in MATLAB with parallel computations. Additionally, two different variants for the definition of initial values are presented. In the sequel, the parallel approach is presented. Finally, the performance tests for the computation times respect to the numbers of features and classes are illustrated.

Keywords: K-means algorithm, clustering, parallel computations, MATLAB.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 829
957 Specialization-based parallel Processing without Memo-trees

Authors: Hidemi Ogasawara, Kiyoshi Akama, Hiroshi Mabuchi

Abstract:

The purpose of this paper is to propose a framework for constructing correct parallel processing programs based on Equivalent Transformation Framework (ETF). ETF regards computation as In the framework, a problem-s domain knowledge and a query are described in definite clauses, and computation is regarded as transformation of the definite clauses. Its meaning is defined by a model of the set of definite clauses, and the transformation rules generated must preserve meaning. We have proposed a parallel processing method based on “specialization", a part of operation in the transformations, which resembles substitution in logic programming. The method requires “Memo-tree", a history of specialization to maintain correctness. In this paper we proposes the new method for the specialization-base parallel processing without Memo-tree.

Keywords: Parallel processing, Program correctness, Equivalent transformation, Specializer generation rule

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1056
956 Solving Facility Location Problem on Cluster Computing

Authors: Ei Phyo Wai, Nay Min Tun

Abstract:

Computation of facility location problem for every location in the country is not easy simultaneously. Solving the problem is described by using cluster computing. A technique is to design parallel algorithm by using local search with single swap method in order to solve that problem on clusters. Parallel implementation is done by the use of portable parallel programming, Message Passing Interface (MPI), on Microsoft Windows Compute Cluster. In this paper, it presents the algorithm that used local search with single swap method and implementation of the system of a facility to be opened by using MPI on cluster. If large datasets are considered, the process of calculating a reasonable cost for a facility becomes time consuming. The result shows parallel computation of facility location problem on cluster speedups and scales well as problem size increases.

Keywords: cluster, cost, demand, facility location

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1217
955 Analyzing the Factors that Cause Parallel Performance Degradation in Parallel Graph-Based Computations Using Graph500

Authors: Mustafa Elfituri, Jonathan Cook

Abstract:

Recently, graph-based computations have become more important in large-scale scientific computing as they can provide a methodology to model many types of relations between independent objects. They are being actively used in fields as varied as biology, social networks, cybersecurity, and computer networks. At the same time, graph problems have some properties such as irregularity and poor locality that make their performance different than regular applications performance. Therefore, parallelizing graph algorithms is a hard and challenging task. Initial evidence is that standard computer architectures do not perform very well on graph algorithms. Little is known exactly what causes this. The Graph500 benchmark is a representative application for parallel graph-based computations, which have highly irregular data access and are driven more by traversing connected data than by computation. In this paper, we present results from analyzing the performance of various example implementations of Graph500, including a shared memory (OpenMP) version, a distributed (MPI) version, and a hybrid version. We measured and analyzed all the factors that affect its performance in order to identify possible changes that would improve its performance. Results are discussed in relation to what factors contribute to performance degradation.

Keywords: Graph computation, Graph500 benchmark, parallel architectures, parallel programming, workload characterization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 167
954 Parallel-Distributed Software Implementation of Buchberger Algorithm

Authors: Praloy Kumar Biswas, Prof. Dipanwita Roy Chowdhury

Abstract:

Grobner basis calculation forms a key part of computational commutative algebra and many other areas. One important ramification of the theory of Grobner basis provides a means to solve a system of non-linear equations. This is why it has become very important in the areas where the solution of non-linear equations is needed, for instance in algebraic cryptanalysis and coding theory. This paper explores on a parallel-distributed implementation for Grobner basis calculation over GF(2). For doing so Buchberger algorithm is used. OpenMP and MPI-C language constructs have been used to implement the scheme. Some relevant results have been furnished to compare the performances between the standalone and hybrid (parallel-distributed) implementation.

Keywords: Grobner basis, Buchberger Algorithm, Distributed- Parallel Computation, OpenMP, MPI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1582
953 FPGA Based Parallel Architecture for the Computation of Third-Order Cross Moments

Authors: Syed Manzoor Qasim, Shuja Abbasi, Saleh Alshebeili, Bandar Almashary, Ateeq Ahmad Khan

Abstract:

Higher-order Statistics (HOS), also known as cumulants, cross moments and their frequency domain counterparts, known as poly spectra have emerged as a powerful signal processing tool for the synthesis and analysis of signals and systems. Algorithms used for the computation of cross moments are computationally intensive and require high computational speed for real-time applications. For efficiency and high speed, it is often advantageous to realize computation intensive algorithms in hardware. A promising solution that combines high flexibility together with the speed of a traditional hardware is Field Programmable Gate Array (FPGA). In this paper, we present FPGA-based parallel architecture for the computation of third-order cross moments. The proposed design is coded in Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL) and functionally verified by implementing it on Xilinx Spartan-3 XC3S2000FG900-4 FPGA. Implementation results are presented and it shows that the proposed design can operate at a maximum frequency of 86.618 MHz.

Keywords: Cross moments, Cumulants, FPGA, Hardware Implementation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1486
952 Parallel Direct Integration Variable Step Block Method for Solving Large System of Higher Order Ordinary Differential Equations

Authors: Zanariah Abdul Majid, Mohamed Suleiman

Abstract:

The aim of this paper is to investigate the performance of the developed two point block method designed for two processors for solving directly non stiff large systems of higher order ordinary differential equations (ODEs). The method calculates the numerical solution at two points simultaneously and produces two new equally spaced solution values within a block and it is possible to assign the computational tasks at each time step to a single processor. The algorithm of the method was developed in C language and the parallel computation was done on a parallel shared memory environment. Numerical results are given to compare the efficiency of the developed method to the sequential timing. For large problems, the parallel implementation produced 1.95 speed-up and 98% efficiency for the two processors.

Keywords: Numerical methods, parallel method, block method, higher order ODEs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1184
951 Performance Comparison of Parallel Sorting Algorithms on the Cluster of Workstations

Authors: Lai Lai Win Kyi, Nay Min Tun

Abstract:

Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.

Keywords: Cluster of Workstations, Parallel sorting algorithms, performance analysis, parallel computing and MPI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1210
950 A Practical Distributed String Matching Algorithm Architecture and Implementation

Authors: Bi Kun, Gu Nai-jie, Tu Kun, Liu Xiao-hu, Liu Gang

Abstract:

Traditional parallel single string matching algorithms are always based on PRAM computation model. Those algorithms concentrate on the cost optimal design and the theoretical speed. Based on the distributed string matching algorithm proposed by CHEN, a practical distributed string matching algorithm architecture is proposed in this paper. And also an improved single string matching algorithm based on a variant Boyer-Moore algorithm is presented. We implement our algorithm on the above architecture and the experiments prove that it is really practical and efficient on distributed memory machine. Its computation complexity is O(n/p + m), where n is the length of the text, and m is the length of the pattern, and p is the number of the processors.

Keywords: Boyer-Moore algorithm, distributed algorithm, parallel string matching, string matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1862
949 Concurrent Approach to Data Parallel Model using Java

Authors: Bala Dhandayuthapani Veerasamy

Abstract:

Parallel programming models exist as an abstraction of hardware and memory architectures. There are several parallel programming models in commonly use; they are shared memory model, thread model, message passing model, data parallel model, hybrid model, Flynn-s models, embarrassingly parallel computations model, pipelined computations model. These models are not specific to a particular type of machine or memory architecture. This paper expresses the model program for concurrent approach to data parallel model through java programming.

Keywords: Concurrent, Data Parallel, JDK, Parallel, Thread

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1645
948 Damage Strain Analysis of Parallel Fiber Eutectic

Authors: Jian Zheng, Xinhua Ni, Xiequan Liu

Abstract:

According to isotropy of parallel fiber eutectic, the no- damage strain field in parallel fiber eutectic is obtained from the flexibility tensor of parallel fiber eutectic. Considering the damage behavior of parallel fiber eutectic, damage variables are introduced to determine the strain field of parallel fiber eutectic. The damage strains in the matrix, interphase, and fiber of parallel fiber eutectic are quantitatively analyzed. Results show that damage strains are not only associated with the fiber volume fraction of parallel fiber eutectic, but also with the damage degree.

Keywords: Parallel fiber eutectic, no-damage strain, damage strain, fiber volume fraction, damage degree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 608
947 Parallelization of Ensemble Kalman Filter (EnKF) for Oil Reservoirs with Time-lapse Seismic Data

Authors: Md Khairullah, Hai-Xiang Lin, Remus G. Hanea, Arnold W. Heemink

Abstract:

In this paper we describe the design and implementation of a parallel algorithm for data assimilation with ensemble Kalman filter (EnKF) for oil reservoir history matching problem. The use of large number of observations from time-lapse seismic leads to a large turnaround time for the analysis step, in addition to the time consuming simulations of the realizations. For efficient parallelization it is important to consider parallel computation at the analysis step. Our experiments show that parallelization of the analysis step in addition to the forecast step has good scalability, exploiting the same set of resources with some additional efforts.

Keywords: EnKF, Data assimilation, Parallel computing, Parallel efficiency.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2006
946 Implementation of Parallel Interface for Microprocessor Trainer

Authors: Moe Moe Htun, Khin Htar Nwe

Abstract:

In this paper, parallel interface for microprocessor trainer was implemented. A programmable parallel–port device such as the IC 8255A is initialized for simple input or output and for handshake input or output by choosing kinds of modes. The hardware connections and the programs can be used to interface microprocessor trainer and a personal computer by using IC 8255A. The assembly programs edited on PC-s editor can be downloaded to the trainer.

Keywords: Parallel I/O ports, parallel interface, trainer, two 8255 ICs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2950
945 Some Characteristics of Systolic Arrays

Authors: Halil Snopce, Ilir Spahiu

Abstract:

In this paper is investigated a possible optimization of some linear algebra problems which can be solved by parallel processing using the special arrays called systolic arrays. In this paper are used some special types of transformations for the designing of these arrays. We show the characteristics of these arrays. The main focus is on discussing the advantages of these arrays in parallel computation of matrix product, with special approach to the designing of systolic array for matrix multiplication. Multiplication of large matrices requires a lot of computational time and its complexity is O(n3 ). There are developed many algorithms (both sequential and parallel) with the purpose of minimizing the time of calculations. Systolic arrays are good suited for this purpose. In this paper we show that using an appropriate transformation implicates in finding more optimal arrays for doing the calculations of this type.

Keywords: Data dependences, matrix multiplication, systolicarray, transformation matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1334
944 Parallel and Distributed Mining of Association Rule on Knowledge Grid

Authors: U. Sakthi, R. Hemalatha, R. S. Bhuvaneswaran

Abstract:

In Virtual organization, Knowledge Discovery (KD) service contains distributed data resources and computing grid nodes. Computational grid is integrated with data grid to form Knowledge Grid, which implements Apriori algorithm for mining association rule on grid network. This paper describes development of parallel and distributed version of Apriori algorithm on Globus Toolkit using Message Passing Interface extended with Grid Services (MPICHG2). The creation of Knowledge Grid on top of data and computational grid is to support decision making in real time applications. In this paper, the case study describes design and implementation of local and global mining of frequent item sets. The experiments were conducted on different configurations of grid network and computation time was recorded for each operation. We analyzed our result with various grid configurations and it shows speedup of computation time is almost superlinear.

Keywords: Association rule, Grid computing, Knowledge grid, Mobility prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1946
943 Performance Analysis of Parallel Client-Server Model Versus Parallel Mobile Agent Model

Authors: K. B. Manwade, G. A. Patil

Abstract:

Mobile agent has motivated the creation of a new methodology for parallel computing. We introduce a methodology for the creation of parallel applications on the network. The proposed Mobile-Agent parallel processing framework uses multiple Javamobile Agents. Each mobile agent can travel to the specified machine in the network to perform its tasks. We also introduce the concept of master agent, which is Java object capable of implementing a particular task of the target application. Master agent is dynamically assigns the task to mobile agents. We have developed and tested a prototype application: Mobile Agent Based Parallel Computing. Boosted by the inherited benefits of using Java and Mobile Agents, our proposed methodology breaks the barriers between the environments, and could potentially exploit in a parallel manner all the available computational resources on the network. This paper elaborates performance issues of a mobile agent for parallel computing.

Keywords: Parallel Computing, Mobile Agent.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1367
942 A Technique for Reachability Graph Generation for the Petri Net Models of Parallel Processes

Authors: Farooq Ahmad, Hejiao Huang, Xiaolong Wang

Abstract:

Reachability graph (RG) generation suffers from the problem of exponential space and time complexity. To alleviate the more critical problem of time complexity, this paper presents the new approach for RG generation for the Petri net (PN) models of parallel processes. Independent RGs for each parallel process in the PN structure are generated in parallel and cross-product of these RGs turns into the exhaustive state space from which the RG of given parallel system is determined. The complexity analysis of the presented algorithm illuminates significant decrease in the time complexity cost of RG generation. The proposed technique is applicable to parallel programs having multiple threads with the synchronization problem.

Keywords: Parallel processes, Petri net, reachability graph, time complexity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1760
941 Harmonic Reduction In Three-Phase Parallel Connected Inverter

Authors: M.A.A. Younis, N. A. Rahim, S. Mekhilef

Abstract:

This paper presents the design and analysis of a parallel connected inverter configuration of. The configuration consists of parallel connected three-phase dc/ac inverter. Series resistors added to the inverter output to maintain same current in each inverter of the two parallel inverters, and to reduce the circulating current in the parallel inverters to the minimum. High frequency third harmonic injection PWM (THIPWM) employed to reduce the total harmonic distortion and to make maximum use of the voltage source. DSP was used to generate the THIPWM and the control algorithm for the converter. Selected experimental results have been shown to validate the proposed system.

Keywords: Three-phase inverter, Third harmonic injection PWM, inverters parallel connection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3570
940 Network Based High Performance Computing

Authors: Karanjeet Singh Kahlon, Gurvinder Singh, Arjan Singh

Abstract:

In the past few years there is a change in the view of high performance applications and parallel computing. Initially such applications were targeted towards dedicated parallel machines. Recently trend is changing towards building meta-applications composed of several modules that exploit heterogeneous platforms and employ hybrid forms of parallelism. The aim of this paper is to propose a model of virtual parallel computing. Virtual parallel computing system provides a flexible object oriented software framework that makes it easy for programmers to write various parallel applications.

Keywords: Applet, Efficiency, Java, LAN

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1213
939 Parallel Computation in Hypersonic Aerodynamic Heating Problem

Authors: Ding Guo-hao, Li Hua, Wang Wen-long

Abstract:

A parallel computational fluid dynamics code has been developed for the study of aerodynamic heating problem in hypersonic flows. The code employs the 3D Navier-Stokes equations as the basic governing equations to simulate the laminar hypersonic flow. The cell centered finite volume method based on structured grid is applied for spatial discretization. The AUSMPW+ scheme is used for the inviscid fluxes, and the MUSCL approach is used for higher order spatial accuracy. The implicit LU-SGS scheme is applied for time integration to accelerate the convergence of computations in steady flows. A parallel programming method based on MPI is employed to shorten the computing time. The validity of the code is demonstrated by comparing the numerical calculation result with the experimental data of a hypersonic flow field around a blunt body.

Keywords: Aerodynamic Heating, AUSMPW+, MPI, ParallelComputation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1719
938 Parallel Discrete Fourier Transform for Fast FIR Filtering Based on Overlapped-save Block Structure

Authors: Ying-Wen Bai, Ju-Maw Chen

Abstract:

To successfully provide a fast FIR filter with FTT algorithms, overlapped-save algorithms can be used to lower the computational complexity and achieve the desired real-time processing. As the length of the input block increases in order to improve the efficiency, a larger volume of zero padding will greatly increase the computation length of the FFT. In this paper, we use the overlapped block digital filtering to construct a parallel structure. As long as the down-sampling (or up-sampling) factor is an exact multiple lengths of the impulse response of a FIR filter, we can process the input block by using a parallel structure and thus achieve a low-complex fast FIR filter with overlapped-save algorithms. With a long filter length, the performance and the throughput of the digital filtering system will also be greatly enhanced.

Keywords: FIR Filter, Overlapped-save Algorithm, ParallelStructure

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1474
937 Some Results on Parallel Alternating Methods

Authors: Guangbin Wang, Fuping Tan

Abstract:

In this paper, we investigate two parallel alternating methods for solving the system of linear equations Ax = b and give convergence theorems for the parallel alternating methods when the coefficient matrix is a nonsingular H-matrix. Furthermore, we give one example to show our results.

Keywords: Nonsingular H-matrix, parallel alternating method, convergence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 934
936 Motion Estimator Architecture with Optimized Number of Processing Elements for High Efficiency Video Coding

Authors: Seongsoo Lee

Abstract:

Motion estimation occupies the heaviest computation in HEVC (high efficiency video coding). Many fast algorithms such as TZS (test zone search) have been proposed to reduce the computation. Still the huge computation of the motion estimation is a critical issue in the implementation of HEVC video codec. In this paper, motion estimator architecture with optimized number of PEs (processing element) is presented by exploiting early termination. It also reduces hardware size by exploiting parallel processing. The presented motion estimator architecture has 8 PEs, and it can efficiently perform TZS with very high utilization of PEs.

Keywords: Motion estimation, test zone search, high efficiency video coding, processing element, optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 680
935 Parallel Image Compression and Analysis with Wavelets

Authors: M. Kutila, J. Viitanen

Abstract:

This paper presents image compression with wavelet based method. The wavelet transformation divides image to low- and high pass filtered parts. The traditional JPEG compression technique requires lower computation power with feasible losses, when only compression is needed. However, there is obvious need for wavelet based methods in certain circumstances. The methods are intended to the applications in which the image analyzing is done parallel with compression. Furthermore, high frequency bands can be used to detect changes or edges. Wavelets enable hierarchical analysis for low pass filtered sub-images. The first analysis can be done for a small image, and only if any interesting is found, the whole image is processed or reconstructed.

Keywords: image compression, jpeg, wavelet, vlc

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1542
934 Parallel Branch and Bound Model Using Logarithmic Sampling (PBLS) for Symmetric Traveling Salesman Problem

Authors: Sheikh Muhammad Azam, Masood-ur-Rehman, Adnan Khalid Bhatti, Nadeem Daudpota

Abstract:

Very Large and/or computationally complex optimization problems sometimes require parallel or highperformance computing for achieving a reasonable time for computation. One of the most popular and most complicate problems of this family is “Traveling Salesman Problem". In this paper we have introduced a Branch & Bound based algorithm for the solution of such complicated problems. The main focus of the algorithm is to solve the “symmetric traveling salesman problem". We reviewed some of already available algorithms and felt that there is need of new algorithm which should give optimal solution or near to the optimal solution. On the basis of the use of logarithmic sampling, it was found that the proposed algorithm produced a relatively optimal solution for the problem and results excellent performance as compared with the traditional algorithms of this series.

Keywords: Parallel execution, symmetric traveling salesman problem, branch and bound algorithm, logarithmic sampling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2123
933 A Parallel Quadtree Approach for Image Compression using Wavelets

Authors: Hamed Vahdat Nejad, Hossein Deldari

Abstract:

Wavelet transforms are multiresolution decompositions that can be used to analyze signals and images. Image compression is one of major applications of wavelet transforms in image processing. It is considered as one of the most powerful methods that provides a high compression ratio. However, its implementation is very time-consuming. At the other hand, parallel computing technologies are an efficient method for image compression using wavelets. In this paper, we propose a parallel wavelet compression algorithm based on quadtrees. We implement the algorithm using MatlabMPI (a parallel, message passing version of Matlab), and compute its isoefficiency function, and show that it is scalable. Our experimental results confirm the efficiency of the algorithm also.

Keywords: Image compression, MPI, Parallel computing, Wavelets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1811
932 A Consideration of the Achievement of Productive Level Parallel Programming Skills

Authors: Tadayoshi Horita, Masakazu Akiba, Mina Terauchi, Tsuneo Kanno

Abstract:

This paper gives a consideration of the achievement of productive level parallel programming skills, based on the data of the graduation studies in the Polytechnic University of Japan. The data show that most students can achieve only parallel programming skills during the graduation study (about 600 to 700 hours), if the programming environment is limited to GPGPUs. However, the data also show that it is a very high level task that a student achieves productive level parallel programming skills during only the graduation study. In addition, it shows that the parallel programming environments for GPGPU, such as CUDA and OpenCL, may be more suitable for parallel computing education than other environments such as MPI on a cluster system and Cell.B.E. These results must be useful for the areas of not only software developments, but also hardware product developments using computer technologies.

Keywords: Parallel computing, programming education, GPU, GPGPU, CUDA, OpenCL, MPI, Cell.B.E.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1479
931 A Fully Parallel Reverse Converter

Authors: Mehdi Hosseinzadeh, Amir Sabbagh Molahosseini, Keivan Navi

Abstract:

The residue number system (RNS) is popular in high performance computation applications because of its carry-free nature. The challenges of RNS systems design lie in the moduli set selection and in the reverse conversion from residue representation to weighted representation. In this paper, we proposed a fully parallel reverse conversion algorithm for the moduli set {rn - 2, rn - 1, rn}, based on simple mathematical relationships. Also an efficient hardware realization of this algorithm is presented. Our proposed converter is very faster and results to hardware savings, compared to the other reverse converters.

Keywords: Reverse converter, residue to weighted converter, residue number system, multiple-valued logic, computer arithmetic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1348
930 Development of Heterogeneous Parallel Genetic Simulated Annealing Using Multi-Niche Crowding

Authors: Z. G. Wang, M. Rahman, Y. S. Wong, K. S. Neo

Abstract:

In this paper, a new hybrid of genetic algorithm (GA) and simulated annealing (SA), referred to as GSA, is presented. In this algorithm, SA is incorporated into GA to escape from local optima. The concept of hierarchical parallel GA is employed to parallelize GSA for the optimization of multimodal functions. In addition, multi-niche crowding is used to maintain the diversity in the population of the parallel GSA (PGSA). The performance of the proposed algorithms is evaluated against a standard set of multimodal benchmark functions. The multi-niche crowding PGSA and normal PGSA show some remarkable improvement in comparison with the conventional parallel genetic algorithm and the breeder genetic algorithm (BGA).

Keywords: Crowding, genetic algorithm, parallel geneticalgorithm, simulated annealing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1357
929 Simulation of Dynamic Behavior of Seismic Isolators Using a Parallel Elasto-Plastic Model

Authors: Nicolò Vaiana, Giorgio Serino

Abstract:

In this paper, a one-dimensional (1d) Parallel Elasto- Plastic Model (PEPM), able to simulate the uniaxial dynamic behavior of seismic isolators having a continuously decreasing tangent stiffness with increasing displacement, is presented. The parallel modeling concept is applied to discretize the continuously decreasing tangent stiffness function, thus allowing to simulate the dynamic behavior of seismic isolation bearings by putting linear elastic and nonlinear elastic-perfectly plastic elements in parallel. The mathematical model has been validated by comparing the experimental force-displacement hysteresis loops, obtained testing a helical wire rope isolator and a recycled rubber-fiber reinforced bearing, with those predicted numerically. Good agreement between the simulated and experimental results shows that the proposed model can be an effective numerical tool to predict the forcedisplacement relationship of seismic isolators within relatively large displacements. Compared to the widely used Bouc-Wen model, the proposed one allows to avoid the numerical solution of a first order ordinary nonlinear differential equation for each time step of a nonlinear time history analysis, thus reducing the computation effort, and requires the evaluation of only three model parameters from experimental tests, namely the initial tangent stiffness, the asymptotic tangent stiffness, and a parameter defining the transition from the initial to the asymptotic tangent stiffness.

Keywords: Base isolation, earthquake engineering, parallel elasto-plastic model, seismic isolators, softening hysteresis loops.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 790