Search results for: parallel processors
536 Novel Ridge Orientation Based Approach for Fingerprint Identification Using Co-Occurrence Matrix
Authors: Mehran Yazdi, Zahra Adelpour, Batoul Bahraini, Yasaman Keshtkar Jahromi
Abstract:
In this paper we use the property of co-occurrence matrix in finding parallel lines in binary pictures for fingerprint identification. In our proposed algorithm, we reduce the noise by filtering the fingerprint images and then transfer the fingerprint images to binary images using a proper threshold. Next, we divide the binary images into some regions having parallel lines in the same direction. The lines in each region have a specific angle that can be used for comparison. This method is simple, performs the comparison step quickly and has a good resistance in the presence of the noise.Keywords: Parallel lines detection, co-occurrence matrix, fingerprint identification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1357535 3D Network-on-Chip with on-Chip DRAM: An Empirical Analysis for Future Chip Multiprocessor
Authors: Thomas Canhao Xu, Bo Yang, Alexander Wei Yin, Pasi Liljeberg, Hannu Tenhunen
Abstract:
With the increasing number of on-chip components and the critical requirement for processing power, Chip Multiprocessor (CMP) has gained wide acceptance in both academia and industry during the last decade. However, the conventional bus-based onchip communication schemes suffer from very high communication delay and low scalability in large scale systems. Network-on-Chip (NoC) has been proposed to solve the bottleneck of parallel onchip communications by applying different network topologies which separate the communication phase from the computation phase. Observing that the memory bandwidth of the communication between on-chip components and off-chip memory has become a critical problem even in NoC based systems, in this paper, we propose a novel 3D NoC with on-chip Dynamic Random Access Memory (DRAM) in which different layers are dedicated to different functionalities such as processors, cache or memory. Results show that, by using our proposed architecture, average link utilization has reduced by 10.25% for SPLASH-2 workloads. Our proposed design costs 1.12% less execution cycles than the traditional design on average.
Keywords: 3D integration, network-on-chip, memory-on-chip, DRAM, chip multiprocessor.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2447534 Analyzing the Factors that Cause Parallel Performance Degradation in Parallel Graph-Based Computations Using Graph500
Authors: Mustafa Elfituri, Jonathan Cook
Abstract:
Recently, graph-based computations have become more important in large-scale scientific computing as they can provide a methodology to model many types of relations between independent objects. They are being actively used in fields as varied as biology, social networks, cybersecurity, and computer networks. At the same time, graph problems have some properties such as irregularity and poor locality that make their performance different than regular applications performance. Therefore, parallelizing graph algorithms is a hard and challenging task. Initial evidence is that standard computer architectures do not perform very well on graph algorithms. Little is known exactly what causes this. The Graph500 benchmark is a representative application for parallel graph-based computations, which have highly irregular data access and are driven more by traversing connected data than by computation. In this paper, we present results from analyzing the performance of various example implementations of Graph500, including a shared memory (OpenMP) version, a distributed (MPI) version, and a hybrid version. We measured and analyzed all the factors that affect its performance in order to identify possible changes that would improve its performance. Results are discussed in relation to what factors contribute to performance degradation.
Keywords: Graph computation, Graph500 benchmark, parallel architectures, parallel programming, workload characterization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 548533 Isotropic Stress Distribution in Cu/(001) Fe Two Sheets
Authors: A. Derardja, L. Baroura, M. Brioua
Abstract:
The nanotechnology based on epitaxial systems includes single or arranged misfit dislocations. In general, whatever is the type of dislocation or the geometry of the array formed by the dislocations; it is important for experimental studies to know exactly the stress distribution for which there is no analytical expression [1, 2]. This work, using a numerical analysis, deals with relaxation of epitaxial layers having at their interface a periodic network of edge misfit dislocations. The stress distribution is estimated by using isotropic elasticity. The results show that the thickness of the two sheets is a crucial parameter in the stress distributions and then in the profile of the two sheets. A comparative study between the case of single dislocation and the case of parallel network shows that the layers relaxed better when the interface is covered by a parallel arrangement of misfit. Consequently, a single dislocation at the interface produces an important stress field which can be reduced by inserting a parallel network of dislocations with suitable periodicity.Keywords: Parallel array of misfit, interface, isotropic elasticity, single crystalline substrates, coherent interface
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1571532 On Fault Diagnosis of Asynchronous Sequential Machines with Parallel Composition
Authors: Jung-Min Yang
Abstract:
Fault diagnosis of composite asynchronous sequential machines with parallel composition is addressed in this paper. An adversarial input can infiltrate one of two submachines comprising the composite asynchronous machine, causing an unauthorized state transition. The objective is to characterize the condition under which the controller can diagnose any fault occurrence. Two control configurations, state feedback and output feedback, are considered in this paper. In the case of output feedback, the exact estimation of the state is impossible since the current state is inaccessible and the output feedback is given as the form of burst. A simple example is provided to demonstrate the proposed methodology.Keywords: Asynchronous sequential machines, parallel composition, fault diagnosis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 970531 Conditions for Fault Recovery of Interconnected Asynchronous Sequential Machines with State Feedback
Authors: Jung–Min Yang
Abstract:
In this paper, fault recovery for parallel interconnected asynchronous sequential machines is studied. An adversarial input can infiltrate into one of two submachines comprising parallel composition of the considered asynchronous sequential machine, causing an unauthorized state transition. The control objective is to elucidate the condition for the existence of a corrective controller that makes the closed-loop system immune against any occurrence of adversarial inputs. In particular, an efficient existence condition is presented that does not need the complete modeling of the interconnected asynchronous sequential machine.Keywords: Asynchronous sequential machines, parallel composition, corrective control, fault tolerance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 839530 Hybrid Prefix Adder Architecture for Minimizing the Power Delay Product
Authors: P.Ramanathan, P.T.Vanathi
Abstract:
Parallel Prefix addition is a technique for improving the speed of binary addition. Due to continuing integrating intensity and the growing needs of portable devices, low-power and highperformance designs are of prime importance. The classical parallel prefix adder structures presented in the literature over the years optimize for logic depth, area, fan-out and interconnect count of logic circuits. In this paper, a new architecture for performing 8-bit, 16-bit and 32-bit Parallel Prefix addition is proposed. The proposed prefix adder structures is compared with several classical adders of same bit width in terms of power, delay and number of computational nodes. The results reveal that the proposed structures have the least power delay product when compared with its peer existing Prefix adder structures. Tanner EDA tool was used for simulating the adder designs in the TSMC 180 nm and TSMC 130 nm technologies.Keywords: Parallel Prefix Adder (PPA), Dot operator, Semi-Dotoperator, Complementary Metal Oxide Semiconductor (CMOS), Odd-dot operator, Even-dot operator, Odd-semi-dot operator andEven-semi-dot operator.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1726529 Some Computational Results on MPI Parallel Implementation of Dense Simplex Method
Authors: El-Said Badr, Mahmoud Moussa, Konstantinos Paparrizos, Nikolaos Samaras, Angelo Sifaleras
Abstract:
There are two major variants of the Simplex Algorithm: the revised method and the standard, or tableau method. Today, all serious implementations are based on the revised method because it is more efficient for sparse linear programming problems. Moreover, there are a number of applications that lead to dense linear problems so our aim in this paper is to present some computational results on parallel implementation of dense Simplex Method. Our implementation is implemented on a SMP cluster using C programming language and the Message Passing Interface MPI. Preliminary computational results on randomly generated dense linear programs support our results.Keywords: Linear Programming, MPI, Parallel Implementation, Simplex Algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2049528 Using the PGAS Programming Paradigm for Biological Sequence Alignment on a Chip Multi-Threading Architecture
Authors: M. Bakhouya, S. A. Bahra, T. El-Ghazawi
Abstract:
The Partitioned Global Address Space (PGAS) programming paradigm offers ease-of-use in expressing parallelism through a global shared address space while emphasizing performance by providing locality awareness through the partitioning of this address space. Therefore, the interest in PGAS programming languages is growing and many new languages have emerged and are becoming ubiquitously available on nearly all modern parallel architectures. Recently, new parallel machines with multiple cores are designed for targeting high performance applications. Most of the efforts have gone into benchmarking but there are a few examples of real high performance applications running on multicore machines. In this paper, we present and evaluate a parallelization technique for implementing a local DNA sequence alignment algorithm using a PGAS based language, UPC (Unified Parallel C) on a chip multithreading architecture, the UltraSPARC T1.Keywords: Partitioned Global Address Space, Unified Parallel C, Multicore machines, Multi-threading Architecture, Sequence alignment.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1390527 Fault Diagnosis of Nonlinear Systems Using Dynamic Neural Networks
Authors: E. Sobhani-Tehrani, K. Khorasani, N. Meskin
Abstract:
This paper presents a novel integrated hybrid approach for fault diagnosis (FD) of nonlinear systems. Unlike most FD techniques, the proposed solution simultaneously accomplishes fault detection, isolation, and identification (FDII) within a unified diagnostic module. At the core of this solution is a bank of adaptive neural parameter estimators (NPE) associated with a set of singleparameter fault models. The NPEs continuously estimate unknown fault parameters (FP) that are indicators of faults in the system. Two NPE structures including series-parallel and parallel are developed with their exclusive set of desirable attributes. The parallel scheme is extremely robust to measurement noise and possesses a simpler, yet more solid, fault isolation logic. On the contrary, the series-parallel scheme displays short FD delays and is robust to closed-loop system transients due to changes in control commands. Finally, a fault tolerant observer (FTO) is designed to extend the capability of the NPEs to systems with partial-state measurement.
Keywords: Hybrid fault diagnosis, Dynamic neural networks, Nonlinear systems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2221526 Parallel Particle Swarm Optimization Optimized LDI Controller with Lyapunov Stability Criterion for Nonlinear Structural Systems
Authors: P.-W. Tsai, W.-L. Hong, C.-W. Chen, C.-Y. Chen
Abstract:
In this paper, we present a neural-network (NN) based approach to represent a nonlinear Tagagi-Sugeno (T-S) system. A linear differential inclusion (LDI) state-space representation is utilized to deal with the NN models. Taking advantage of the LDI representation, the stability conditions and controller design are derived for a class of nonlinear structural systems. Moreover, the concept of utilizing the Parallel Particle Swarm Optimization (PPSO) algorithm to solve the common P matrix under the stability criteria is given in this paper.
Keywords: Lyapunov Stability, Parallel Particle Swarm Optimization, Linear Differential Inclusion, Artificial Intelligence.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1865525 Simulation Modeling of Manufacturing Systems for the Serial Route and the Parallel One
Authors: Tadeusz Witkowski, Paweł Antczak, Arkadiusz Antczak
Abstract:
In the paper we discuss the influence of the route flexibility degree, the open rate of operations and the production type coefficient on makespan. The flexible job-open shop scheduling problem FJOSP (an extension of the classical job shop scheduling) is analyzed. For the analysis of the production process we used a hybrid heuristic of the GRASP (greedy randomized adaptive search procedure) with simulated annealing algorithm. Experiments with different levels of factors have been considered and compared. The GRASP+SA algorithm has been tested and illustrated with results for the serial route and the parallel one.Keywords: Makespan, open shop, route flexibility, serial and parallel route, simulation modeling, type of production.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1814524 Performance Evaluation of Parallel Surface Modeling and Generation on Actual and Virtual Multicore Systems
Authors: Nyeng P. Gyang
Abstract:
Even though past, current and future trends suggest that multicore and cloud computing systems are increasingly prevalent/ubiquitous, this class of parallel systems is nonetheless underutilized, in general, and barely used for research on employing parallel Delaunay triangulation for parallel surface modeling and generation, in particular. The performances, of actual/physical and virtual/cloud multicore systems/machines, at executing various algorithms, which implement various parallelization strategies of the incremental insertion technique of the Delaunay triangulation algorithm, were evaluated. T-tests were run on the data collected, in order to determine whether various performance metrics differences (including execution time, speedup and efficiency) were statistically significant. Results show that the actual machine is approximately twice faster than the virtual machine at executing the same programs for the various parallelization strategies. Results, which furnish the scalability behaviors of the various parallelization strategies, also show that some of the differences between the performances of these systems, during different runs of the algorithms on the systems, were statistically significant. A few pseudo superlinear speedup results, which were computed from the raw data collected, are not true superlinear speedup values. These pseudo superlinear speedup values, which arise as a result of one way of computing speedups, disappear and give way to asymmetric speedups, which are the accurate kind of speedups that occur in the experiments performed.Keywords: Cloud computing systems, multicore systems, parallel delaunay triangulation, parallel surface modeling and generation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 879523 Series-Parallel Systems Reliability Optimization Using Genetic Algorithm and Statistical Analysis
Authors: Essa Abrahim Abdulgader Saleem, Thien-My Dao
Abstract:
The main objective of this paper is to optimize series-parallel system reliability using Genetic Algorithm (GA) and statistical analysis; considering system reliability constraints which involve the redundant numbers of selected components, total cost, and total weight. To perform this work, firstly the mathematical model which maximizes system reliability subject to maximum system cost and maximum system weight constraints is presented; secondly, a statistical analysis is used to optimize GA parameters, and thirdly GA is used to optimize series-parallel systems reliability. The objective is to determine the strategy choosing the redundancy level for each subsystem to maximize the overall system reliability subject to total cost and total weight constraints. Finally, the series-parallel system case study reliability optimization results are showed, and comparisons with the other previous results are presented to demonstrate the performance of our GA.
Keywords: Genetic algorithm, optimization, reliability, statistical analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1156522 Performance Improvement of Moving Object Recognition and Tracking Algorithm using Parallel Processing of SURF and Optical Flow
Authors: Jungho Choi, Youngwan Cho
Abstract:
The paper proposes a way of parallel processing of SURF and Optical Flow for moving object recognition and tracking. The object recognition and tracking is one of the most important task in computer vision, however disadvantage are many operations cause processing speed slower so that it can-t do real-time object recognition and tracking. The proposed method uses a typical way of feature extraction SURF and moving object Optical Flow for reduce disadvantage and real-time moving object recognition and tracking, and parallel processing techniques for speed improvement. First analyse that an image from DB and acquired through the camera using SURF for compared to the same object recognition then set ROI (Region of Interest) for tracking movement of feature points using Optical Flow. Secondly, using Multi-Thread is for improved processing speed and recognition by parallel processing. Finally, performance is evaluated and verified efficiency of algorithm throughout the experiment.Keywords: moving object recognition, moving object tracking, SURF, Optical Flow, Multi-Thread.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2644521 Modified Techniques for Distribution System Reliability Improvement by Parallel Operation of Transformers
Authors: Ohn Zin Lin, Okka, Cho Cho Myint
Abstract:
It is important to consider the effects of transformers on distribution system because they have the highest impact on system reliability. It is generally said that parallel operation of transformers (POT) can improve the system reliability. However, the estimation approach can be also considered for accuracy. In this paper, we propose a three-state components model and equations to determine the reliability improvement by POT, and cooperation of POT and distributed generation (DG). Based on the proposed model and techniques, the effect of POT is analyzed in four different tests with the consideration of conventional distribution system, distribution automation system (DAS) and DG. According to the results, the reliability is greatly improved by cooperation of POT, DAS and DG. The proposed model and methods are applicable to not only developing countries which have conventional distribution system but also developed countries in which DAS has already installed.
Keywords: Distribution system, reliability, dispersed generator, energy not supply, transformer parallel operation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 702520 Edit Distance Algorithm to Increase Storage Efficiency of Javanese Corpora
Authors: Aji P. Wibawa, Andrew Nafalski, Neil Murray, Wayan F. Mahmudy
Abstract:
Since the one-to-one word translator does not have the facility to translate pragmatic aspects of Javanese, the parallel text alignment model described uses a phrase pair combination. The algorithm aligns the parallel text automatically from the beginning to the end of each sentence. Even though the results of the phrase pair combination outperform the previous algorithm, it is still inefficient. Recording all possible combinations consume more space in the database and time consuming. The original algorithm is modified by applying the edit distance coefficient to improve the data-storage efficiency. As a result, the data-storage consumption is 90% reduced as well as its learning period (42s).Keywords: edit distance coefficient, Javanese, parallel text alignment, phrase pair combination
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1728519 Some Characteristics of Systolic Arrays
Authors: Halil Snopce, Ilir Spahiu
Abstract:
In this paper is investigated a possible optimization of some linear algebra problems which can be solved by parallel processing using the special arrays called systolic arrays. In this paper are used some special types of transformations for the designing of these arrays. We show the characteristics of these arrays. The main focus is on discussing the advantages of these arrays in parallel computation of matrix product, with special approach to the designing of systolic array for matrix multiplication. Multiplication of large matrices requires a lot of computational time and its complexity is O(n3 ). There are developed many algorithms (both sequential and parallel) with the purpose of minimizing the time of calculations. Systolic arrays are good suited for this purpose. In this paper we show that using an appropriate transformation implicates in finding more optimal arrays for doing the calculations of this type.Keywords: Data dependences, matrix multiplication, systolicarray, transformation matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1521518 Detection and Classification of Faults on Parallel Transmission Lines Using Wavelet Transform and Neural Network
Authors: V.S.Kale, S.R.Bhide, P.P.Bedekar, G.V.K.Mohan
Abstract:
The protection of parallel transmission lines has been a challenging task due to mutual coupling between the adjacent circuits of the line. This paper presents a novel scheme for detection and classification of faults on parallel transmission lines. The proposed approach uses combination of wavelet transform and neural network, to solve the problem. While wavelet transform is a powerful mathematical tool which can be employed as a fast and very effective means of analyzing power system transient signals, artificial neural network has a ability to classify non-linear relationship between measured signals by identifying different patterns of the associated signals. The proposed algorithm consists of time-frequency analysis of fault generated transients using wavelet transform, followed by pattern recognition using artificial neural network to identify the type of the fault. MATLAB/Simulink is used to generate fault signals and verify the correctness of the algorithm. The adaptive discrimination scheme is tested by simulating different types of fault and varying fault resistance, fault location and fault inception time, on a given power system model. The simulation results show that the proposed scheme for fault diagnosis is able to classify all the faults on the parallel transmission line rapidly and correctly.
Keywords: Artificial neural network, fault detection and classification, parallel transmission lines, wavelet transform.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3011517 On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel
Authors: Tatjana Eitrich, Bruno Lang
Abstract:
This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared memory parallel mode we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.
Keywords: Support Vector Machine Training, Multi-ParameterKernels, Shared Memory Parallel Computing, Large Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1443516 Comparative Evaluation of Adaptive and Conventional Distance Relay for Parallel Transmission Line with Mutual Coupling
Authors: S.G. Srivani, Chandrasekhar Reddy Atla, K.P.Vittal
Abstract:
This paper presents the development of adaptive distance relay for protection of parallel transmission line with mutual coupling. The proposed adaptive relay, automatically adjusts its operation based on the acquisition of the data from distance relay of adjacent line and status of adjacent line from line circuit breaker IED (Intelligent Electronic Device). The zero sequence current of the adjacent parallel transmission line is used to compute zero sequence current ratio and the mutual coupling effect is fully compensated. The relay adapts to changing circumstances, like failure in communication from other relays and non - availability of adjacent transmission line. The performance of the proposed adaptive relay is tested using steady state and dynamic test procedures. The fault transients are obtained by simulating a realistic parallel transmission line system with mutual coupling effect in PSCAD. The evaluation test results show the efficacy of adaptive distance relay over the conventional distance relay.Keywords: Adaptive relaying, distance measurement, mutualcoupling, quadrilateral trip characteristic, zones of protection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3145515 Parallel Explicit Group Domain Decomposition Methods for the Telegraph Equation
Authors: Kew Lee Ming, Norhashidah Hj. Mohd. Ali
Abstract:
In a previous work, we presented the numerical solution of the two dimensional second order telegraph partial differential equation discretized by the centred and rotated five-point finite difference discretizations, namely the explicit group (EG) and explicit decoupled group (EDG) iterative methods, respectively. In this paper, we utilize a domain decomposition algorithm on these group schemes to divide the tasks involved in solving the same equation. The objective of this study is to describe the development of the parallel group iterative schemes under OpenMP programming environment as a way to reduce the computational costs of the solution processes using multicore technologies. A detailed performance analysis of the parallel implementations of points and group iterative schemes will be reported and discussed.Keywords: Telegraph equation, explicit group iterative scheme, domain decomposition algorithm, parallelization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1525514 Low Power and Less Area Architecture for Integer Motion Estimation
Authors: C Hisham, K Komal, Amit K Mishra
Abstract:
Full search block matching algorithm is widely used for hardware implementation of motion estimators in video compression algorithms. In this paper we are proposing a new architecture, which consists of a 2D parallel processing unit and a 1D unit both working in parallel. The proposed architecture reduces both data access power and computational power which are the main causes of power consumption in integer motion estimation. It also completes the operations with nearly the same number of clock cycles as compared to a 2D systolic array architecture. In this work sum of absolute difference (SAD)-the most repeated operation in block matching, is calculated in two steps. The first step is to calculate the SAD for alternate rows by a 2D parallel unit. If the SAD calculated by the parallel unit is less than the stored minimum SAD, the SAD of the remaining rows is calculated by the 1D unit. Early termination, which stops avoidable computations has been achieved with the help of alternate rows method proposed in this paper and by finding a low initial SAD value based on motion vector prediction. Data reuse has been applied to the reference blocks in the same search area which significantly reduced the memory access.
Keywords: Sum of absolute difference, high speed DSP.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1492513 Applying Autonomic Computing Concepts to Parallel Computing using Intelligent Agents
Authors: Blesson Varghese, Gerard T. McKee
Abstract:
The work reported in this paper is motivated by the fact that there is a need to apply autonomic computing concepts to parallel computing systems. Advancing on prior work based on intelligent cores [36], a swarm-array computing approach, this paper focuses on 'Intelligent agents' another swarm-array computing approach in which the task to be executed on a parallel computing core is considered as a swarm of autonomous agents. A task is carried to a computing core by carrier agents and is seamlessly transferred between cores in the event of a predicted failure, thereby achieving self-ware objectives of autonomic computing. The feasibility of the proposed swarm-array computing approach is validated on a multi-agent simulator.
Keywords: Autonomic computing, intelligent agents, swarm-array computing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1592512 A TFETI Domain Decompositon Solver for Von Mises Elastoplasticity Model with Combination of Linear Isotropic-Kinematic Hardening
Authors: Martin Cermak, Stanislav Sysala
Abstract:
In this paper we present the efficient parallel implementation of elastoplastic problems based on the TFETI (Total Finite Element Tearing and Interconnecting) domain decomposition method. This approach allow us to use parallel solution and compute this nonlinear problem on the supercomputers and decrease the solution time and compute problems with millions of DOFs. In our approach we consider an associated elastoplastic model with the von Mises plastic criterion and the combination of linear isotropic-kinematic hardening law. This model is discretized by the implicit Euler method in time and by the finite element method in space. We consider the system of nonlinear equations with a strongly semismooth and strongly monotone operator. The semismooth Newton method is applied to solve this nonlinear system. Corresponding linearized problems arising in the Newton iterations are solved in parallel by the above mentioned TFETI. The implementation of this problem is realized in our in-house MatSol packages developed in MatLab.
Keywords: Isotropic-kinematic hardening, TFETI, domain decomposition, parallel solution.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1759511 Discrete Breeding Swarm for Cost Minimization of Parallel Job Shop Scheduling Problem
Authors: Tarek Aboueldah, Hanan Farag
Abstract:
Parallel Job Shop Scheduling Problem (JSSP) is a multi-objective and multi constrains NP-optimization problem. Traditional Artificial Intelligence techniques have been widely used; however, they could be trapped into the local minimum without reaching the optimum solution. Thus, we propose a hybrid Artificial Intelligence (AI) model with Discrete Breeding Swarm (DBS) added to traditional AI to avoid this trapping. This model is applied in the cost minimization of the Car Sequencing and Operator Allocation (CSOA) problem. The practical experiment shows that our model outperforms other techniques in cost minimization.
Keywords: Parallel Job Shop Scheduling Problem, Artificial Intelligence, Discrete Breeding Swarm, Car Sequencing and Operator Allocation, cost minimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 610510 Design and Fabrication of an Electrostatically Actuated Parallel-Plate Mirror by 3D-Printer
Authors: J. Mizuno, S. Takahashi
Abstract:
In this paper, design and fabrication of an actuated parallel-plate mirror based on a 3D-printer is described. The mirror and electrode layers are fabricated separately and assembled thereafter. The alignment is performed by dowel pin-hole pairs fabricated on the respective layers. The electrodes are formed on the surface of the electrode layer by Au ion sputtering using a suitable mask, which is also fabricated by a 3D-printer.For grounding the mirror layer, except the contact area with the electrode paths, all the surface is Au ion sputtered. 3D-printers are widely used for creating 3D models or mock-ups. The authors have recently proposed that these models can perform electromechanical functions such as actuators by suitably masking them followed by metallization process. Since the smallest possible fabrication size is in the order of sub-millimeters, these electromechanical devices are named by the authors as SMEMS (Sub-Milli Electro-Mechanical Systems) devices. The proposed mirror described in this paper which consists of parallel-plate electrostatic actuators is also one type of SMEMS devices. In addition, SMEMS is totally environment-clean compared to MEMS (Micro Electro-Mechanical Systems) fabrication processes because any hazardous chemicals or gases are utilized.
Keywords: MEMS, parallel-plate mirror, SMEMS, 3D-printer.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1810509 Parallel-computing Approach for FFT Implementation on Digital Signal Processor (DSP)
Authors: Yi-Pin Hsu, Shin-Yu Lin
Abstract:
An efficient parallel form in digital signal processor can improve the algorithm performance. The butterfly structure is an important role in fast Fourier transform (FFT), because its symmetry form is suitable for hardware implementation. Although it can perform a symmetric structure, the performance will be reduced under the data-dependent flow characteristic. Even though recent research which call as novel memory reference reduction methods (NMRRM) for FFT focus on reduce memory reference in twiddle factor, the data-dependent property still exists. In this paper, we propose a parallel-computing approach for FFT implementation on digital signal processor (DSP) which is based on data-independent property and still hold the property of low-memory reference. The proposed method combines final two steps in NMRRM FFT to perform a novel data-independent structure, besides it is very suitable for multi-operation-unit digital signal processor and dual-core system. We have applied the proposed method of radix-2 FFT algorithm in low memory reference on TI TMSC320C64x DSP. Experimental results show the method can reduce 33.8% clock cycles comparing with the NMRRM FFT implementation and keep the low-memory reference property.
Keywords: Parallel-computing, FFT, low-memory reference, TIDSP.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2198508 Parallel Computation in Hypersonic Aerodynamic Heating Problem
Authors: Ding Guo-hao, Li Hua, Wang Wen-long
Abstract:
A parallel computational fluid dynamics code has been developed for the study of aerodynamic heating problem in hypersonic flows. The code employs the 3D Navier-Stokes equations as the basic governing equations to simulate the laminar hypersonic flow. The cell centered finite volume method based on structured grid is applied for spatial discretization. The AUSMPW+ scheme is used for the inviscid fluxes, and the MUSCL approach is used for higher order spatial accuracy. The implicit LU-SGS scheme is applied for time integration to accelerate the convergence of computations in steady flows. A parallel programming method based on MPI is employed to shorten the computing time. The validity of the code is demonstrated by comparing the numerical calculation result with the experimental data of a hypersonic flow field around a blunt body.Keywords: Aerodynamic Heating, AUSMPW+, MPI, ParallelComputation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1965507 Parallel Discrete Fourier Transform for Fast FIR Filtering Based on Overlapped-save Block Structure
Authors: Ying-Wen Bai, Ju-Maw Chen
Abstract:
To successfully provide a fast FIR filter with FTT algorithms, overlapped-save algorithms can be used to lower the computational complexity and achieve the desired real-time processing. As the length of the input block increases in order to improve the efficiency, a larger volume of zero padding will greatly increase the computation length of the FFT. In this paper, we use the overlapped block digital filtering to construct a parallel structure. As long as the down-sampling (or up-sampling) factor is an exact multiple lengths of the impulse response of a FIR filter, we can process the input block by using a parallel structure and thus achieve a low-complex fast FIR filter with overlapped-save algorithms. With a long filter length, the performance and the throughput of the digital filtering system will also be greatly enhanced.
Keywords: FIR Filter, Overlapped-save Algorithm, ParallelStructure
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1669