Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 608

Search results for: parallel processing

608 Specialization-based parallel Processing without Memo-trees

Authors: Hidemi Ogasawara, Kiyoshi Akama, Hiroshi Mabuchi

Abstract:

The purpose of this paper is to propose a framework for constructing correct parallel processing programs based on Equivalent Transformation Framework (ETF). ETF regards computation as In the framework, a problem-s domain knowledge and a query are described in definite clauses, and computation is regarded as transformation of the definite clauses. Its meaning is defined by a model of the set of definite clauses, and the transformation rules generated must preserve meaning. We have proposed a parallel processing method based on “specialization", a part of operation in the transformations, which resembles substitution in logic programming. The method requires “Memo-tree", a history of specialization to maintain correctness. In this paper we proposes the new method for the specialization-base parallel processing without Memo-tree.

Keywords: Parallel processing, Program correctness, Equivalent transformation, Specializer generation rule

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
607 Parallel Vector Processing Using Multi Level Orbital DATA

Authors: Nagi Mekhiel

Abstract:

Many applications use vector operations by applying single instruction to multiple data that map to different locations in conventional memory. Transferring data from memory is limited by access latency and bandwidth affecting the performance gain of vector processing. We present a memory system that makes all of its content available to processors in time so that processors need not to access the memory, we force each location to be available to all processors at a specific time. The data move in different orbits to become available to other processors in higher orbits at different time. We use this memory to apply parallel vector operations to data streams at first orbit level. Data processed in the first level move to upper orbit one data element at a time, allowing a processor in that orbit to apply another vector operation to deal with serial code limitations inherited in all parallel applications and interleaved it with lower level vector operations.

Keywords: Memory organization, parallel processors, serial code, vector processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
606 A Multi-Level WEB Based Parallel Processing System A Hierarchical Volunteer Computing Approach

Authors: Abdelrahman Ahmed Mohamed Osman

Abstract:

Over the past few years, a number of efforts have been exerted to build parallel processing systems that utilize the idle power of LAN-s and PC-s available in many homes and corporations. The main advantage of these approaches is that they provide cheap parallel processing environments for those who cannot afford the expenses of supercomputers and parallel processing hardware. However, most of the solutions provided are not very flexible in the use of available resources and very difficult to install and setup. In this paper, a multi-level web-based parallel processing system (MWPS) is designed (appendix). MWPS is based on the idea of volunteer computing, very flexible, easy to setup and easy to use. MWPS allows three types of subscribers: simple volunteers (single computers), super volunteers (full networks) and end users. All of these entities are coordinated transparently through a secure web site. Volunteer nodes provide the required processing power needed by the system end users. There is no limit on the number of volunteer nodes, and accordingly the system can grow indefinitely. Both volunteer and system users must register and subscribe. Once, they subscribe, each entity is provided with the appropriate MWPS components. These components are very easy to install. Super volunteer nodes are provided with special components that make it possible to delegate some of the load to their inner nodes. These inner nodes may also delegate some of the load to some other lower level inner nodes .... and so on. It is the responsibility of the parent super nodes to coordinate the delegation process and deliver the results back to the user. MWPS uses a simple behavior-based scheduler that takes into consideration the current load and previous behavior of processing nodes. Nodes that fulfill their contracts within the expected time get a high degree of trust. Nodes that fail to satisfy their contract get a lower degree of trust. MWPS is based on the .NET framework and provides the minimal level of security expected in distributed processing environments. Users and processing nodes are fully authenticated. Communications and messages between nodes are very secure. The system has been implemented using C#. MWPS may be used by any group of people or companies to establish a parallel processing or grid environment.

Keywords: Volunteer computing, Parallel Processing, XMLWebServices, .NET Remoting, Tuplespace.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
605 Concurrency without Locking in Parallel Hash Structures used for Data Processing

Authors: Ákos Dudás, Sándor Juhász

Abstract:

Various mechanisms providing mutual exclusion and thread synchronization can be used to support parallel processing within a single computer. Instead of using locks, semaphores, barriers or other traditional approaches in this paper we focus on alternative ways for making better use of modern multithreaded architectures and preparing hash tables for concurrent accesses. Hash structures will be used to demonstrate and compare two entirely different approaches (rule based cooperation and hardware synchronization support) to an efficient parallel implementation using traditional locks. Comparison includes implementation details, performance ranking and scalability issues. We aim at understanding the effects the parallelization schemes have on the execution environment with special focus on the memory system and memory access characteristics.

Keywords: Lock-free synchronization, mutual exclusion, parallel hash tables, parallel performance

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
604 Parallel Priority Region Approach to Detect Background

Authors: Sallama Athab, Hala Bahjat, Zhang Yinghui

Abstract:

Background detection is essential in video analyses; optimization is often needed in order to achieve real time calculation. Information gathered by dual cameras placed in the front and rear part of an Autonomous Vehicle (AV) is integrated for background detection. In this paper, real time calculation is achieved on the proposed technique by using Priority Regions (PR) and Parallel Processing together where each frame is divided into regions then and each region process is processed in parallel. PR division depends upon driver view limitations. A background detection system is built on the Temporal Difference (TD) and Gaussian Filtering (GF). Temporal Difference and Gaussian Filtering with multi threshold and sigma (weight) value are be based on PR characteristics. The experiment result is prepared on real scene. Comparison of the speed and accuracy with traditional background detection techniques, the effectiveness of PR and parallel processing are also discussed in this paper.

Keywords: Autonomous Vehicle, Background Detection, Dual Camera, Gaussian Filtering, Parallel Processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
603 High Performance in Parallel Data Integration: An Empirical Evaluation of the Ratio Between Processing Time and Number of Physical Nodes

Authors: Caspar von Seckendorff, Eldar Sultanow

Abstract:

Many studies have shown that parallelization decreases efficiency [1], [2]. There are many reasons for these decrements. This paper investigates those which appear in the context of parallel data integration. Integration processes generally cannot be allocated to packages of identical size (i. e. tasks of identical complexity). The reason for this is unknown heterogeneous input data which result in variable task lengths. Process delay is defined by the slowest processing node. It leads to a detrimental effect on the total processing time. With a real world example, this study will show that while process delay does initially increase with the introduction of more nodes it ultimately decreases again after a certain point. The example will make use of the cloud computing platform Hadoop and be run inside Amazon-s EC2 compute cloud. A stochastic model will be set up which can explain this effect.

Keywords: Process delay, speedup, efficiency, parallel computing, data integration, E-Commerce, Amazon Elastic Compute Cloud (EC2), Hadoop, Nutch.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
602 Parallel Text Processing: Alignment of Indonesian to Javanese Language

Authors: Aji P. Wibawa, Andrew Nafalski, Neil Murray, Wayan F. Mahmudy

Abstract:

Parallel text alignment is proposed as a way of aligning bahasa Indonesia to words in Javanese. Since the one-to-one word translator does not have the facility to translate pragmatic aspects of Javanese, the parallel text alignment model described uses a phrase pair combination. The algorithm aligns the parallel text automatically from the beginning to the end of each sentence. Even though the results of the phrase pair combination outperform the previous algorithm, it is still inefficient. Recording all possible combinations consume more space in the database and time consuming. The original algorithm is modified by applying the edit distance coefficient to improve the data-storage efficiency. As a result, the data-storage consumption is 90% reduced as well as its learning period (42s).

Keywords: Parallel text alignment, phrase pair combination, edit distance coefficient, Javanese-Indonesian language.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
601 Processor Scheduling on Parallel Computers

Authors: Mohammad S. Laghari, Gulzar A. Khuwaja

Abstract:

Many problems in computer vision and image processing present potential for parallel implementations through one of the three major paradigms of geometric parallelism, algorithmic parallelism and processor farming. Static process scheduling techniques are used successfully to exploit geometric and algorithmic parallelism, while dynamic process scheduling is better suited to dealing with the independent processes inherent in the process farming paradigm. This paper considers the application of parallel or multi-computers to a class of problems exhibiting spatial data characteristic of the geometric paradigm. However, by using processor farming paradigm, a dynamic scheduling technique is developed to suit the MIMD structure of the multi-computers. A hybrid scheme of scheduling is also developed and compared with the other schemes. The specific problem chosen for the investigation is the Hough transform for line detection.

Keywords: Hough transforms, parallel computer, parallel paradigms, scheduling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
600 Implementation of Parallel Interface for Microprocessor Trainer

Authors: Moe Moe Htun, Khin Htar Nwe

Abstract:

In this paper, parallel interface for microprocessor trainer was implemented. A programmable parallel–port device such as the IC 8255A is initialized for simple input or output and for handshake input or output by choosing kinds of modes. The hardware connections and the programs can be used to interface microprocessor trainer and a personal computer by using IC 8255A. The assembly programs edited on PC-s editor can be downloaded to the trainer.

Keywords: Parallel I/O ports, parallel interface, trainer, two 8255 ICs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
599 Concurrent Approach to Data Parallel Model using Java

Authors: Bala Dhandayuthapani Veerasamy

Abstract:

Parallel programming models exist as an abstraction of hardware and memory architectures. There are several parallel programming models in commonly use; they are shared memory model, thread model, message passing model, data parallel model, hybrid model, Flynn-s models, embarrassingly parallel computations model, pipelined computations model. These models are not specific to a particular type of machine or memory architecture. This paper expresses the model program for concurrent approach to data parallel model through java programming.

Keywords: Concurrent, Data Parallel, JDK, Parallel, Thread

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
598 A Message Passing Implementation of a New Parallel Arrangement Algorithm

Authors: Ezequiel Herruzo, Juan José Cruz, José Ignacio Benavides, Oscar Plata

Abstract:

This paper describes a new algorithm of arrangement in parallel, based on Odd-Even Mergesort, called division and concurrent mixes. The main idea of the algorithm is to achieve that each processor uses a sequential algorithm for ordering a part of the vector, and after that, for making the processors work in pairs in order to mix two of these sections ordered in a greater one, also ordered; after several iterations, the vector will be completely ordered. The paper describes the implementation of the new algorithm on a Message Passing environment (such as MPI). Besides, it compares the obtained experimental results with the quicksort sequential algorithm and with the parallel implementations (also on MPI) of the algorithms quicksort and bitonic sort. The comparison has been realized in an 8 processors cluster under GNU/Linux which is running on a unique PC processor.

Keywords: Parallel algorithm, arrangement, MPI, sorting, parallel program.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
597 Modified Scaling-Free CORDIC Based Pipelined Parallel MDC FFT and IFFT Architecture for Radix 2^2 Algorithm

Authors: C. Paramasivam, K. B. Jayanthi

Abstract:

An innovative approach to develop modified scaling free CORDIC based two parallel pipelined Multipath Delay Commutator (MDC) FFT and IFFT architectures for radix 22 FFT algorithm is presented. Multipliers and adders are the most important data paths in FFT and IFFT architectures. Multipliers occupy high area and consume more power. In order to optimize the area and power overhead, modified scaling-free CORDIC based complex multiplier is utilized in the proposed design. In general twiddle factor values are stored in RAM block. In the proposed work, modified scaling-free CORDIC based twiddle factor generator unit is used to generate the twiddle factor and efficient switching units are used. In addition to this, four point FFT operations are performed without complex multiplication which helps to reduce area and power in the last two stages of the pipelined architectures. The design proposed in this paper is based on multipath delay commutator method. The proposed design can be extended to any radix 2n based FFT/IFFT algorithm to improve the throughput. The work is synthesized using Synopsys design Compiler using TSMC 90-nm library. The proposed method proves to be better compared to the reference design in terms of area, throughput and power consumption. The comparative analysis of the proposed design with Xilinx FPGA platform is also discussed in the paper.

Keywords: Coordinate Rotational Digital Computer(CORDIC), Complex multiplier, Fast Fourier transform (FFT), Inverse fast Fourier transform (IFFT), Multipath delay Commutator (MDC), modified scaling free CORDIC, complex multiplier, pipelining, parallel processing, radix-2^2.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
596 Performance Analysis of Parallel Client-Server Model Versus Parallel Mobile Agent Model

Authors: K. B. Manwade, G. A. Patil

Abstract:

Mobile agent has motivated the creation of a new methodology for parallel computing. We introduce a methodology for the creation of parallel applications on the network. The proposed Mobile-Agent parallel processing framework uses multiple Javamobile Agents. Each mobile agent can travel to the specified machine in the network to perform its tasks. We also introduce the concept of master agent, which is Java object capable of implementing a particular task of the target application. Master agent is dynamically assigns the task to mobile agents. We have developed and tested a prototype application: Mobile Agent Based Parallel Computing. Boosted by the inherited benefits of using Java and Mobile Agents, our proposed methodology breaks the barriers between the environments, and could potentially exploit in a parallel manner all the available computational resources on the network. This paper elaborates performance issues of a mobile agent for parallel computing.

Keywords: Parallel Computing, Mobile Agent.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
595 A Parallel Architecture for the Real Time Correction of Stereoscopic Images

Authors: Zohir Irki, Michel Devy

Abstract:

In this paper, we will present an architecture for the implementation of a real time stereoscopic images correction's approach. This architecture is parallel and makes use of several memory blocs in which are memorized pre calculated data relating to the cameras used for the acquisition of images. The use of reduced images proves to be essential in the proposed approach; the suggested architecture must so be able to carry out the real time reduction of original images.

Keywords: Image reduction, Real-time correction, Parallel architecture, Parallel treatment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
594 Kinematic Analysis of a Novel Complex DoF Parallel Manipulator

Authors: M.A. Hosseini, P. Ebrahimi Naghani

Abstract:

In this research work, a novel parallel manipulator with high positioning and orienting rate is introduced. This mechanism has two rotational and one translational degree of freedom. Kinematics and Jacobian analysis are investigated. Moreover, workspace analysis and optimization has been performed by using genetic algorithm toolbox in Matlab software. Because of decreasing moving elements, it is expected much more better dynamic performance with respect to other counterpart mechanisms with the same degrees of freedom. In addition, using couple of cylindrical and revolute joints increased mechanism ability to have more extended workspace.

Keywords: Kinematics, Workspace, 3-CRS/PU, Parallel robot

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
593 Parallel 2-Opt Local Search on GPU

Authors: Wen-Bao Qiao, Jean-Charles Créput

Abstract:

To accelerate the solution for large scale traveling salesman problems (TSP), a parallel 2-opt local search algorithm with simple implementation based on Graphics Processing Unit (GPU) is presented and tested in this paper. The parallel scheme is based on technique of data decomposition by dynamically assigning multiple K processors on the integral tour to treat K edges’ 2-opt local optimization simultaneously on independent sub-tours, where K can be user-defined or have a function relationship with input size N. We implement this algorithm with doubly linked list on GPU. The implementation only requires O(N) memory. We compare this parallel 2-opt local optimization against sequential exhaustive 2-opt search along integral tour on TSP instances from TSPLIB with more than 10000 cities.

Keywords: Doubly linked list, parallel 2-opt, tour division, GPU.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
592 Performance Evaluation of Parallel Surface Modeling and Generation on Actual and Virtual Multicore Systems

Authors: Nyeng P. Gyang

Abstract:

Even though past, current and future trends suggest that multicore and cloud computing systems are increasingly prevalent/ubiquitous, this class of parallel systems is nonetheless underutilized, in general, and barely used for research on employing parallel Delaunay triangulation for parallel surface modeling and generation, in particular. The performances, of actual/physical and virtual/cloud multicore systems/machines, at executing various algorithms, which implement various parallelization strategies of the incremental insertion technique of the Delaunay triangulation algorithm, were evaluated. T-tests were run on the data collected, in order to determine whether various performance metrics differences (including execution time, speedup and efficiency) were statistically significant. Results show that the actual machine is approximately twice faster than the virtual machine at executing the same programs for the various parallelization strategies. Results, which furnish the scalability behaviors of the various parallelization strategies, also show that some of the differences between the performances of these systems, during different runs of the algorithms on the systems, were statistically significant. A few pseudo superlinear speedup results, which were computed from the raw data collected, are not true superlinear speedup values. These pseudo superlinear speedup values, which arise as a result of one way of computing speedups, disappear and give way to asymmetric speedups, which are the accurate kind of speedups that occur in the experiments performed.

Keywords: Cloud computing systems, multicore systems, parallel delaunay triangulation, parallel surface modeling and generation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
591 Massively-Parallel Bit-Serial Neural Networks for Fast Epilepsy Diagnosis: A Feasibility Study

Authors: Si Mon Kueh, Tom J. Kazmierski

Abstract:

There are about 1% of the world population suffering from the hidden disability known as epilepsy and major developing countries are not fully equipped to counter this problem. In order to reduce the inconvenience and danger of epilepsy, different methods have been researched by using a artificial neural network (ANN) classification to distinguish epileptic waveforms from normal brain waveforms. This paper outlines the aim of achieving massive ANN parallelization through a dedicated hardware using bit-serial processing. The design of this bit-serial Neural Processing Element (NPE) is presented which implements the functionality of a complete neuron using variable accuracy. The proposed design has been tested taking into consideration non-idealities of a hardware ANN. The NPE consists of a bit-serial multiplier which uses only 16 logic elements on an Altera Cyclone IV FPGA and a bit-serial ALU as well as a look-up table. Arrays of NPEs can be driven by a single controller which executes the neural processing algorithm. In conclusion, the proposed compact NPE design allows the construction of complex hardware ANNs that can be implemented in a portable equipment that suits the needs of a single epileptic patient in his or her daily activities to predict the occurrences of impending tonic conic seizures.

Keywords: Artificial Neural Networks, bit-serial neural processor, FPGA, Neural Processing Element.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
590 Some Results on Parallel Alternating Methods

Authors: Guangbin Wang, Fuping Tan

Abstract:

In this paper, we investigate two parallel alternating methods for solving the system of linear equations Ax = b and give convergence theorems for the parallel alternating methods when the coefficient matrix is a nonsingular H-matrix. Furthermore, we give one example to show our results.

Keywords: Nonsingular H-matrix, parallel alternating method, convergence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
589 Parallel-computing Approach for FFT Implementation on Digital Signal Processor (DSP)

Authors: Yi-Pin Hsu, Shin-Yu Lin

Abstract:

An efficient parallel form in digital signal processor can improve the algorithm performance. The butterfly structure is an important role in fast Fourier transform (FFT), because its symmetry form is suitable for hardware implementation. Although it can perform a symmetric structure, the performance will be reduced under the data-dependent flow characteristic. Even though recent research which call as novel memory reference reduction methods (NMRRM) for FFT focus on reduce memory reference in twiddle factor, the data-dependent property still exists. In this paper, we propose a parallel-computing approach for FFT implementation on digital signal processor (DSP) which is based on data-independent property and still hold the property of low-memory reference. The proposed method combines final two steps in NMRRM FFT to perform a novel data-independent structure, besides it is very suitable for multi-operation-unit digital signal processor and dual-core system. We have applied the proposed method of radix-2 FFT algorithm in low memory reference on TI TMSC320C64x DSP. Experimental results show the method can reduce 33.8% clock cycles comparing with the NMRRM FFT implementation and keep the low-memory reference property.

Keywords: Parallel-computing, FFT, low-memory reference, TIDSP.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
588 Productive Design and Calculation of Intermittent Mechanisms with Radial Parallel Cams

Authors: Pavel Dostrašil, Petr Jirásko

Abstract:

The paper deals with the kinematics and automated calculation of intermittent mechanisms with radial cams. Currently, electronic cams are increasingly applied in the drives of working link mechanisms. Despite a huge advantage of electronic cams in their reprogrammability or instantaneous change of displacement diagrams, conventional cam mechanisms have an irreplaceable role in production and handling machines. With high frequency of working cycle periods, the dynamic load of the proper servomotor rotor increases and efficiency of electronic cams strongly decreases. Though conventional intermittent mechanisms with radial cams are representatives of fixed automation, they have distinct advantages in their high speed (high dynamics), positional accuracy and relatively easy manufacture. We try to remove the disadvantage of firm displacement diagram by reducing costs for simple design and automated calculation that leads reliably to high-quality and inexpensive manufacture.

Keywords: Cam mechanism, displacement diagram, intermittentmechanism, radial parallel cam

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
587 Sub-Image Detection Using Fast Neural Processors and Image Decomposition

Authors: Hazem M. El-Bakry, Qiangfu Zhao

Abstract:

In this paper, an approach to reduce the computation steps required by fast neural networksfor the searching process is presented. The principle ofdivide and conquer strategy is applied through imagedecomposition. Each image is divided into small in sizesub-images and then each one is tested separately usinga fast neural network. The operation of fast neuralnetworks based on applying cross correlation in thefrequency domain between the input image and theweights of the hidden neurons. Compared toconventional and fast neural networks, experimentalresults show that a speed up ratio is achieved whenapplying this technique to locate human facesautomatically in cluttered scenes. Furthermore, fasterface detection is obtained by using parallel processingtechniques to test the resulting sub-images at the sametime using the same number of fast neural networks. Incontrast to using only fast neural networks, the speed upratio is increased with the size of the input image whenusing fast neural networks and image decomposition.

Keywords: Fast Neural Networks, 2D-FFT, CrossCorrelation, Image decomposition, Parallel Processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
586 Automatic Music Score Recognition System Using Digital Image Processing

Authors: Yuan-Hsiang Chang, Zhong-Xian Peng, Li-Der Jeng

Abstract:

Music has always been an integral part of human’s daily lives. But, for the most people, reading musical score and turning it into melody is not easy. This study aims to develop an Automatic music score recognition system using digital image processing, which can be used to read and analyze musical score images automatically. The technical approaches included: (1) staff region segmentation; (2) image preprocessing; (3) note recognition; and (4) accidental and rest recognition. Digital image processing techniques (e.g., horizontal /vertical projections, connected component labeling, morphological processing, template matching, etc.) were applied according to musical notes, accidents, and rests in staff notations. Preliminary results showed that our system could achieve detection and recognition rates of 96.3% and 91.7%, respectively. In conclusion, we presented an effective automated musical score recognition system that could be integrated in a system with a media player to play music/songs given input images of musical score. Ultimately, this system could also be incorporated in applications for mobile devices as a learning tool, such that a music player could learn to play music/songs.

Keywords: Connected component labeling, image processing, morphological processing, optical musical recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
585 Performance Comparison of Parallel Sorting Algorithms on the Cluster of Workstations

Authors: Lai Lai Win Kyi, Nay Min Tun

Abstract:

Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.

Keywords: Cluster of Workstations, Parallel sorting algorithms, performance analysis, parallel computing and MPI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
584 A New High Speed Neural Model for Fast Character Recognition Using Cross Correlation and Matrix Decomposition

Authors: Hazem M. El-Bakry

Abstract:

Neural processors have shown good results for detecting a certain character in a given input matrix. In this paper, a new idead to speed up the operation of neural processors for character detection is presented. Such processors are designed based on cross correlation in the frequency domain between the input matrix and the weights of neural networks. This approach is developed to reduce the computation steps required by these faster neural networks for the searching process. The principle of divide and conquer strategy is applied through image decomposition. Each image is divided into small in size sub-images and then each one is tested separately by using a single faster neural processor. Furthermore, faster character detection is obtained by using parallel processing techniques to test the resulting sub-images at the same time using the same number of faster neural networks. In contrast to using only faster neural processors, the speed up ratio is increased with the size of the input image when using faster neural processors and image decomposition. Moreover, the problem of local subimage normalization in the frequency domain is solved. The effect of image normalization on the speed up ratio of character detection is discussed. Simulation results show that local subimage normalization through weight normalization is faster than subimage normalization in the spatial domain. The overall speed up ratio of the detection process is increased as the normalization of weights is done off line.

Keywords: Fast Character Detection, Neural Processors, Cross Correlation, Image Normalization, Parallel Processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
583 Dynamic Analysis of Offshore 2-HUS/U Parallel Platform

Authors: Xie Kefeng, Zhang He

Abstract:

For the stability and control demand of offshore small floating platform, a 2-HUS/U parallel mechanism was presented as offshore platform. Inverse kinematics was obtained by institutional constraint equation, and the dynamic model of offshore 2-HUS/U parallel platform was derived based on rigid body’s Lagrangian method. The equivalent moment of inertia, damping and driving force/torque variation of offshore 2-HUS/U parallel platform were analyzed. A numerical example shows that, for parallel platform of given motion, system’s equivalent inertia changes 1.25 times maximally. During the movement of platform, they change dramatically with the system configuration and have coupling characteristics. The maximum equivalent drive torque is 800 N. At the same time, the curve of platform’s driving force/torque is smooth and has good sine features. The control system needs to be adjusted according to kinetic equation during stability and control and it provides a basis for the optimization of control system.

Keywords: 2-HUS/U platform, Dynamics, Lagrange, Parallel platform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
582 Experimental Investigation of Plane Jets Exiting Five Parallel Channels with Large Aspect Ratio

Authors: Laurentiu Moruz, Jens Kitzhofer, Mircea Dinulescu

Abstract:

The paper aims to extend the knowledge about jet behavior and jet interaction between five plane unventilated jets with large aspect ratio (AR). The distance between the single plane jets is two times the channel height. The experimental investigation applies 2D Particle Image Velocimetry (PIV) and static pressure measurements. Our study focuses on the influence of two different outlet nozzle geometries (triangular shape with 2 x 7.5° and blunt geometry) with respect to variation of Reynolds number from 5500 - 12000. It is shown that the outlet geometry has a major influence on the jet formation in terms of uniformity of velocity profiles downstream of the sudden expansion. Furthermore, we describe characteristic regions like converging region, merging region and combined region. The triangular outlet geometry generates most uniform velocity distributions in comparison to a blunt outlet nozzle geometry. The blunt outlet geometry shows an unstable behavior where the jets tend to attach to one side of the walls (ceiling) generating a large recirculation region on the opposite side. Static pressure measurements confirm the observation and indicate that the recirculation region is connected to larger pressure drop.

Keywords: 2D particle image velocimetry, parallel jet interaction, pressure drop, sudden expansion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
581 A Consideration of the Achievement of Productive Level Parallel Programming Skills

Authors: Tadayoshi Horita, Masakazu Akiba, Mina Terauchi, Tsuneo Kanno

Abstract:

This paper gives a consideration of the achievement of productive level parallel programming skills, based on the data of the graduation studies in the Polytechnic University of Japan. The data show that most students can achieve only parallel programming skills during the graduation study (about 600 to 700 hours), if the programming environment is limited to GPGPUs. However, the data also show that it is a very high level task that a student achieves productive level parallel programming skills during only the graduation study. In addition, it shows that the parallel programming environments for GPGPU, such as CUDA and OpenCL, may be more suitable for parallel computing education than other environments such as MPI on a cluster system and Cell.B.E. These results must be useful for the areas of not only software developments, but also hardware product developments using computer technologies.

Keywords: Parallel computing, programming education, GPU, GPGPU, CUDA, OpenCL, MPI, Cell.B.E.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
580 Impact of Fair Share and its Configurations on Parallel Job Scheduling Algorithms

Authors: Sangsuree Vasupongayya

Abstract:

To provide a better understanding of fair share policies supported by current production schedulers and their impact on scheduling performance, A relative fair share policy supported in four well-known production job schedulers is evaluated in this study. The experimental results show that fair share indeed reduces heavy-demand users from dominating the system resources. However, the detailed per-user performance analysis show that some types of users may suffer unfairness under fair share, possibly due to priority mechanisms used by the current production schedulers. These users typically are not heavy-demands users but they have mixture of jobs that do not spread out.

Keywords: Fair share, Parallel job scheduler, Backfill, Measures

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
579 A Parallel Implementation of k-Means in MATLAB

Authors: Dimitris Varsamis, Christos Talagkozis, Alkiviadis Tsimpiris, Paris Mastorocostas

Abstract:

The aim of this work is the parallel implementation of k-means in MATLAB, in order to reduce the execution time. Specifically, a new function in MATLAB for serial k-means algorithm is developed, which meets all the requirements for the conversion to a function in MATLAB with parallel computations. Additionally, two different variants for the definition of initial values are presented. In the sequel, the parallel approach is presented. Finally, the performance tests for the computation times respect to the numbers of features and classes are illustrated.

Keywords: K-means algorithm, clustering, parallel computations, MATLAB.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF