Search results for: parallel processors
626 Parallel Vector Processing Using Multi Level Orbital DATA
Authors: Nagi Mekhiel
Abstract:
Many applications use vector operations by applying single instruction to multiple data that map to different locations in conventional memory. Transferring data from memory is limited by access latency and bandwidth affecting the performance gain of vector processing. We present a memory system that makes all of its content available to processors in time so that processors need not to access the memory, we force each location to be available to all processors at a specific time. The data move in different orbits to become available to other processors in higher orbits at different time. We use this memory to apply parallel vector operations to data streams at first orbit level. Data processed in the first level move to upper orbit one data element at a time, allowing a processor in that orbit to apply another vector operation to deal with serial code limitations inherited in all parallel applications and interleaved it with lower level vector operations.Keywords: Memory organization, parallel processors, serial code, vector processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1062625 A Message Passing Implementation of a New Parallel Arrangement Algorithm
Authors: Ezequiel Herruzo, Juan José Cruz, José Ignacio Benavides, Oscar Plata
Abstract:
This paper describes a new algorithm of arrangement in parallel, based on Odd-Even Mergesort, called division and concurrent mixes. The main idea of the algorithm is to achieve that each processor uses a sequential algorithm for ordering a part of the vector, and after that, for making the processors work in pairs in order to mix two of these sections ordered in a greater one, also ordered; after several iterations, the vector will be completely ordered. The paper describes the implementation of the new algorithm on a Message Passing environment (such as MPI). Besides, it compares the obtained experimental results with the quicksort sequential algorithm and with the parallel implementations (also on MPI) of the algorithms quicksort and bitonic sort. The comparison has been realized in an 8 processors cluster under GNU/Linux which is running on a unique PC processor.Keywords: Parallel algorithm, arrangement, MPI, sorting, parallel program.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1692624 Parallel Direct Integration Variable Step Block Method for Solving Large System of Higher Order Ordinary Differential Equations
Authors: Zanariah Abdul Majid, Mohamed Suleiman
Abstract:
The aim of this paper is to investigate the performance of the developed two point block method designed for two processors for solving directly non stiff large systems of higher order ordinary differential equations (ODEs). The method calculates the numerical solution at two points simultaneously and produces two new equally spaced solution values within a block and it is possible to assign the computational tasks at each time step to a single processor. The algorithm of the method was developed in C language and the parallel computation was done on a parallel shared memory environment. Numerical results are given to compare the efficiency of the developed method to the sequential timing. For large problems, the parallel implementation produced 1.95 speed-up and 98% efficiency for the two processors.Keywords: Numerical methods, parallel method, block method, higher order ODEs.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1382623 Performance Comparison of Parallel Sorting Algorithms on the Cluster of Workstations
Authors: Lai Lai Win Kyi, Nay Min Tun
Abstract:
Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.
Keywords: Cluster of Workstations, Parallel sorting algorithms, performance analysis, parallel computing and MPI.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1482622 A New High Speed Neural Model for Fast Character Recognition Using Cross Correlation and Matrix Decomposition
Authors: Hazem M. El-Bakry
Abstract:
Neural processors have shown good results for detecting a certain character in a given input matrix. In this paper, a new idead to speed up the operation of neural processors for character detection is presented. Such processors are designed based on cross correlation in the frequency domain between the input matrix and the weights of neural networks. This approach is developed to reduce the computation steps required by these faster neural networks for the searching process. The principle of divide and conquer strategy is applied through image decomposition. Each image is divided into small in size sub-images and then each one is tested separately by using a single faster neural processor. Furthermore, faster character detection is obtained by using parallel processing techniques to test the resulting sub-images at the same time using the same number of faster neural networks. In contrast to using only faster neural processors, the speed up ratio is increased with the size of the input image when using faster neural processors and image decomposition. Moreover, the problem of local subimage normalization in the frequency domain is solved. The effect of image normalization on the speed up ratio of character detection is discussed. Simulation results show that local subimage normalization through weight normalization is faster than subimage normalization in the spatial domain. The overall speed up ratio of the detection process is increased as the normalization of weights is done off line.Keywords: Fast Character Detection, Neural Processors, Cross Correlation, Image Normalization, Parallel Processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1537621 Parallel 2-Opt Local Search on GPU
Authors: Wen-Bao Qiao, Jean-Charles Créput
Abstract:
To accelerate the solution for large scale traveling salesman problems (TSP), a parallel 2-opt local search algorithm with simple implementation based on Graphics Processing Unit (GPU) is presented and tested in this paper. The parallel scheme is based on technique of data decomposition by dynamically assigning multiple K processors on the integral tour to treat K edges’ 2-opt local optimization simultaneously on independent sub-tours, where K can be user-defined or have a function relationship with input size N. We implement this algorithm with doubly linked list on GPU. The implementation only requires O(N) memory. We compare this parallel 2-opt local optimization against sequential exhaustive 2-opt search along integral tour on TSP instances from TSPLIB with more than 10000 cities.Keywords: Doubly linked list, parallel 2-opt, tour division, GPU.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1223620 Some Preconditioners for Block Pentadiagonal Linear Systems Based on New Approximate Factorization Methods
Authors: Xian Ming Gu, Ting Zhu Huang, Hou Biao Li
Abstract:
In this paper, getting an high-efficiency parallel algorithm to solve sparse block pentadiagonal linear systems suitable for vectors and parallel processors, stair matrices are used to construct some parallel polynomial approximate inverse preconditioners. These preconditioners are appropriate when the desired target is to maximize parallelism. Moreover, some theoretical results about these preconditioners are presented and how to construct preconditioners effectively for any nonsingular block pentadiagonal H-matrices is also described. In addition, the availability of these preconditioners is illustrated with some numerical experiments arising from two dimensional biharmonic equation.
Keywords: Parallel algorithm, Pentadiagonal matrix, Polynomial approximate inverse, Preconditioners, Stair matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2239619 Performance Evaluation of Task Scheduling Algorithm on LCQ Network
Authors: Zaki Ahmad Khan, Jamshed Siddiqui, Abdus Samad
Abstract:
The Scheduling and mapping of tasks on a set of processors is considered as a critical problem in parallel and distributed computing system. This paper deals with the problem of dynamic scheduling on a special type of multiprocessor architecture known as Linear Crossed Cube (LCQ) network. This proposed multiprocessor is a hybrid network which combines the features of both linear types of architectures as well as cube based architectures. Two standard dynamic scheduling schemes namely Minimum Distance Scheduling (MDS) and Two Round Scheduling (TRS) schemes are implemented on the LCQ network. Parallel tasks are mapped and the imbalance of load is evaluated on different set of processors in LCQ network. The simulations results are evaluated and effort is made by means of through analysis of the results to obtain the best solution for the given network in term of load imbalance left and execution time. The other performance matrices like speedup and efficiency are also evaluated with the given dynamic algorithms.Keywords: Dynamic algorithm, Load imbalance, Mapping, Task scheduling.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2021618 Performance Analysis of Digital Signal Processors Using SMV Benchmark
Authors: Erh-Wen Hu, Cyril S. Ku, Andrew T. Russo, Bogong Su, Jian Wang
Abstract:
Unlike general-purpose processors, digital signal processors (DSP processors) are strongly application-dependent. To meet the needs for diverse applications, a wide variety of DSP processors based on different architectures ranging from the traditional to VLIW have been introduced to the market over the years. The functionality, performance, and cost of these processors vary over a wide range. In order to select a processor that meets the design criteria for an application, processor performance is usually the major concern for digital signal processing (DSP) application developers. Performance data are also essential for the designers of DSP processors to improve their design. Consequently, several DSP performance benchmarks have been proposed over the past decade or so. However, none of these benchmarks seem to have included recent new DSP applications. In this paper, we use a new benchmark that we recently developed to compare the performance of popular DSP processors from Texas Instruments and StarCore. The new benchmark is based on the Selectable Mode Vocoder (SMV), a speech-coding program from the recent third generation (3G) wireless voice applications. All benchmark kernels are compiled by the compilers of the respective DSP processors and run on their simulators. Weighted arithmetic mean of clock cycles and arithmetic mean of code size are used to compare the performance of five DSP processors. In addition, we studied how the performance of a processor is affected by code structure, features of processor architecture and optimization of compiler. The extensive experimental data gathered, analyzed, and presented in this paper should be helpful for DSP processor and compiler designers to meet their specific design goals.Keywords: digital signal processors, DSP benchmark, instruction level parallelism, modified cyclomatic complexity, performance analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1608617 Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern
Authors: Mahdi Esmaeili, Mansour Tarafdar
Abstract:
The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.Keywords: Sequential Patterns, Data Mining, ParallelAlgorithm, Multidimensional Sequence Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1476616 Optimal Placement of Processors based on Effective Communication Load
Authors: A. R. Aswatha, T. Basavaraju, N. Bhaskara Rao
Abstract:
This paper presents a new technique for the optimum placement of processors to minimize the total effective communication load under multi-processor communication dominated environment. This is achieved by placing heavily loaded processors near each other and lightly loaded ones far away from one another in the physical grid locations. The results are mathematically proved for the Algorithms are described.Keywords: Ascending Sort Index Vector, EffectiveCommunication Load, Effective Distance Matrix, OptimalPlacement, Sorting Order.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1349615 A Parallel Algorithm for 2-D Cylindrical Geometry Transport Equation with Interface Corrections
Authors: Wei Jun-xia, Yuan Guang-wei, Yang Shu-lin, Shen Wei-dong
Abstract:
In order to make conventional implicit algorithm to be applicable in large scale parallel computers , an interface prediction and correction of discontinuous finite element method is presented to solve time-dependent neutron transport equations under 2-D cylindrical geometry. Domain decomposition is adopted in the computational domain.The numerical experiments show that our parallel algorithm with explicit prediction and implicit correction has good precision, parallelism and simplicity. Especially, it can reach perfect speedup even on hundreds of processors for large-scale problems.
Keywords: Transport Equation, Discontinuous Finite Element, Domain Decomposition, Interface Prediction And Correction
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1665614 A Survey on Performance Tools for OpenMP
Authors: Mubarak S. Mohsen, Rosni Abdullah, Yong M. Teo
Abstract:
Advances in processors architecture, such as multicore, increase the size of complexity of parallel computer systems. With multi-core architecture there are different parallel languages that can be used to run parallel programs. One of these languages is OpenMP which embedded in C/Cµ or FORTRAN. Because of this new architecture and the complexity, it is very important to evaluate the performance of OpenMP constructs, kernels, and application program on multi-core systems. Performance is the activity of collecting the information about the execution characteristics of a program. Performance tools consists of at least three interfacing software layers, including instrumentation, measurement, and analysis. The instrumentation layer defines the measured performance events. The measurement layer determines what performance event is actually captured and how it is measured by the tool. The analysis layer processes the performance data and summarizes it into a form that can be displayed in performance tools. In this paper, a number of OpenMP performance tools are surveyed, explaining how each is used to collect, analyse, and display data collection.Keywords: Parallel performance tools, OpenMP, multi-core.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1922613 An Innovational Intermittent Algorithm in Networks-On-Chip (NOC)
Authors: Ahmad M. Shafiee, Mehrdad Montazeri, Mahdi Nikdast
Abstract:
Every day human life experiences new equipments more automatic and with more abilities. So the need for faster processors doesn-t seem to finish. Despite new architectures and higher frequencies, a single processor is not adequate for many applications. Parallel processing and networks are previous solutions for this problem. The new solution to put a network of resources on a chip is called NOC (network on a chip). The more usual topology for NOC is mesh topology. There are several routing algorithms suitable for this topology such as XY, fully adaptive, etc. In this paper we have suggested a new algorithm named Intermittent X, Y (IX/Y). We have developed the new algorithm in simulation environment to compare delay and power consumption with elders' algorithms.Keywords: Computer architecture, parallel computing, NOC, routing algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1678612 A Methodology for the Synthesis of Multi-Processors
Authors: Hamid Yasinian
Abstract:
Random epistemologies and hash tables have garnered minimal interest from both security experts and experts in the last several years. In fact, few information theorists would disagree with the evaluation of expert systems. In our research, we discover how flip-flop gates can be applied to the study of superpages. Though such a hypothesis at first glance seems perverse, it is derived from known results.
Keywords: Synthesis, Multi-Processors, Interactive Model, Moor’s Law.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2300611 Concurrent Approach to Data Parallel Model using Java
Authors: Bala Dhandayuthapani Veerasamy
Abstract:
Parallel programming models exist as an abstraction of hardware and memory architectures. There are several parallel programming models in commonly use; they are shared memory model, thread model, message passing model, data parallel model, hybrid model, Flynn-s models, embarrassingly parallel computations model, pipelined computations model. These models are not specific to a particular type of machine or memory architecture. This paper expresses the model program for concurrent approach to data parallel model through java programming.Keywords: Concurrent, Data Parallel, JDK, Parallel, Thread
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2097610 Damage Strain Analysis of Parallel Fiber Eutectic
Authors: Jian Zheng, Xinhua Ni, Xiequan Liu
Abstract:
According to isotropy of parallel fiber eutectic, the no- damage strain field in parallel fiber eutectic is obtained from the flexibility tensor of parallel fiber eutectic. Considering the damage behavior of parallel fiber eutectic, damage variables are introduced to determine the strain field of parallel fiber eutectic. The damage strains in the matrix, interphase, and fiber of parallel fiber eutectic are quantitatively analyzed. Results show that damage strains are not only associated with the fiber volume fraction of parallel fiber eutectic, but also with the damage degree.
Keywords: Parallel fiber eutectic, no-damage strain, damage strain, fiber volume fraction, damage degree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 954609 Performance Analysis of Load Balancing Algorithms
Authors: Sandeep Sharma, Sarabjit Singh, Meenakshi Sharma
Abstract:
Load balancing is the process of improving the performance of a parallel and distributed system through a redistribution of load among the processors [1] [5]. In this paper we present the performance analysis of various load balancing algorithms based on different parameters, considering two typical load balancing approaches static and dynamic. The analysis indicates that static and dynamic both types of algorithm can have advancements as well as weaknesses over each other. Deciding type of algorithm to be implemented will be based on type of parallel applications to solve. The main purpose of this paper is to help in design of new algorithms in future by studying the behavior of various existing algorithms.Keywords: Load balancing (LB), workload, distributed systems, Static Load balancing, Dynamic Load Balancing
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5945608 A Practical Distributed String Matching Algorithm Architecture and Implementation
Authors: Bi Kun, Gu Nai-jie, Tu Kun, Liu Xiao-hu, Liu Gang
Abstract:
Traditional parallel single string matching algorithms are always based on PRAM computation model. Those algorithms concentrate on the cost optimal design and the theoretical speed. Based on the distributed string matching algorithm proposed by CHEN, a practical distributed string matching algorithm architecture is proposed in this paper. And also an improved single string matching algorithm based on a variant Boyer-Moore algorithm is presented. We implement our algorithm on the above architecture and the experiments prove that it is really practical and efficient on distributed memory machine. Its computation complexity is O(n/p + m), where n is the length of the text, and m is the length of the pattern, and p is the number of the processors.Keywords: Boyer-Moore algorithm, distributed algorithm, parallel string matching, string matching.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2189607 New VLSI Architecture for Motion Estimation Algorithm
Authors: V. S. K. Reddy, S. Sengupta, Y. M. Latha
Abstract:
This paper presents an efficient VLSI architecture design to achieve real time video processing using Full-Search Block Matching (FSBM) algorithm. The design employs parallel bank architecture with minimum latency, maximum throughput, and full hardware utilization. We use nine parallel processors in our architecture and each controlled by a state machine. State machine control implementation makes the design very simple and cost effective. The design is implemented using VHDL and the programming techniques we incorporated makes the design completely programmable in the sense that the search ranges and the block sizes can be varied to suit any given requirements. The design can operate at frequencies up to 36 MHz and it can function in QCIF and CIF video resolution at 1.46 MHz and 5.86 MHz, respectively.Keywords: Video Coding, Motion Estimation, Full-Search, Block-Matching, VLSI Architecture.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1807606 An Enhanced Distributed System to improve theTime Complexity of Binary Indexed Trees
Authors: Ahmed M. Elhabashy, A. Baes Mohamed, Abou El Nasr Mohamad
Abstract:
Distributed Computing Systems are usually considered the most suitable model for practical solutions of many parallel algorithms. In this paper an enhanced distributed system is presented to improve the time complexity of Binary Indexed Trees (BIT). The proposed system uses multi-uniform processors with identical architectures and a specially designed distributed memory system. The analysis of this system has shown that it has reduced the time complexity of the read query to O(Log(Log(N))), and the update query to constant complexity, while the naive solution has a time complexity of O(Log(N)) for both queries. The system was implemented and simulated using VHDL and Verilog Hardware Description Languages, with xilinx ISE 10.1, as the development environment and ModelSim 6.1c, similarly as the simulation tool. The simulation has shown that the overhead resulting by the wiring and communication between the system fragments could be fairly neglected, which makes it applicable to practically reach the maximum speed up offered by the proposed model.
Keywords: Binary Index Tree (BIT), Least Significant Bit (LSB), Parallel Adder (PA), Very High Speed Integrated Circuits HardwareDescription Language (VHDL), Distributed Parallel Computing System(DPCS).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1770605 Implementation of Parallel Interface for Microprocessor Trainer
Authors: Moe Moe Htun, Khin Htar Nwe
Abstract:
In this paper, parallel interface for microprocessor trainer was implemented. A programmable parallel–port device such as the IC 8255A is initialized for simple input or output and for handshake input or output by choosing kinds of modes. The hardware connections and the programs can be used to interface microprocessor trainer and a personal computer by using IC 8255A. The assembly programs edited on PC-s editor can be downloaded to the trainer.Keywords: Parallel I/O ports, parallel interface, trainer, two 8255 ICs.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3169604 Lattice Boltzmann Simulation of Binary Mixture Diffusion Using Modern Graphics Processors
Authors: Mohammad Amin Safi, Mahmud Ashrafizaadeh, Amir Ali Ashrafizaadeh
Abstract:
A highly optimized implementation of binary mixture diffusion with no initial bulk velocity on graphics processors is presented. The lattice Boltzmann model is employed for simulating the binary diffusion of oxygen and nitrogen into each other with different initial concentration distributions. Simulations have been performed using the latest proposed lattice Boltzmann model that satisfies both the indifferentiability principle and the H-theorem for multi-component gas mixtures. Contemporary numerical optimization techniques such as memory alignment and increasing the multiprocessor occupancy are exploited along with some novel optimization strategies to enhance the computational performance on graphics processors using the C for CUDA programming language. Speedup of more than two orders of magnitude over single-core processors is achieved on a variety of Graphical Processing Unit (GPU) devices ranging from conventional graphics cards to advanced, high-end GPUs, while the numerical results are in excellent agreement with the available analytical and numerical data in the literature.Keywords: Lattice Boltzmann model, Graphical processing unit, Binary mixture diffusion, 2D flow simulations, Optimized algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1556603 Performance Analysis of Parallel Client-Server Model Versus Parallel Mobile Agent Model
Authors: K. B. Manwade, G. A. Patil
Abstract:
Mobile agent has motivated the creation of a new methodology for parallel computing. We introduce a methodology for the creation of parallel applications on the network. The proposed Mobile-Agent parallel processing framework uses multiple Javamobile Agents. Each mobile agent can travel to the specified machine in the network to perform its tasks. We also introduce the concept of master agent, which is Java object capable of implementing a particular task of the target application. Master agent is dynamically assigns the task to mobile agents. We have developed and tested a prototype application: Mobile Agent Based Parallel Computing. Boosted by the inherited benefits of using Java and Mobile Agents, our proposed methodology breaks the barriers between the environments, and could potentially exploit in a parallel manner all the available computational resources on the network. This paper elaborates performance issues of a mobile agent for parallel computing.Keywords: Parallel Computing, Mobile Agent.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1653602 A Technique for Reachability Graph Generation for the Petri Net Models of Parallel Processes
Authors: Farooq Ahmad, Hejiao Huang, Xiaolong Wang
Abstract:
Reachability graph (RG) generation suffers from the problem of exponential space and time complexity. To alleviate the more critical problem of time complexity, this paper presents the new approach for RG generation for the Petri net (PN) models of parallel processes. Independent RGs for each parallel process in the PN structure are generated in parallel and cross-product of these RGs turns into the exhaustive state space from which the RG of given parallel system is determined. The complexity analysis of the presented algorithm illuminates significant decrease in the time complexity cost of RG generation. The proposed technique is applicable to parallel programs having multiple threads with the synchronization problem.Keywords: Parallel processes, Petri net, reachability graph, time complexity.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2014601 A Parallel Implementation of k-Means in MATLAB
Authors: Dimitris Varsamis, Christos Talagkozis, Alkiviadis Tsimpiris, Paris Mastorocostas
Abstract:
The aim of this work is the parallel implementation of k-means in MATLAB, in order to reduce the execution time. Specifically, a new function in MATLAB for serial k-means algorithm is developed, which meets all the requirements for the conversion to a function in MATLAB with parallel computations. Additionally, two different variants for the definition of initial values are presented. In the sequel, the parallel approach is presented. Finally, the performance tests for the computation times respect to the numbers of features and classes are illustrated.Keywords: K-means algorithm, clustering, parallel computations, MATLAB.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1157600 Sub-Image Detection Using Fast Neural Processors and Image Decomposition
Authors: Hazem M. El-Bakry, Qiangfu Zhao
Abstract:
In this paper, an approach to reduce the computation steps required by fast neural networksfor the searching process is presented. The principle ofdivide and conquer strategy is applied through imagedecomposition. Each image is divided into small in sizesub-images and then each one is tested separately usinga fast neural network. The operation of fast neuralnetworks based on applying cross correlation in thefrequency domain between the input image and theweights of the hidden neurons. Compared toconventional and fast neural networks, experimentalresults show that a speed up ratio is achieved whenapplying this technique to locate human facesautomatically in cluttered scenes. Furthermore, fasterface detection is obtained by using parallel processingtechniques to test the resulting sub-images at the sametime using the same number of fast neural networks. Incontrast to using only fast neural networks, the speed upratio is increased with the size of the input image whenusing fast neural networks and image decomposition.
Keywords: Fast Neural Networks, 2D-FFT, CrossCorrelation, Image decomposition, Parallel Processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2179599 Harmonic Reduction In Three-Phase Parallel Connected Inverter
Authors: M.A.A. Younis, N. A. Rahim, S. Mekhilef
Abstract:
This paper presents the design and analysis of a parallel connected inverter configuration of. The configuration consists of parallel connected three-phase dc/ac inverter. Series resistors added to the inverter output to maintain same current in each inverter of the two parallel inverters, and to reduce the circulating current in the parallel inverters to the minimum. High frequency third harmonic injection PWM (THIPWM) employed to reduce the total harmonic distortion and to make maximum use of the voltage source. DSP was used to generate the THIPWM and the control algorithm for the converter. Selected experimental results have been shown to validate the proposed system.Keywords: Three-phase inverter, Third harmonic injection PWM, inverters parallel connection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3775598 Network Based High Performance Computing
Authors: Karanjeet Singh Kahlon, Gurvinder Singh, Arjan Singh
Abstract:
In the past few years there is a change in the view of high performance applications and parallel computing. Initially such applications were targeted towards dedicated parallel machines. Recently trend is changing towards building meta-applications composed of several modules that exploit heterogeneous platforms and employ hybrid forms of parallelism. The aim of this paper is to propose a model of virtual parallel computing. Virtual parallel computing system provides a flexible object oriented software framework that makes it easy for programmers to write various parallel applications.
Keywords: Applet, Efficiency, Java, LAN
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1905597 Some Results on Parallel Alternating Methods
Authors: Guangbin Wang, Fuping Tan
Abstract:
In this paper, we investigate two parallel alternating methods for solving the system of linear equations Ax = b and give convergence theorems for the parallel alternating methods when the coefficient matrix is a nonsingular H-matrix. Furthermore, we give one example to show our results.
Keywords: Nonsingular H-matrix, parallel alternating method, convergence.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1103