Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 6738

Search results for: parallel solution

6708 A Parallel Algorithm for Solving the PFSP on the Grid

Abstract:

Solving NP-hard combinatorial optimization problems by exact search methods, such as Branch-and-Bound, may degenerate to complete enumeration. For that reason, exact approaches limit us to solve only small or moderate size problem instances, due to the exponential increase in CPU time when problem size increases. One of the most promising ways to reduce significantly the computational burden of sequential versions of Branch-and-Bound is to design parallel versions of these algorithms which employ several processors. This paper describes a parallel Branch-and-Bound algorithm called GALB for solving the classical permutation flowshop scheduling problem as well as its implementation on a Grid computing infrastructure. The experimental study of our distributed parallel algorithm gives promising results and shows clearly the benefit of the parallel paradigm to solve large-scale instances in moderate CPU time.

Keywords: grid computing, permutation flow shop problem, branch and bound, load balancing

Procedia PDF Downloads 281

6707 The Parallelization of Algorithm Based on Partition Principle for Association Rules Discovery

Authors: Khadidja Belbachir, Hafida Belbachir

Abstract:

subsequently the expansion of the physical supports storage and the needs ceaseless to accumulate several data, the sequential algorithms of associations’ rules research proved to be ineffective. Thus the introduction of the new parallel versions is imperative. We propose in this paper, a parallel version of a sequential algorithm “Partition”. This last is fundamentally different from the other sequential algorithms, because it scans the data base only twice to generate the significant association rules. By consequence, the parallel approach does not require much communication between the sites. The proposed approach was implemented for an experimental study. The obtained results, shows a great reduction in execution time compared to the sequential version and Count Distributed algorithm.

Keywords: association rules, distributed data mining, partition, parallel algorithms

Procedia PDF Downloads 414

6706 Numerical Solution of Transient Natural Convection in Vertical Heated Rectangular Channel between Two Vertical Parallel MTR-Type Fuel Plates

Authors: Djalal Hamed

Abstract:

The aim of this paper is to perform, by mean of the finite volume method, a numerical solution of the transient natural convection in a narrow rectangular channel between two vertical parallel Material Testing Reactor (MTR)-type fuel plates, imposed under a heat flux with a cosine shape to determine the margin of the nuclear core power at which the natural convection cooling mode can ensure a safe core cooling, where the cladding temperature should not reach a specific safety limits (90 °C). For this purpose, a computer program is developed to determine the principal parameters related to the nuclear core safety, such as the temperature distribution in the fuel plate and in the coolant (light water) as a function of the reactor core power. Throughout the obtained results, we noticed that the core power should not reach 400 kW, to ensure a safe passive residual heat removing from the nuclear core by the upward natural convection cooling mode.

Keywords: buoyancy force, friction force, finite volume method, transient natural convection

Procedia PDF Downloads 194

6705 Study of Temperature Difference and Current Distribution in Parallel-Connected Cells at Low Temperature

Authors: Sara Kamalisiahroudi, Jun Huang, Zhe Li, Jianbo Zhang

Abstract:

Two types of commercial cylindrical lithium ion batteries (Panasonic 3.4 Ah NCR-18650B and Samsung 2.9 Ah INR-18650), were investigated experimentally. The capacities of these samples were individually measured using constant current-constant voltage (CC-CV) method at different ambient temperatures (-10 ℃, 0 ℃, 25 ℃). Their internal resistance was determined by electrochemical impedance spectroscopy (EIS) and pulse discharge methods. The cells with different configurations of parallel connection NCR-NCR, INR-INR and NCR-INR were charged/discharged at the aforementioned ambient temperatures. The results showed that the difference of internal resistance between cells much more evident at low temperatures. Furthermore, the parallel connection of NCR-NCR exhibits the most uniform temperature distribution in cells at -10 ℃, this feature is quite favorable for the safety of the battery pack.

Keywords: batteries in parallel connection, internal resistance, low temperature, temperature difference, current distribution

Procedia PDF Downloads 477

6704 On a Continuous Formulation of Block Method for Solving First Order Ordinary Differential Equations (ODEs)

Authors: A. M. Sagir

Abstract:

The aim of this paper is to investigate the performance of the developed linear multistep block method for solving first order initial value problem of Ordinary Differential Equations (ODEs). The method calculates the numerical solution at three points simultaneously and produces three new equally spaced solution values within a block. The continuous formulations enable us to differentiate and evaluate at some selected points to obtain three discrete schemes, which were used in block form for parallel or sequential solutions of the problems. A stability analysis and efficiency of the block method are tested on ordinary differential equations involving practical applications, and the results obtained compared favorably with the exact solution. Furthermore, comparison of error analysis has been developed with the help of computer software.

Keywords: block method, first order ordinary differential equations, linear multistep, self-starting

Procedia PDF Downloads 305

6703 4-DOFs Parallel Mechanism for Minimally Invasive Robotic Surgery

Authors: Khalil Ibrahim, Ahmed Ramadan, Mohamed Fanni, Yo Kobayashi, Ahmed Abo-Ismail, Masakatus G. Fujie

Abstract:

This paper deals with the design process and the dynamic control simulation of a new type of 4-DOFs parallel mechanism that can be used as an endoscopic surgical manipulator. The proposed mechanism, 2-PUU_2-PUS, is designed based on the screw theory and the parallel virtual chain type synthesis method. Based on the structure analysis of the 4-DOF parallel mechanism, the inverse position equation is studied using the inverse analysis theory of kinematics. The design and the stress analysis of the mechanism are investigated using SolidWorks software. The virtual prototype of the parallel mechanism is constructed, and the dynamic simulation is performed using ADAMS TM software. The system model utilizing PID and PI controllers has been built using MATLAB software. A more realistic simulation in accordance with a given bending angle and point to point control is implemented by the use of both ADAMS/MATLAB software. The simulation results showed that this control method has solved the coordinate control for the 4-DOF parallel manipulator so that each output is feedback to the four driving rods. From the results, the tracking performance is achieved. Other control techniques, such as intelligent ones, are recommended to improve the tracking performance and reduce the numerical truncation error.

Keywords: parallel mechanisms, medical robotics, tracjectory control, virtual chain type synthesis method

Procedia PDF Downloads 467

6702 Exploring MPI-Based Parallel Computing in Analyzing Very Large Sequences

Authors: Bilal Wajid, Erchin Serpedin

Abstract:

The health industry is aiming towards personalized medicine. If the patient’s genome needs to be sequenced it is important that the entire analysis be completed quickly. This paper explores use of parallel computing to analyze very large sequences. Two cases have been considered. In the first case, the sequence is kept constant and the effect of increasing the number of MPI-based processes is evaluated in terms of execution time, speed and efficiency. In the second case the number of MPI-based processes have been kept constant whereas, the length of the sequence was increased.

Keywords: parallel computing, alignment, genome assembly, alignment

Procedia PDF Downloads 271

6701 Security Over OFDM Fading Channels with Friendly Jammer

Authors: Munnujahan Ara

Abstract:

In this paper, we investigate the effect of friendly jamming power allocation strategies on the achievable average secrecy rate over a bank of parallel fading wiretap channels. We investigate the achievable average secrecy rate in parallel fading wiretap channels subject to Rayleigh and Rician fading. The achievable average secrecy rate, due to the presence of a line-of-sight component in the jammer channel is also evaluated. Moreover, we study the detrimental effect of correlation across the parallel sub-channels, and evaluate the corresponding decrease in the achievable average secrecy rate for the various fading configurations. We also investigate the tradeoff between the transmission power and the jamming power for a fixed total power budget. Our results, which are applicable to current orthogonal frequency division multiplexing (OFDM) communications systems, shed further light on the achievable average secrecy rates over a bank of parallel fading channels in the presence of friendly jammers.

Keywords: fading parallel channels, wire-tap channel, OFDM, secrecy capacity, power allocation

Procedia PDF Downloads 501

6700 Numerical Studies for Standard Bi-Conjugate Gradient Stabilized Method and the Parallel Variants for Solving Linear Equations

Authors: Kuniyoshi Abe

Abstract:

Bi-conjugate gradient (Bi-CG) is a well-known method for solving linear equations Ax = b, for x, where A is a given n-by-n matrix, and b is a given n-vector. Typically, the dimension of the linear equation is high and the matrix is sparse. A number of hybrid Bi-CG methods such as conjugate gradient squared (CGS), Bi-CG stabilized (Bi-CGSTAB), BiCGStab2, and BiCGstab(l) have been developed to improve the convergence of Bi-CG. Bi-CGSTAB has been most often used for efficiently solving the linear equation, but we have seen the convergence behavior with a long stagnation phase. In such cases, it is important to have Bi-CG coefficients that are as accurate as possible, and the stabilization strategy, which stabilizes the computation of the Bi-CG coefficients, has been proposed. It may avoid stagnation and lead to faster computation. Motivated by a large number of processors in present petascale high-performance computing hardware, the scalability of Krylov subspace methods on parallel computers has recently become increasingly prominent. The main bottleneck for efficient parallelization is the inner products which require a global reduction. The resulting global synchronization phases cause communication overhead on parallel computers. The parallel variants of Krylov subspace methods reducing the number of global communication phases and hiding the communication latency have been proposed. However, the numerical stability, specifically, the convergence speed of the parallel variants of Bi-CGSTAB may become worse than that of the standard Bi-CGSTAB. In this paper, therefore, we compare the convergence speed between the standard Bi-CGSTAB and the parallel variants by numerical experiments and show that the convergence speed of the standard Bi-CGSTAB is faster than the parallel variants. Moreover, we propose the stabilization strategy for the parallel variants.

Keywords: bi-conjugate gradient stabilized method, convergence speed, Krylov subspace methods, linear equations, parallel variant

Procedia PDF Downloads 160

6699 Constructing the Density of States from the Parallel Wang Landau Algorithm Overlapping Data

Authors: Arman S. Kussainov, Altynbek K. Beisekov

Abstract:

This work focuses on building an efficient universal procedure to construct a single density of states from the multiple pieces of data provided by the parallel implementation of the Wang Landau Monte Carlo based algorithm. The Ising and Pott models were used as the examples of the two-dimensional spin lattices to construct their densities of states. Sampled energy space was distributed between the individual walkers with certain overlaps. This was made to include the latest development of the algorithm as the density of states replica exchange technique. Several factors of immediate importance for the seamless stitching process have being considered. These include but not limited to the speed and universality of the initial parallel algorithm implementation as well as the data post-processing to produce the expected smooth density of states.

Keywords: density of states, Monte Carlo, parallel algorithm, Wang Landau algorithm

Procedia PDF Downloads 410

6698 GPU-Accelerated Triangle Mesh Simplification Using Parallel Vertex Removal

Authors: Thomas Odaker, Dieter Kranzlmueller, Jens Volkert

Abstract:

We present an approach to triangle mesh simplification designed to be executed on the GPU. We use a quadric error metric to calculate an error value for each vertex of the mesh and order all vertices based on this value. This step is followed by the parallel removal of a number of vertices with the lowest calculated error values. To allow for the parallel removal of multiple vertices we use a set of per-vertex boundaries that prevent mesh foldovers even when simplification operations are performed on neighbouring vertices. We execute multiple iterations of the calculation of the vertex errors, ordering of the error values and removal of vertices until either a desired number of vertices remains in the mesh or a minimum error value is reached. This parallel approach is used to speed up the simplification process while maintaining mesh topology and avoiding foldovers at every step of the simplification.

Keywords: computer graphics, half edge collapse, mesh simplification, precomputed simplification, topology preserving

Procedia PDF Downloads 365

6697 Parallel Asynchronous Multi-Splitting Methods for Differential Algebraic Systems

Authors: Malika Elkyal

Abstract:

We consider an iterative parallel multi-splitting method for differential algebraic equations. The main feature of the proposed idea is to use the asynchronous form. We prove that the multi-splitting technique can effectively accelerate the convergent performance of the iterative process. The main characteristic of an asynchronous mode is that the local algorithm does not have to wait at predetermined messages to become available. We allow some processors to communicate more frequently than others, and we allow the communication delays to be substantial and unpredictable. Accordingly, we note that synchronous algorithms in the computer science sense are particular cases of our formulation of asynchronous one.

Keywords: parallel methods, asynchronous mode, multisplitting, differential algebraic equations

Procedia PDF Downloads 556

6696 Natural Convection between Two Parallel Wavy Plates

Authors: Si Abdallah Mayouf

Abstract:

In this work, the effects of the wavy surface on free convection heat transfer boundary layer flow between two parallel wavy plates have been studied numerically. The two plates are considered at a constant temperature. The equations and the boundary conditions are discretized by the finite difference scheme and solved numerically using the Gauss-Seidel algorithm. The important parameters in this problem are the amplitude of the wavy surfaces and the distance between the two wavy plates. Results are presented as velocity profiles, temperature profiles and local Nusselt number according to the important parameters.

Keywords: free convection, wavy surface, parallel plates, fluid dynamics

Procedia PDF Downloads 306

6695 Parallelizing the Hybrid Pseudo-Spectral Time Domain/Finite Difference Time Domain Algorithms for the Large-Scale Electromagnetic Simulations Using Massage Passing Interface Library

Authors: Donggun Lee, Q-Han Park

Abstract:

Due to its coarse grid, the Pseudo-Spectral Time Domain (PSTD) method has advantages against the Finite Difference Time Domain (FDTD) method in terms of memory requirement and operation time. However, since the efficiency of parallelization is much lower than that of FDTD, PSTD is not a useful method for a large-scale electromagnetic simulation in a parallel platform. In this paper, we propose the parallelization technique of the hybrid PSTD-FDTD (HPF) method which simultaneously possesses the efficient parallelizability of FDTD and the quick speed and low memory requirement of PSTD. Parallelization cost of the HPF method is exactly the same as the parallel FDTD, but still, it occupies much less memory space and has faster operation speed than the parallel FDTD. Experiments in distributed memory systems have shown that the parallel HPF method saves up to 96% of the operation time and reduces 84% of the memory requirement. Also, by combining the OpenMP library to the MPI library, we further reduced the operation time of the parallel HPF method by 50%.

Keywords: FDTD, hybrid, MPI, OpenMP, PSTD, parallelization

Procedia PDF Downloads 146

6694 Detecting the Edge of Multiple Images in Parallel

Authors: Prakash K. Aithal, U. Dinesh Acharya, Rajesh Gopakumar

Abstract:

Edge is variation of brightness in an image. Edge detection is useful in many application areas such as finding forests, rivers from a satellite image, detecting broken bone in a medical image etc. The paper discusses about finding edge of multiple aerial images in parallel .The proposed work tested on 38 images 37 colored and one monochrome image. The time taken to process N images in parallel is equivalent to time taken to process 1 image in sequential. The proposed method achieves pixel level parallelism as well as image level parallelism.

Keywords: edge detection, multicore, gpu, opencl, mpi

Procedia PDF Downloads 476

6693 A Practical Protection Method for Parallel Transmission-Lines Based on the Fault Travelling-Waves

Authors: Mohammad Reza Ebrahimi

Abstract:

In new restructured power systems, swift fault detection is very important. The parallel transmission-lines are vastly used in this kind of power systems because of high amount of energy transferring. In this paper, a method based on the comparison of two schemes, i.e., i) maximum magnitude of travelling-wave (TW) energy ii) the instants of maximum energy occurrence at the circuits of parallel transmission-line is proposed. Using the travelling-wave of fault in order to faulted line identification this method has noticeable operation time. Moreover, the algorithm can cover for identification of faults as external or internal faults. For an internal fault, the exact location of the fault can be estimated confidently. A lot of simulations have been done with PSCAD/EMTDC to verify the performance of the proposed algorithm.

Keywords: travelling-wave, maximum energy, parallel transmission-line, fault location

Procedia PDF Downloads 184

6692 Optimizing Parallel Computing Systems: A Java-Based Approach to Modeling and Performance Analysis

Authors: Maher Ali Rusho, Sudipta Halder

Abstract:

The purpose of the study is to develop optimal solutions for models of parallel computing systems using the Java language. During the study, programmes were written for the examined models of parallel computing systems. The result of the parallel sorting code is the output of a sorted array of random numbers. When processing data in parallel, the time spent on processing and the first elements of the list of squared numbers are displayed. When processing requests asynchronously, processing completion messages are displayed for each task with a slight delay. The main results include the development of optimisation methods for algorithms and processes, such as the division of tasks into subtasks, the use of non-blocking algorithms, effective memory management, and load balancing, as well as the construction of diagrams and comparison of these methods by characteristics, including descriptions, implementation examples, and advantages. In addition, various specialised libraries were analysed to improve the performance and scalability of the models. The results of the work performed showed a substantial improvement in response time, bandwidth, and resource efficiency in parallel computing systems. Scalability and load analysis assessments were conducted, demonstrating how the system responds to an increase in data volume or the number of threads. Profiling tools were used to analyse performance in detail and identify bottlenecks in models, which improved the architecture and implementation of parallel computing systems. The obtained results emphasise the importance of choosing the right methods and tools for optimising parallel computing systems, which can substantially improve their performance and efficiency.

Keywords: algorithm optimisation, memory management, load balancing, performance profiling, asynchronous programming.

Procedia PDF Downloads 10

6691 An Improved Parallel Algorithm of Decision Tree

Authors: Jiameng Wang, Yunfei Yin, Xiyu Deng

Abstract:

Parallel optimization is one of the important research topics of data mining at this stage. Taking Classification and Regression Tree (CART) parallelization as an example, this paper proposes a parallel data mining algorithm based on SSP-OGini-PCCP. Aiming at the problem of choosing the best CART segmentation point, this paper designs an S-SP model without data association; and in order to calculate the Gini index efficiently, a parallel OGini calculation method is designed. In addition, in order to improve the efficiency of the pruning algorithm, a synchronous PCCP pruning strategy is proposed in this paper. In this paper, the optimal segmentation calculation, Gini index calculation, and pruning algorithm are studied in depth. These are important components of parallel data mining. By constructing a distributed cluster simulation system based on SPARK, data mining methods based on SSP-OGini-PCCP are tested. Experimental results show that this method can increase the search efficiency of the best segmentation point by an average of 89%, increase the search efficiency of the Gini segmentation index by 3853%, and increase the pruning efficiency by 146% on average; and as the size of the data set increases, the performance of the algorithm remains stable, which meets the requirements of contemporary massive data processing.

Keywords: classification, Gini index, parallel data mining, pruning ahead

Procedia PDF Downloads 121

6690 Identification of Vehicle Dynamic Parameters by Using Optimized Exciting Trajectory on 3- DOF Parallel Manipulator

Authors: Di Yao, Gunther Prokop, Kay Buttner

Abstract:

Dynamic parameters, including the center of gravity, mass and inertia moments of vehicle, play an essential role in vehicle simulation, collision test and real-time control of vehicle active systems. To identify the important vehicle dynamic parameters, a systematic parameter identification procedure is studied in this work. In the first step of the procedure, a conceptual parallel manipulator (virtual test rig), which possesses three rotational degrees-of-freedom, is firstly proposed. To realize kinematic characteristics of the conceptual parallel manipulator, the kinematic analysis consists of inverse kinematic and singularity architecture is carried out. Based on the Euler's rotation equations for rigid body dynamics, the dynamic model of parallel manipulator and derivation of measurement matrix for parameter identification are presented subsequently. In order to reduce the sensitivity of parameter identification to measurement noise and other unexpected disturbances, a parameter optimization process of searching for optimal exciting trajectory of parallel manipulator is conducted in the following section. For this purpose, the 321-Euler-angles defined by parameterized finite-Fourier-series are primarily used to describe the general exciting trajectory of parallel manipulator. To minimize the condition number of measurement matrix for achieving better parameter identification accuracy, the unknown coefficients of parameterized finite-Fourier-series are estimated by employing an iterative algorithm based on MATLAB®. Meanwhile, the iterative algorithm will ensure the parallel manipulator still keeps in an achievable working status during the execution of optimal exciting trajectory. It is showed that the proposed procedure and methods in this work can effectively identify the vehicle dynamic parameters and could be an important application of parallel manipulator in the fields of parameter identification and test rig development.

Keywords: parameter identification, parallel manipulator, singularity architecture, dynamic modelling, exciting trajectory

Procedia PDF Downloads 264

6689 Analyzing the Factors that Cause Parallel Performance Degradation in Parallel Graph-Based Computations Using Graph500

Authors: Mustafa Elfituri, Jonathan Cook

Abstract:

Recently, graph-based computations have become more important in large-scale scientific computing as they can provide a methodology to model many types of relations between independent objects. They are being actively used in fields as varied as biology, social networks, cybersecurity, and computer networks. At the same time, graph problems have some properties such as irregularity and poor locality that make their performance different than regular applications performance. Therefore, parallelizing graph algorithms is a hard and challenging task. Initial evidence is that standard computer architectures do not perform very well on graph algorithms. Little is known exactly what causes this. The Graph500 benchmark is a representative application for parallel graph-based computations, which have highly irregular data access and are driven more by traversing connected data than by computation. In this paper, we present results from analyzing the performance of various example implementations of Graph500, including a shared memory (OpenMP) version, a distributed (MPI) version, and a hybrid version. We measured and analyzed all the factors that affect its performance in order to identify possible changes that would improve its performance. Results are discussed in relation to what factors contribute to performance degradation.

Keywords: graph computation, graph500 benchmark, parallel architectures, parallel programming, workload characterization.

Procedia PDF Downloads 146

6688 Parallel Vector Processing Using Multi Level Orbital DATA

Authors: Nagi Mekhiel

Abstract:

Many applications use vector operations by applying single instruction to multiple data that map to different locations in conventional memory. Transferring data from memory is limited by access latency and bandwidth affecting the performance gain of vector processing. We present a memory system that makes all of its content available to processors in time so that processors need not to access the memory, we force each location to be available to all processors at a specific time. The data move in different orbits to become available to other processors in higher orbits at different time. We use this memory to apply parallel vector operations to data streams at first orbit level. Data processed in the first level move to upper orbit one data element at a time, allowing a processor in that orbit to apply another vector operation to deal with serial code limitations inherited in all parallel applications and interleaved it with lower level vector operations.

Keywords: Memory Organization, Parallel Processors, Serial Code, Vector Processing

Procedia PDF Downloads 268

6687 Development of Transmission and Packaging for Parallel Hybrid Light Commercial Vehicle

Authors: Vivek Thorat, Suhasini Desai

Abstract:

The hybrid electric vehicle is widely accepted as a promising short to mid-term technical solution due to noticeably improved efficiency and low emissions at competitive costs. Retro fitment of hybrid components into a conventional vehicle for achieving better performance is the best solution so far. But retro fitment includes major modifications into a conventional vehicle with a high cost. This paper focuses on the development of a P3x hybrid prototype with rear wheel drive parallel hybrid electric Light Commercial Vehicle (LCV) with minimum and low-cost modifications. This diesel Hybrid LCV is different from another hybrid with regard to the powertrain. The additional powertrain consists of continuous contact helical gear pair followed by chain and sprocket as a coupler for traction motor. Vehicle powertrain which is designed for the intended high-speed application. This work focuses on targeting of design, development, and packaging of this unique parallel diesel-electric vehicle which is based on multimode hybrid advantages. To demonstrate the practical applicability of this transmission with P3x hybrid configuration, one concept prototype vehicle has been build integrating the transmission. The hybrid system makes it easy to retrofit existing vehicle because the changes required into the vehicle chassis are a minimum. The additional system is designed for mainly five modes of operations which are engine only mode, electric-only mode, hybrid power mode, engine charging battery mode and regenerative braking mode. Its driving performance, fuel economy and emissions are measured and results are analyzed over a given drive cycle. Finally, the output results which are achieved by the first vehicle prototype during experimental testing is carried out on a chassis dynamometer using MIDC driving cycle. The results showed that the prototype hybrid vehicle is about 27% faster than the equivalent conventional vehicle. The fuel economy is increased by 20-25% approximately compared to the conventional powertrain.

Keywords: P3x configuration, LCV, hybrid electric vehicle, ROMAX, transmission

Procedia PDF Downloads 253

6686 Large-Scale Simulations of Turbulence Using Discontinuous Spectral Element Method

Authors: A. Peyvan, D. Li, J. Komperda, F. Mashayek

Abstract:

Turbulence can be observed in a variety fluid motions in nature and industrial applications. Recent investment in high-speed aircraft and propulsion systems has revitalized fundamental research on turbulent flows. In these systems, capturing chaotic fluid structures with different length and time scales is accomplished through the Direct Numerical Simulation (DNS) approach since it accurately simulates flows down to smallest dissipative scales, i.e., Kolmogorov’s scales. The discontinuous spectral element method (DSEM) is a high-order technique that uses spectral functions for approximating the solution. The DSEM code has been developed by our research group over the course of more than two decades. Recently, the code has been improved to run large cases in the order of billions of solution points. Running big simulations requires a considerable amount of RAM. Therefore, the DSEM code must be highly parallelized and able to start on multiple computational nodes on an HPC cluster with distributed memory. However, some pre-processing procedures, such as determining global element information, creating a global face list, and assigning global partitioning and element connection information of the domain for communication, must be done sequentially with a single processing core. A separate code has been written to perform the pre-processing procedures on a local machine. It stores the minimum amount of information that is required for the DSEM code to start in parallel, extracted from the mesh file, into text files (pre-files). It packs integer type information with a Stream Binary format in pre-files that are portable between machines. The files are generated to ensure fast read performance on different file-systems, such as Lustre and General Parallel File System (GPFS). A new subroutine has been added to the DSEM code to read the startup files using parallel MPI I/O, for Lustre, in a way that each MPI rank acquires its information from the file in parallel. In case of GPFS, in each computational node, a single MPI rank reads data from the file, which is specifically generated for the computational node, and send them to other ranks on the node using point to point non-blocking MPI communication. This way, communication takes place locally on each node and signals do not cross the switches of the cluster. The read subroutine has been tested on Argonne National Laboratory’s Mira (GPFS), National Center for Supercomputing Application’s Blue Waters (Lustre), San Diego Supercomputer Center’s Comet (Lustre), and UIC’s Extreme (Lustre). The tests showed that one file per node is suited for GPFS and parallel MPI I/O is the best choice for Lustre file system. The DSEM code relies on heavily optimized linear algebra operation such as matrix-matrix and matrix-vector products for calculation of the solution in every time-step. For this, the code can either make use of its matrix math library, BLAS, Intel MKL, or ATLAS. This fact and the discontinuous nature of the method makes the DSEM code run efficiently in parallel. The results of weak scaling tests performed on Blue Waters showed a scalable and efficient performance of the code in parallel computing.

Keywords: computational fluid dynamics, direct numerical simulation, spectral element, turbulent flow

Procedia PDF Downloads 132

6685 A Parallel Computation Based on GPU Programming for a 3D Compressible Fluid Flow Simulation

Authors: Sugeng Rianto, P.W. Arinto Yudi, Soemarno Muhammad Nurhuda

Abstract:

A computation of a 3D compressible fluid flow for virtual environment with haptic interaction can be a non-trivial issue. This is especially how to reach good performances and balancing between visualization, tactile feedback interaction, and computations. In this paper, we describe our approach of computation methods based on parallel programming on a GPU. The 3D fluid flow solvers have been developed for smoke dispersion simulation by using combinations of the cubic interpolated propagation (CIP) based fluid flow solvers and the advantages of the parallelism and programmability of the GPU. The fluid flow solver is generated in the GPU-CPU message passing scheme to get rapid development of haptic feedback modes for fluid dynamic data. A rapid solution in fluid flow solvers is developed by applying cubic interpolated propagation (CIP) fluid flow solvers. From this scheme, multiphase fluid flow equations can be solved simultaneously. To get more acceleration in the computation, the Navier-Stoke Equations (NSEs) is packed into channels of texel, where computation models are performed on pixels that can be considered to be a grid of cells. Therefore, despite of the complexity of the obstacle geometry, processing on multiple vertices and pixels can be done simultaneously in parallel. The data are also shared in global memory for CPU to control the haptic in providing kinaesthetic interaction and felling. The results show that GPU based parallel computation approaches provide effective simulation of compressible fluid flow model for real-time interaction in 3D computer graphic for PC platform. This report has shown the feasibility of a new approach of solving the compressible fluid flow equations on the GPU. The experimental tests proved that the compressible fluid flowing on various obstacles with haptic interactions on the few model obstacles can be effectively and efficiently simulated on the reasonable frame rate with a realistic visualization. These results confirm that good performances and balancing between visualization, tactile feedback interaction, and computations can be applied successfully.

Keywords: CIP, compressible fluid, GPU programming, parallel computation, real-time visualisation

Procedia PDF Downloads 431

6684 Design and Implementation of Security Middleware for Data Warehouse Signature, Framework

Authors: Mayada Al Meghari

Abstract:

Recently, grid middlewares have provided large integrated use of network resources as the shared data and the CPU to become a virtual supercomputer. In this work, we present the design and implementation of the middleware for Data Warehouse Signature, DWS Framework. The aim of using the middleware in our DWS framework is to achieve the high performance by the parallel computing. This middleware is developed on Alchemi.Net framework to increase the security among the network nodes through the authentication and group-key distribution model. This model achieves the key security and prevents any intermediate attacks in the middleware. This paper presents the flow process structures of the middleware design. In addition, the paper ensures the implementation of security for DWS middleware enhancement with the authentication and group-key distribution model. Finally, from the analysis of other middleware approaches, the developed middleware of DWS framework is the optimal solution of a complete covering of security issues.

Keywords: middleware, parallel computing, data warehouse, security, group-key, high performance

Procedia PDF Downloads 118

6683 Conditions for Fault Recovery of Interconnected Asynchronous Sequential Machines with State Feedback

Authors: Jung–Min Yang

Abstract:

In this paper, fault recovery for parallel interconnected asynchronous sequential machines is studied. An adversarial input can infiltrate into one of two submachines comprising parallel composition of the considered asynchronous sequential machine, causing an unauthorized state transition. The control objective is to elucidate the condition for the existence of a corrective controller that makes the closed-loop system immune against any occurrence of adversarial inputs. In particular, an efficient existence condition is presented that does not need the complete modeling of the interconnected asynchronous sequential machine.

Keywords: asynchronous sequential machines, parallel composi-tion, corrective control, fault tolerance

Procedia PDF Downloads 229

6682 Islamic Financial Instrument, Standard Parallel Salam as an Alternative to Conventional Derivatives

Authors: Alireza Naserpoor

Abstract:

Derivatives are the most important innovation which has happened in the past decades. When it comes to financial markets, it has changed the whole way of operations of stock, commodities and currency market. Beside a lot of advantages, Conventional derivatives contracts have some disadvantages too. Some problems have been caused by derivatives contain raising Volatility, increasing Bankruptcies and causing financial crises. Standard Parallel Salam contract as an Islamic financial product meanwhile is a financing instrument can be used for risk management by investors. Standard Parallel Salam is a Shari’ah-Compliant contract. Furthermore, it is an alternative to conventional derivatives. Despite the fact that the unstructured types of that, has been used in several Islamic countries, This contract as a structured and standard financial instrument introduced in Iran Mercantile Exchange in 2014. In this paper after introducing parallel Salam, we intend to examine a collection of international experience and local measure regarding launching standard parallel Salam contract and proceed to describe standard scenarios for trading this instrument and practical experience in Iran Mercantile Exchange about this instrument. Afterwards, we make a comparison between SPS and Futures contracts as a conventional derivative. Standard parallel salam contract as an Islamic financial product, can be used for risk management by investors. SPS is a Shariah-Compliant contract. Furthermore it is an alternative to conventional derivatives. This contract as a structured and standard financial instrument introduced in Iran Mercantile Exchange in 2014. despite the fact that the unstructured types of that, has been used in several Islamic countries. In this article after introducing parallel salam, we intend to examine a collection of international experience and local measure regarding launching standard parallel salam contract and proceed to describe standard scenarios for trading this instrument containing two main approaches in SPS using, And practical experience in IME about this instrument Afterwards, a comparison between SPS and Futures contracts as a conventional derivatives.

Keywords: futures contracts, hedging, shari’ah compliant instruments, standard parallel salam

Procedia PDF Downloads 389

6681 Parallel Magnetic Field Effect on Copper Cementation onto Rotating Iron Rod

Authors: Hamouda M. Mousa, M. Obaid, Chan Hee Park, Cheol Sang Kim

Abstract:

The rate of copper cementation on iron rod was investigated. The study was mainly dedicated to illustrate the effect of application of electromagnetic field (EMF) on the rate of cementation. The magnetic flux was placed parallel to the iron rod and different magnetic field strength was studied. The results showed that without EMF, the rate of mass transfer was correlated by the equation: Sh= 1.36 Re0. 098 Sc0.33. The application of EMF enhanced the time required to reach high percentage copper cementation by 50%. The rate of mass transfer was correlated by the equation: Sh= 2.29 Re0. 95 Sc0.33, with applying EMF. This work illustrates that the enhancement of copper recovery in presence of EMF is due to the induced motion of Fe+n in the solution which is limited in the range of rod rotation speed of 300~900 rpm. The calculation of power consumption of EMF showed that although the application of EMF partially reduced the cementation time, the reduction of power consumption due to utilization of magnetic field is comparable to the increase in power consumed by introducing magnetic field of 2462 A T/m.

Keywords: copper cementation, electromagnetic field, copper ions, iron cylinder

Procedia PDF Downloads 489

6680 Software Transactional Memory in a Dynamic Programming Language at Virtual Machine Level

Authors: Szu-Kai Hsu, Po-Ching Lin

Abstract:

As more and more multi-core processors emerge, traditional sequential programming paradigm no longer suffice. Yet only few modern dynamic programming languages can leverage such advantage. Ruby, for example, despite its wide adoption, only includes threads as a simple parallel primitive. The global virtual machine lock of official Ruby runtime makes it impossible to exploit full parallelism. Though various alternative Ruby implementations do eliminate the global virtual machine lock, they only provide developers dated locking mechanism for data synchronization. However, traditional locking mechanism error-prone by nature. Software Transactional Memory is one of the promising alternatives among others. This paper introduces a new virtual machine: GobiesVM to provide a native software transactional memory based solution for dynamic programming languages to exploit parallelism. We also proposed a simplified variation of Transactional Locking II algorithm. The empirical results of our experiments show that support of STM at virtual machine level enables developers to write straightforward code without compromising parallelism or sacrificing thread safety. Existing source code only requires minimal or even none modi cation, which allows developers to easily switch their legacy codebase to a parallel environment. The performance evaluations of GobiesVM also indicate the difference between sequential and parallel execution is significant.

Keywords: global interpreter lock, ruby, software transactional memory, virtual machine

Procedia PDF Downloads 283

6679 Performance Evaluation of Task Scheduling Algorithm on LCQ Network

Authors: Zaki Ahmad Khan, Jamshed Siddiqui, Abdus Samad

Abstract:

The Scheduling and mapping of tasks on a set of processors is considered as a critical problem in parallel and distributed computing system. This paper deals with the problem of dynamic scheduling on a special type of multiprocessor architecture known as Linear Crossed Cube (LCQ) network. This proposed multiprocessor is a hybrid network which combines the features of both linear type of architectures as well as cube based architectures. Two standard dynamic scheduling schemes namely Minimum Distance Scheduling (MDS) and Two Round Scheduling (TRS) schemes are implemented on the LCQ network. Parallel tasks are mapped and the imbalance of load is evaluated on different set of processors in LCQ network. The simulations results are evaluated and effort is made by means of through analysis of the results to obtain the best solution for the given network in term of load imbalance left and execution time. The other performance matrices like speedup and efficiency are also evaluated with the given dynamic algorithms.

Keywords: dynamic algorithm, load imbalance, mapping, task scheduling

Procedia PDF Downloads 447