Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 32

Search results for: parallelization

32 OpenMP Parallelization of Three-Dimensional Magnetohydrodynamic Code FOI-PERFECT

Authors: Jiao F. Huang, Shi Chen, Shu C. Duan, Gang H. Wang

Abstract:

Due to its complex spatial structure as well as dynamic temporal evolution, an analytic solution of an X-pinch process is out of question, and numerical simulation becomes an important tool in X-pinch studies. Intrinsically, simulations of X-pinch are three-dimensional (3D) because of the specific structure of its load. Furthermore, in order to resolve both its μm-scales and ns-durations, fine spatial mesh grid and short time steps are usually adopted. The resulting large computational scales make the parallelization of codes a vital problem to be solved if any practical simulations are to be carried out. In this work, we report OpenMP parallelization of our 3D magnetohydrodynamic (MHD) code FOI-PERFECT. Results of test runs confirm that computational efficiency has been improved after parallelization, and both the sequential and parallel versions give the same physical results under the same initial conditions.

Keywords: MHD simulation, OpenMP, parallelization, X-pinch

Procedia PDF Downloads 311

31 Problems of Boolean Reasoning Based Biclustering Parallelization

Authors: Marcin Michalak

Abstract:

Biclustering is the way of two-dimensional data analysis. For several years it became possible to express such issue in terms of Boolean reasoning, for processing continuous, discrete and binary data. The mathematical backgrounds of such approach — proved ability of induction of exact and inclusion–maximal biclusters fulfilling assumed criteria — are strong advantages of the method. Unfortunately, the core of the method has quite high computational complexity. In the paper the basics of Boolean reasoning approach for biclustering are presented. In such context the problems of computation parallelization are risen.

Keywords: Boolean reasoning, biclustering, parallelization, prime implicant

Procedia PDF Downloads 89

30 Kernel Parallelization Equation for Identifying Structures under Unknown and Periodic Loads

Authors: Seyed Sadegh Naseralavi

Abstract:

This paper presents a Kernel parallelization equation for damage identification in structures under unknown periodic excitations. Herein, the dynamic differential equation of the motion of structure is viewed as a mapping from displacements to external forces. Utilizing this viewpoint, a new method for damage detection in structures under periodic loads is presented. The developed method requires only two periods of load. The method detects the damages without finding the input loads. The method is based on the fact that structural displacements under free and forced vibrations are associated with two parallel subspaces in the displacement space. Considering the concept, kernel parallelization equation (KPE) is derived for damage detection under unknown periodic loads. The method is verified for a case study under periodic loads.

Keywords: Kernel, unknown periodic load, damage detection, Kernel parallelization equation

Procedia PDF Downloads 251

29 Parallelizing the Hybrid Pseudo-Spectral Time Domain/Finite Difference Time Domain Algorithms for the Large-Scale Electromagnetic Simulations Using Massage Passing Interface Library

Authors: Donggun Lee, Q-Han Park

Abstract:

Due to its coarse grid, the Pseudo-Spectral Time Domain (PSTD) method has advantages against the Finite Difference Time Domain (FDTD) method in terms of memory requirement and operation time. However, since the efficiency of parallelization is much lower than that of FDTD, PSTD is not a useful method for a large-scale electromagnetic simulation in a parallel platform. In this paper, we propose the parallelization technique of the hybrid PSTD-FDTD (HPF) method which simultaneously possesses the efficient parallelizability of FDTD and the quick speed and low memory requirement of PSTD. Parallelization cost of the HPF method is exactly the same as the parallel FDTD, but still, it occupies much less memory space and has faster operation speed than the parallel FDTD. Experiments in distributed memory systems have shown that the parallel HPF method saves up to 96% of the operation time and reduces 84% of the memory requirement. Also, by combining the OpenMP library to the MPI library, we further reduced the operation time of the parallel HPF method by 50%.

Keywords: FDTD, hybrid, MPI, OpenMP, PSTD, parallelization

Procedia PDF Downloads 109

28 On Block Vandermonde Matrix Constructed from Matrix Polynomial Solvents

Authors: Malika Yaici, Kamel Hariche

Abstract:

In control engineering, systems described by matrix fractions are studied through properties of block roots, also called solvents. These solvents are usually dealt with in a block Vandermonde matrix form. Inverses and determinants of Vandermonde matrices and block Vandermonde matrices are used in solving problems of numerical analysis in many domains but require costly computations. Even though Vandermonde matrices are well known and method to compute inverse and determinants are many and, generally, based on interpolation techniques, methods to compute the inverse and determinant of a block Vandermonde matrix have not been well studied. In this paper, some properties of these matrices and iterative algorithms to compute the determinant and the inverse of a block Vandermonde matrix are given. These methods are deducted from the partitioned matrix inversion and determinant computing methods. Due to their great size, parallelization may be a solution to reduce the computations cost, so a parallelization of these algorithms is proposed and validated by a comparison using algorithmic complexity.

Keywords: block vandermonde matrix, solvents, matrix polynomial, matrix inverse, matrix determinant, parallelization

Procedia PDF Downloads 196

27 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm

Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan

Abstract:

This study presents a new parallel approach clustering of GPS data. Evaluation has been made by comparing execution time of various clustering algorithms on GPS data. This paper aims to propose a parallel based on neighborhood K-means algorithm to make it faster. The proposed parallelization approach assumes that each GPS data represents a vehicle and to communicate between vehicles close to each other after vehicles are clustered. This parallelization approach has been examined on different sized continuously changing GPS data and compared with serial K-means algorithm and other serial clustering algorithms. The results demonstrated that proposed parallel K-means algorithm has been shown to work much faster than other clustering algorithms.

Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data

Procedia PDF Downloads 186

26 Performance Evaluation of Parallel Surface Modeling and Generation on Actual and Virtual Multicore Systems

Authors: Nyeng P. Gyang

Abstract:

Even though past, current and future trends suggest that multicore and cloud computing systems are increasingly prevalent/ubiquitous, this class of parallel systems is nonetheless underutilized, in general, and barely used for research on employing parallel Delaunay triangulation for parallel surface modeling and generation, in particular. The performances, of actual/physical and virtual/cloud multicore systems/machines, at executing various algorithms, which implement various parallelization strategies of the incremental insertion technique of the Delaunay triangulation algorithm, were evaluated. T-tests were run on the data collected, in order to determine whether various performance metrics differences (including execution time, speedup and efficiency) were statistically significant. Results show that the actual machine is approximately twice faster than the virtual machine at executing the same programs for the various parallelization strategies. Results, which furnish the scalability behaviors of the various parallelization strategies, also show that some of the differences between the performances of these systems, during different runs of the algorithms on the systems, were statistically significant. A few pseudo superlinear speedup results, which were computed from the raw data collected, are not true superlinear speedup values. These pseudo superlinear speedup values, which arise as a result of one way of computing speedups, disappear and give way to asymmetric speedups, which are the accurate kind of speedups that occur in the experiments performed.

Keywords: cloud computing systems, multicore systems, parallel Delaunay triangulation, parallel surface modeling and generation

Procedia PDF Downloads 176

25 A Parallel Implementation of Artificial Bee Colony Algorithm within CUDA Architecture

Authors: Selcuk Aslan, Dervis Karaboga, Celal Ozturk

Abstract:

Artificial Bee Colony (ABC) algorithm is one of the most successful swarm intelligence based metaheuristics. It has been applied to a number of constrained or unconstrained numerical and combinatorial optimization problems. In this paper, we presented a parallelized version of ABC algorithm by adapting employed and onlooker bee phases to the Compute Unified Device Architecture (CUDA) platform which is a graphical processing unit (GPU) programming environment by NVIDIA. The execution speed and obtained results of the proposed approach and sequential version of ABC algorithm are compared on functions that are typically used as benchmarks for optimization algorithms. Tests on standard benchmark functions with different colony size and number of parameters showed that proposed parallelization approach for ABC algorithm decreases the execution time consumed by the employed and onlooker bee phases in total and achieved similar or better quality of the results compared to the standard sequential implementation of the ABC algorithm.

Keywords: Artificial Bee Colony algorithm, GPU computing, swarm intelligence, parallelization

Procedia PDF Downloads 339

24 Reed: An Approach Towards Quickly Bootstrapping Multilingual Acoustic Models

Authors: Bipasha Sen, Aditya Agarwal

Abstract:

Multilingual automatic speech recognition (ASR) system is a single entity capable of transcribing multiple languages sharing a common phone space. Performance of such a system is highly dependent on the compatibility of the languages. State of the art speech recognition systems are built using sequential architectures based on recurrent neural networks (RNN) limiting the computational parallelization in training. This poses a significant challenge in terms of time taken to bootstrap and validate the compatibility of multiple languages for building a robust multilingual system. Complex architectural choices based on self-attention networks are made to improve the parallelization thereby reducing the training time. In this work, we propose Reed, a simple system based on 1D convolutions which uses very short context to improve the training time. To improve the performance of our system, we use raw time-domain speech signals directly as input. This enables the convolutional layers to learn feature representations rather than relying on handcrafted features such as MFCC. We report improvement on training and inference times by atleast a factor of 4x and 7.4x respectively with comparable WERs against standard RNN based baseline systems on SpeechOcean's multilingual low resource dataset.

Keywords: convolutional neural networks, language compatibility, low resource languages, multilingual automatic speech recognition

Procedia PDF Downloads 92

23 Improving the Run Times of Existing and Historical Demand Models Using Simple Python Scripting

Authors: Abhijeet Ostawal, Parmjit Lall

Abstract:

The run times for a large strategic model that we were managing had become too long leading to delays in project delivery, increased costs and loss in productivity. Software developers are continuously working towards developing more efficient tools by changing their algorithms and processes. The issue faced by our team was how do you apply the latest technologies on validated existing models which are based on much older versions of software that do not have the latest software capabilities. The multi-model transport model that we had could only be run in sequential assignment order. Recent upgrades to the software now allowed the assignment to be run in parallel, a concept called parallelization. Parallelization is a Python script working only within the latest version of the software. A full model transfer to the latest version was not possible due to time, budget and the potential changes in trip assignment. This article is to show the method to adapt and update the Python script in such a way that it can be used in older software versions by calling the latest version and then recalling the old version for assignment model without affecting the results. Through a process of trial-and-error run time savings of up to 30-40% have been achieved. Assignment results were maintained within the older version and through this learning process we’ve applied this methodology to other even older versions of the software resulting in huge time savings, more productivity and efficiency for both client and consultant.

Keywords: model run time, demand model, parallelisation, python scripting

Procedia PDF Downloads 87

22 Cache Analysis and Software Optimizations for Faster on-Chip Network Simulations

Authors: Khyamling Parane, B. M. Prabhu Prasad, Basavaraj Talawar

Abstract:

Fast simulations are critical in reducing time to market in CMPs and SoCs. Several simulators have been used to evaluate the performance and power consumed by Network-on-Chips. Researchers and designers rely upon these simulators for design space exploration of NoC architectures. Our experiments show that simulating large NoC topologies take hours to several days for completion. To speed up the simulations, it is necessary to investigate and optimize the hotspots in simulator source code. Among several simulators available, we choose Booksim2.0, as it is being extensively used in the NoC community. In this paper, we analyze the cache and memory system behaviour of Booksim2.0 to accurately monitor input dependent performance bottlenecks. Our measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration having least misses has been identified. To further reduce the cache misses, we use software optimization techniques such as removal of unused functions, loop interchanging and replacing post-increment operator with pre-increment operator for non-primitive data types. The cache misses were reduced by 18.52%, 5.34% and 3.91% by employing above technology respectively. We also employ thread parallelization and vectorization to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93x and 3.97x were observed for the Mesh topology with 30 × 30 network size by employing thread parallelization and vectorization respectively.

Keywords: cache behaviour, network-on-chip, performance profiling, vectorization

Procedia PDF Downloads 164

21 Implementation of ADETRAN Language Using Message Passing Interface

Authors: Akiyoshi Wakatani

Abstract:

This paper describes the Message Passing Interface (MPI) implementation of ADETRAN language, and its evaluation on SX-ACE supercomputers. ADETRAN language includes pdo statement that specifies the data distribution and parallel computations and pass statement that specifies the redistribution of arrays. Two methods for implementation of pass statement are discussed and the performance evaluation using Splitting-Up CG method is presented. The effectiveness of the parallelization is evaluated and the advantage of one dimensional distribution is empirically confirmed by using the results of experiments.

Keywords: iterative methods, array redistribution, translator, distributed memory

Procedia PDF Downloads 239

20 Building a Scalable Telemetry Based Multiclass Predictive Maintenance Model in R

Authors: Jaya Mathew

Abstract:

Many organizations are faced with the challenge of how to analyze and build Machine Learning models using their sensitive telemetry data. In this paper, we discuss how users can leverage the power of R without having to move their big data around as well as a cloud based solution for organizations willing to host their data in the cloud. By using ScaleR technology to benefit from parallelization and remote computing or R Services on premise or in the cloud, users can leverage the power of R at scale without having to move their data around.

Keywords: predictive maintenance, machine learning, big data, cloud based, on premise solution, R

Procedia PDF Downloads 347

19 Grid Computing for Multi-Objective Optimization Problems

Authors: Aouaouche Elmaouhab, Hassina Beggar

Abstract:

Solving multi-objective discrete optimization applications has always been limited by the resources of one machine: By computing power or by memory, most often both. To speed up the calculations, the grid computing represents a primary solution for the treatment of these applications through the parallelization of these resolution methods. In this work, we are interested in the study of some methods for solving multiple objective integer linear programming problem based on Branch-and-Bound and the study of grid computing technology. This study allowed us to propose an implementation of the method of Abbas and Al on the grid by reducing the execution time. To enhance our contribution, the main results are presented.

Keywords: multi-objective optimization, integer linear programming, grid computing, parallel computing

Procedia PDF Downloads 451

18 The Parallelization of Algorithm Based on Partition Principle for Association Rules Discovery

Authors: Khadidja Belbachir, Hafida Belbachir

Abstract:

subsequently the expansion of the physical supports storage and the needs ceaseless to accumulate several data, the sequential algorithms of associations’ rules research proved to be ineffective. Thus the introduction of the new parallel versions is imperative. We propose in this paper, a parallel version of a sequential algorithm “Partition”. This last is fundamentally different from the other sequential algorithms, because it scans the data base only twice to generate the significant association rules. By consequence, the parallel approach does not require much communication between the sites. The proposed approach was implemented for an experimental study. The obtained results, shows a great reduction in execution time compared to the sequential version and Count Distributed algorithm.

Keywords: association rules, distributed data mining, partition, parallel algorithms

Procedia PDF Downloads 367

17 Using High Performance Computing for Online Flood Monitoring and Prediction

Authors: Stepan Kuchar, Martin Golasowski, Radim Vavrik, Michal Podhoranyi, Boris Sir, Jan Martinovic

Abstract:

The main goal of this article is to describe the online flood monitoring and prediction system Floreon+ primarily developed for the Moravian-Silesian region in the Czech Republic and the basic process it uses for running automatic rainfall-runoff and hydrodynamic simulations along with their calibration and uncertainty modeling. It takes a long time to execute such process sequentially, which is not acceptable in the online scenario, so the use of high-performance computing environment is proposed for all parts of the process to shorten their duration. Finally, a case study on the Ostravice river catchment is presented that shows actual durations and their gain from the parallel implementation.

Keywords: flood prediction process, high performance computing, online flood prediction system, parallelization

Procedia PDF Downloads 455

16 Parallelization of Random Accessible Progressive Streaming of Compressed 3D Models over Web

Authors: Aayushi Somani, Siba P. Samal

Abstract:

Three-dimensional (3D) meshes are data structures, which store geometric information of an object or scene, generally in the form of vertices and edges. Current technology in laser scanning and other geometric data acquisition technologies acquire high resolution sampling which leads to high resolution meshes. While high resolution meshes give better quality rendering and hence is used often, the processing, as well as storage of 3D meshes, is currently resource-intensive. At the same time, web applications for data processing have become ubiquitous owing to their accessibility. For 3D meshes, the advancement of 3D web technologies, such as WebGL, WebVR, has enabled high fidelity rendering of huge meshes. However, there exists a gap in ability to stream huge meshes to a native client and browser application due to high network latency. Also, there is an inherent delay of loading WebGL pages due to large and complex models. The focus of our work is to identify the challenges faced when such meshes are streamed into and processed on hand-held devices, owing to its limited resources. One of the solutions that are conventionally used in the graphics community to alleviate resource limitations is mesh compression. Our approach deals with a two-step approach for random accessible progressive compression and its parallel implementation. The first step includes partition of the original mesh to multiple sub-meshes, and then we invoke data parallelism on these sub-meshes for its compression. Subsequent threaded decompression logic is implemented inside the Web Browser Engine with modification of WebGL implementation in Chromium open source engine. This concept can be used to completely revolutionize the way e-commerce and Virtual Reality technology works for consumer electronic devices. These objects can be compressed in the server and can be transmitted over the network. The progressive decompression can be performed on the client device and rendered. Multiple views currently used in e-commerce sites for viewing the same product from different angles can be replaced by a single progressive model for better UX and smoother user experience. Can also be used in WebVR for commonly and most widely used activities like virtual reality shopping, watching movies and playing games. Our experiments and comparison with existing techniques show encouraging results in terms of latency (compressed size is ~10-15% of the original mesh), processing time (20-22% increase over serial implementation) and quality of user experience in web browser.

Keywords: 3D compression, 3D mesh, 3D web, chromium, client-server architecture, e-commerce, level of details, parallelization, progressive compression, WebGL, WebVR

Procedia PDF Downloads 136

15 Fast Short-Term Electrical Load Forecasting under High Meteorological Variability with a Multiple Equation Time Series Approach

Authors: Charline David, Alexandre Blondin Massé, Arnaud Zinflou

Abstract:

In 2016, Clements, Hurn, and Li proposed a multiple equation time series approach for the short-term load forecasting, reporting an average mean absolute percentage error (MAPE) of 1.36% on an 11-years dataset for the Queensland region in Australia. We present an adaptation of their model to the electrical power load consumption for the whole Quebec province in Canada. More precisely, we take into account two additional meteorological variables — cloudiness and wind speed — on top of temperature, as well as the use of multiple meteorological measurements taken at different locations on the territory. We also consider other minor improvements. Our final model shows an average MAPE score of 1:79% over an 8-years dataset.

Keywords: short-term load forecasting, special days, time series, multiple equations, parallelization, clustering

Procedia PDF Downloads 68

14 Exploring SSD Suitable Allocation Schemes Incompliance with Workload Patterns

Authors: Jae Young Park, Hwansu Jung, Jong Tae Kim

Abstract:

Whether the data has been well parallelized is an important factor in the Solid-State-Drive (SSD) performance. SSD parallelization is affected by allocation scheme and it is directly connected to SSD performance. There are dynamic allocation and static allocation in representative allocation schemes. Dynamic allocation is more adaptive in exploiting write operation parallelism, while static allocation is better in read operation parallelism. Therefore, it is hard to select the appropriate allocation scheme when the workload is mixed read and write operations. We simulated conditions on a few mixed data patterns and analyzed the results to help the right choice for better performance. As the results, if data arrival interval is long enough prior operations to be finished and continuous read intensive data environment static allocation is more suitable. Dynamic allocation performs the best on write performance and random data patterns.

Keywords: dynamic allocation, NAND flash based SSD, SSD parallelism, static allocation

Procedia PDF Downloads 301

13 Hardware Implementation for the Contact Force Reconstruction in Tactile Sensor Arrays

Authors: María-Luisa Pinto-Salamanca, Wilson-Javier Pérez-Holguín

Abstract:

Reconstruction of contact forces is a fundamental technique for analyzing the properties of a touched object and is essential for regulating the grip force in slip control loops. This is based on the processing of the distribution, intensity, and direction of the forces during the capture of the sensors. Currently, efficient hardware alternatives have been used more frequently in different fields of application, allowing the implementation of computationally complex algorithms, as is the case with tactile signal processing. The use of hardware for smart tactile sensing systems is a research area that promises to improve the processing time and portability requirements of applications such as artificial skin and robotics, among others. The literature review shows that hardware implementations are present today in almost all stages of smart tactile detection systems except in the force reconstruction process, a stage in which they have been less applied. This work presents a hardware implementation of a model-driven reported in the literature for the contact force reconstruction of flat and rigid tactile sensor arrays from normal stress data. From the analysis of a software implementation of such a model, this implementation proposes the parallelization of tasks that facilitate the execution of matrix operations and a two-dimensional optimization function to obtain a vector force by each taxel in the array. This work seeks to take advantage of the parallel hardware characteristics of Field Programmable Gate Arrays, FPGAs, and the possibility of applying appropriate techniques for algorithms parallelization using as a guide the rules of generalization, efficiency, and scalability in the tactile decoding process and considering the low latency, low power consumption, and real-time execution as the main parameters of design. The results show a maximum estimation error of 32% in the tangential forces and 22% in the normal forces with respect to the simulation by the Finite Element Modeling (FEM) technique of Hertzian and non-Hertzian contact events, over sensor arrays of 10×10 taxels of different sizes. The hardware implementation was carried out on an MPSoC XCZU9EG-2FFVB1156 platform of Xilinx® that allows the reconstruction of force vectors following a scalable approach, from the information captured by means of tactile sensor arrays composed of up to 48 × 48 taxels that use various transduction technologies. The proposed implementation demonstrates a reduction in estimation time of x / 180 compared to software implementations. Despite the relatively high values of the estimation errors, the information provided by this implementation on the tangential and normal tractions and the triaxial reconstruction of forces allows to adequately reconstruct the tactile properties of the touched object, which are similar to those obtained in the software implementation and in the two FEM simulations taken as reference. Although errors could be reduced, the proposed implementation is useful for decoding contact forces for portable tactile sensing systems, thus helping to expand electronic skin applications in robotic and biomedical contexts.

Keywords: contact forces reconstruction, forces estimation, tactile sensor array, hardware implementation

Procedia PDF Downloads 149

12 An Inviscid Compressible Flow Solver Based on Unstructured OpenFOAM Mesh Format

Authors: Utkan Caliskan

Abstract:

Two types of numerical codes based on finite volume method are developed in order to solve compressible Euler equations to simulate the flow through forward facing step channel. Both algorithms have AUSM+- up (Advection Upstream Splitting Method) scheme for flux splitting and two-stage Runge-Kutta scheme for time stepping. In this study, the flux calculations differentiate between the algorithm based on OpenFOAM mesh format which is called 'face-based' algorithm and the basic algorithm which is called 'element-based' algorithm. The face-based algorithm avoids redundant flux computations and also is more flexible with hybrid grids. Moreover, some of OpenFOAM’s preprocessing utilities can be used on the mesh. Parallelization of the face based algorithm for which atomic operations are needed due to the shared memory model, is also presented. For several mesh sizes, 2.13x speed up is obtained with face-based approach over the element-based approach.

Keywords: cell centered finite volume method, compressible Euler equations, OpenFOAM mesh format, OpenMP

Procedia PDF Downloads 285

11 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.

Keywords: predictive analysis, big data, predictive analysis algorithms, CART algorithm

Procedia PDF Downloads 111

10 Massively-Parallel Bit-Serial Neural Networks for Fast Epilepsy Diagnosis: A Feasibility Study

Authors: Si Mon Kueh, Tom J. Kazmierski

Abstract:

There are about 1% of the world population suffering from the hidden disability known as epilepsy and major developing countries are not fully equipped to counter this problem. In order to reduce the inconvenience and danger of epilepsy, different methods have been researched by using a artificial neural network (ANN) classification to distinguish epileptic waveforms from normal brain waveforms. This paper outlines the aim of achieving massive ANN parallelization through a dedicated hardware using bit-serial processing. The design of this bit-serial Neural Processing Element (NPE) is presented which implements the functionality of a complete neuron using variable accuracy. The proposed design has been tested taking into consideration non-idealities of a hardware ANN. The NPE consists of a bit-serial multiplier which uses only 16 logic elements on an Altera Cyclone IV FPGA and a bit-serial ALU as well as a look-up table. Arrays of NPEs can be driven by a single controller which executes the neural processing algorithm. In conclusion, the proposed compact NPE design allows the construction of complex hardware ANNs that can be implemented in a portable equipment that suits the needs of a single epileptic patient in his or her daily activities to predict the occurrences of impending tonic conic seizures.

Keywords: Artificial Neural Networks (ANN), bit-serial neural processor, FPGA, Neural Processing Element (NPE)

Procedia PDF Downloads 286

9 A Parallel Cellular Automaton Model of Tumor Growth for Multicore and GPU Programming

Authors: Manuel I. Capel, Antonio Tomeu, Alberto Salguero

Abstract:

Tumor growth from a transformed cancer-cell up to a clinically apparent mass spans through a range of spatial and temporal magnitudes. Through computer simulations, Cellular Automata (CA) can accurately describe the complexity of the development of tumors. Tumor development prognosis can now be made -without making patients undergo through annoying medical examinations or painful invasive procedures- if we develop appropriate CA-based software tools. In silico testing mainly refers to Computational Biology research studies of application to clinical actions in Medicine. To establish sound computer-based models of cellular behavior, certainly reduces costs and saves precious time with respect to carrying out experiments in vitro at labs or in vivo with living cells and organisms. These aim to produce scientifically relevant results compared to traditional in vitro testing, which is slow, expensive, and does not generally have acceptable reproducibility under the same conditions. For speeding up computer simulations of cellular models, specific literature shows recent proposals based on the CA approach that include advanced techniques, such the clever use of supporting efficient data structures when modeling with deterministic stochastic cellular automata. Multiparadigm and multiscale simulation of tumor dynamics is just beginning to be developed by the concerned research community. The use of stochastic cellular automata (SCA), whose parallel programming implementations are open to yield a high computational performance, are of much interest to be explored up to their computational limits. There have been some approaches based on optimizations to advance in multiparadigm models of tumor growth, which mainly pursuit to improve performance of these models through efficient memory accesses guarantee, or considering the dynamic evolution of the memory space (grids, trees,…) that holds crucial data in simulations. In our opinion, the different optimizations mentioned above are not decisive enough to achieve the high performance computing power that cell-behavior simulation programs actually need. The possibility of using multicore and GPU parallelism as a promising multiplatform and framework to develop new programming techniques to speed-up the computation time of simulations is just starting to be explored in the few last years. This paper presents a model that incorporates parallel processing, identifying the synchronization necessary for speeding up tumor growth simulations implemented in Java and C++ programming environments. The speed up improvement that specific parallel syntactic constructs, such as executors (thread pools) in Java, are studied. The new tumor growth parallel model is proved using implementations with Java and C++ languages on two different platforms: chipset Intel core i-X and a HPC cluster of processors at our university. The parallelization of Polesczuk and Enderling model (normally used by researchers in mathematical oncology) proposed here is analyzed with respect to performance gain. We intend to apply the model and overall parallelization technique presented here to solid tumors of specific affiliation such as prostate, breast, or colon. Our final objective is to set up a multiparadigm model capable of modelling angiogenesis, or the growth inhibition induced by chemotaxis, as well as the effect of therapies based on the presence of cytotoxic/cytostatic drugs.

Keywords: cellular automaton, tumor growth model, simulation, multicore and manycore programming, parallel programming, high performance computing, speed up

Procedia PDF Downloads 209

8 Graph Codes - 2D Projections of Multimedia Feature Graphs for Fast and Effective Retrieval

Authors: Stefan Wagenpfeil, Felix Engel, Paul McKevitt, Matthias Hemmje

Abstract:

Multimedia Indexing and Retrieval is generally designed and implemented by employing feature graphs. These graphs typically contain a significant number of nodes and edges to reflect the level of detail in feature detection. A higher level of detail increases the effectiveness of the results but also leads to more complex graph structures. However, graph-traversal-based algorithms for similarity are quite inefficient and computation intensive, especially for large data structures. To deliver fast and effective retrieval, an efficient similarity algorithm, particularly for large graphs, is mandatory. Hence, in this paper, we define a graph-projection into a 2D space (Graph Code) as well as the corresponding algorithms for indexing and retrieval. We show that calculations in this space can be performed more efficiently than graph-traversals due to a simpler processing model and a high level of parallelization. In consequence, we prove that the effectiveness of retrieval also increases substantially, as Graph Codes facilitate more levels of detail in feature fusion. Thus, Graph Codes provide a significant increase in efficiency and effectiveness (especially for Multimedia indexing and retrieval) and can be applied to images, videos, audio, and text information.

Keywords: indexing, retrieval, multimedia, graph algorithm, graph code

Procedia PDF Downloads 126

7 An Improved Parallel Algorithm of Decision Tree

Authors: Jiameng Wang, Yunfei Yin, Xiyu Deng

Abstract:

Parallel optimization is one of the important research topics of data mining at this stage. Taking Classification and Regression Tree (CART) parallelization as an example, this paper proposes a parallel data mining algorithm based on SSP-OGini-PCCP. Aiming at the problem of choosing the best CART segmentation point, this paper designs an S-SP model without data association; and in order to calculate the Gini index efficiently, a parallel OGini calculation method is designed. In addition, in order to improve the efficiency of the pruning algorithm, a synchronous PCCP pruning strategy is proposed in this paper. In this paper, the optimal segmentation calculation, Gini index calculation, and pruning algorithm are studied in depth. These are important components of parallel data mining. By constructing a distributed cluster simulation system based on SPARK, data mining methods based on SSP-OGini-PCCP are tested. Experimental results show that this method can increase the search efficiency of the best segmentation point by an average of 89%, increase the search efficiency of the Gini segmentation index by 3853%, and increase the pruning efficiency by 146% on average; and as the size of the data set increases, the performance of the algorithm remains stable, which meets the requirements of contemporary massive data processing.

Keywords: classification, Gini index, parallel data mining, pruning ahead

Procedia PDF Downloads 97

6 Comparative Study and Parallel Implementation of Stochastic Models for Pricing of European Options Portfolios using Monte Carlo Methods

Authors: Vinayak Bassi, Rajpreet Singh

Abstract:

Over the years, with the emergence of sophisticated computers and algorithms, finance has been quantified using computational prowess. Asset valuation has been one of the key components of quantitative finance. In fact, it has become one of the embryonic steps in determining risk related to a portfolio, the main goal of quantitative finance. This study comprises a drawing comparison between valuation output generated by two stochastic dynamic models, namely Black-Scholes and Dupire’s bi-dimensionality model. Both of these models are formulated for computing the valuation function for a portfolio of European options using Monte Carlo simulation methods. Although Monte Carlo algorithms have a slower convergence rate than calculus-based simulation techniques (like FDM), they work quite effectively over high-dimensional dynamic models. A fidelity gap is analyzed between the static (historical) and stochastic inputs for a sample portfolio of underlying assets. In order to enhance the performance efficiency of the model, the study emphasized the use of variable reduction methods and customizing random number generators to implement parallelization. An attempt has been made to further implement the Dupire’s model on a GPU to achieve higher computational performance. Furthermore, ideas have been discussed around the performance enhancement and bottleneck identification related to the implementation of options-pricing models on GPUs.

Keywords: monte carlo, stochastic models, computational finance, parallel programming, scientific computing

Procedia PDF Downloads 128

5 Flood Modeling in Urban Area Using a Well-Balanced Discontinuous Galerkin Scheme on Unstructured Triangular Grids

Authors: Rabih Ghostine, Craig Kapfer, Viswanathan Kannan, Ibrahim Hoteit

Abstract:

Urban flooding resulting from a sudden release of water due to dam-break or excessive rainfall is a serious threatening environment hazard, which causes loss of human life and large economic losses. Anticipating floods before they occur could minimize human and economic losses through the implementation of appropriate protection, provision, and rescue plans. This work reports on the numerical modelling of flash flood propagation in urban areas after an excessive rainfall event or dam-break. A two-dimensional (2D) depth-averaged shallow water model is used with a refined unstructured grid of triangles for representing the urban area topography. The 2D shallow water equations are solved using a second-order well-balanced discontinuous Galerkin scheme. Theoretical test case and three flood events are described to demonstrate the potential benefits of the scheme: (i) wetting and drying in a parabolic basin (ii) flash flood over a physical model of the urbanized Toce River valley in Italy; (iii) wave propagation on the Reyran river valley in consequence of the Malpasset dam-break in 1959 (France); and (iv) dam-break flood in October 1982 at the town of Sumacarcel (Spain). The capability of the scheme is also verified against alternative models. Computational results compare well with recorded data and show that the scheme is at least as efficient as comparable second-order finite volume schemes, with notable efficiency speedup due to parallelization.

Keywords: dam-break, discontinuous Galerkin scheme, flood modeling, shallow water equations

Procedia PDF Downloads 145

4 Numerical Studies for Standard Bi-Conjugate Gradient Stabilized Method and the Parallel Variants for Solving Linear Equations

Authors: Kuniyoshi Abe

Abstract:

Bi-conjugate gradient (Bi-CG) is a well-known method for solving linear equations Ax = b, for x, where A is a given n-by-n matrix, and b is a given n-vector. Typically, the dimension of the linear equation is high and the matrix is sparse. A number of hybrid Bi-CG methods such as conjugate gradient squared (CGS), Bi-CG stabilized (Bi-CGSTAB), BiCGStab2, and BiCGstab(l) have been developed to improve the convergence of Bi-CG. Bi-CGSTAB has been most often used for efficiently solving the linear equation, but we have seen the convergence behavior with a long stagnation phase. In such cases, it is important to have Bi-CG coefficients that are as accurate as possible, and the stabilization strategy, which stabilizes the computation of the Bi-CG coefficients, has been proposed. It may avoid stagnation and lead to faster computation. Motivated by a large number of processors in present petascale high-performance computing hardware, the scalability of Krylov subspace methods on parallel computers has recently become increasingly prominent. The main bottleneck for efficient parallelization is the inner products which require a global reduction. The resulting global synchronization phases cause communication overhead on parallel computers. The parallel variants of Krylov subspace methods reducing the number of global communication phases and hiding the communication latency have been proposed. However, the numerical stability, specifically, the convergence speed of the parallel variants of Bi-CGSTAB may become worse than that of the standard Bi-CGSTAB. In this paper, therefore, we compare the convergence speed between the standard Bi-CGSTAB and the parallel variants by numerical experiments and show that the convergence speed of the standard Bi-CGSTAB is faster than the parallel variants. Moreover, we propose the stabilization strategy for the parallel variants.

Keywords: bi-conjugate gradient stabilized method, convergence speed, Krylov subspace methods, linear equations, parallel variant

Procedia PDF Downloads 131

3 The Effective Use of the Network in the Distributed Storage

Authors: Mamouni Mohammed Dhiya Eddine

Abstract:

This work aims at studying the exploitation of high-speed networks of clusters for distributed storage. Parallel applications running on clusters require both high-performance communications between nodes and efficient access to the storage system. Many studies on network technologies led to the design of dedicated architectures for clusters with very fast communications between computing nodes. Efficient distributed storage in clusters has been essentially developed by adding parallelization mechanisms so that the server(s) may sustain an increased workload. In this work, we propose to improve the performance of distributed storage systems in clusters by efficiently using the underlying high-performance network to access distant storage systems. The main question we are addressing is: do high-speed networks of clusters fit the requirements of a transparent, efficient and high-performance access to remote storage? We show that storage requirements are very different from those of parallel computation. High-speed networks of clusters were designed to optimize communications between different nodes of a parallel application. We study their utilization in a very different context, storage in clusters, where client-server models are generally used to access remote storage (for instance NFS, PVFS or LUSTRE). Our experimental study based on the usage of the GM programming interface of MYRINET high-speed networks for distributed storage raised several interesting problems. Firstly, the specific memory utilization in the storage access system layers does not easily fit the traditional memory model of high-speed networks. Secondly, client-server models that are used for distributed storage have specific requirements on message control and event processing, which are not handled by existing interfaces. We propose different solutions to solve communication control problems at the filesystem level. We show that a modification of the network programming interface is required. Data transfer issues need an adaptation of the operating system. We detail several propositions for network programming interfaces which make their utilization easier in the context of distributed storage. The integration of a flexible processing of data transfer in the new programming interface MYRINET/MX is finally presented. Performance evaluations show that its usage in the context of both storage and other types of applications is easy and efficient.

Keywords: distributed storage, remote file access, cluster, high-speed network, MYRINET, zero-copy, memory registration, communication control, event notification, application programming interface

Procedia PDF Downloads 192