Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 11

Search results for: multicore

11 A Study of the Trade-off Energy Consumption-Performance-Schedulability for DVFS Multicore Systems

Authors: Jalil Boudjadar


Dynamic Voltage and Frequency Scaling (DVFS) multicore platforms are promising execution platforms that enable high computational performance, less energy consumption and flexibility in scheduling the system processes. However, the resulting interleaving and memory interference together with per-core frequency tuning make real-time guarantees hard to be delivered. Besides, energy consumption represents a strong constraint for the deployment of such systems on energy-limited settings. Identifying the system configurations that would achieve a high performance and consume less energy while guaranteeing the system schedulability is a complex task in the design of modern embedded systems. This work studies the trade-off between energy consumption, cores utilization and memory bottleneck and their impact on the schedulability of DVFS multicore time-critical systems with a hierarchy of shared memories. We build a model-based framework using Parametrized Timed Automata of UPPAAL to analyze the mutual impact of performance, energy consumption and schedulability of DVFS multicore systems, and demonstrate the trade-off on an actual case study.

Keywords: time-critical systems, multicore systems, schedulability analysis, energy consumption, performance analysis

Procedia PDF Downloads 31
10 Study of Heat Conduction in Multicore Chips

Authors: K. N. Seetharamu, Naveen Teggi, Kiranakumar Dhavalagi, Narayana Kamath


A method of temperature calculations is developed to study the conditions leading to hot spot occurrence on multicore chips. A physical model which has salient features of multicore chips is incorporated for the analysis. The model consists of active and background cell laid out in a checkered pattern, and this pattern repeats itself in each fine grain active cells. The die has three layers i) body ii) buried oxide layer iii) wiring layer, stacked one above the other with heat source placed at the interface between wiring and buried oxide layer. With this model we propose analytical method to calculate the target hotspot temperature, heat flow to top and bottom layers of the die and thermal resistance components at each granularity level, assuming appropriate values of die dimensions and parameters. Finally we attempt to find an easier method for the calculation of the target hotspot temperature using graph.

Keywords: checkered pattern, granularity level, heat conduction, multicore chips, target hotspot temperature

Procedia PDF Downloads 174
9 Model-Based Automotive Partitioning and Mapping for Embedded Multicore Systems

Authors: Robert Höttger, Lukas Krawczyk, Burkhard Igel


This paper introduces novel approaches to partitioning and mapping in terms of model-based embedded multicore system engineering and further discusses benefits, industrial relevance and features in common with existing approaches. In order to assess and evaluate results, both approaches have been applied to a real industrial application as well as to various prototypical demonstrative applications, that have been developed and implemented for different purposes. Evaluations show, that such applications improve significantly according to performance, energy efficiency, meeting timing constraints and covering maintaining issues by using the AMALTHEA platform and the implemented approaches. Further- more, the model-based design provides an open, expandable, platform independent and scalable exchange format between OEMs, suppliers and developers on different levels. Our proposed mechanisms provide meaningful multicore system utilization since load balancing by means of partitioning and mapping is effectively performed with regard to the modeled systems including hardware, software, operating system, scheduling, constraints, configuration and more data.

Keywords: partitioning, mapping, distributed systems, scheduling, embedded multicore systems, model-based, system analysis

Procedia PDF Downloads 528
8 Performance Evaluation of Parallel Surface Modeling and Generation on Actual and Virtual Multicore Systems

Authors: Nyeng P. Gyang


Even though past, current and future trends suggest that multicore and cloud computing systems are increasingly prevalent/ubiquitous, this class of parallel systems is nonetheless underutilized, in general, and barely used for research on employing parallel Delaunay triangulation for parallel surface modeling and generation, in particular. The performances, of actual/physical and virtual/cloud multicore systems/machines, at executing various algorithms, which implement various parallelization strategies of the incremental insertion technique of the Delaunay triangulation algorithm, were evaluated. T-tests were run on the data collected, in order to determine whether various performance metrics differences (including execution time, speedup and efficiency) were statistically significant. Results show that the actual machine is approximately twice faster than the virtual machine at executing the same programs for the various parallelization strategies. Results, which furnish the scalability behaviors of the various parallelization strategies, also show that some of the differences between the performances of these systems, during different runs of the algorithms on the systems, were statistically significant. A few pseudo superlinear speedup results, which were computed from the raw data collected, are not true superlinear speedup values. These pseudo superlinear speedup values, which arise as a result of one way of computing speedups, disappear and give way to asymmetric speedups, which are the accurate kind of speedups that occur in the experiments performed.

Keywords: cloud computing systems, multicore systems, parallel Delaunay triangulation, parallel surface modeling and generation

Procedia PDF Downloads 124
7 Automatic Tuning for a Systemic Model of Banking Originated Losses (SYMBOL) Tool on Multicore

Authors: Ronal Muresano, Andrea Pagano


Nowadays, the mathematical/statistical applications are developed with more complexity and accuracy. However, these precisions and complexities have brought as result that applications need more computational power in order to be executed faster. In this sense, the multicore environments are playing an important role to improve and to optimize the execution time of these applications. These environments allow us the inclusion of more parallelism inside the node. However, to take advantage of this parallelism is not an easy task, because we have to deal with some problems such as: cores communications, data locality, memory sizes (cache and RAM), synchronizations, data dependencies on the model, etc. These issues are becoming more important when we wish to improve the application’s performance and scalability. Hence, this paper describes an optimization method developed for Systemic Model of Banking Originated Losses (SYMBOL) tool developed by the European Commission, which is based on analyzing the application's weakness in order to exploit the advantages of the multicore. All these improvements are done in an automatic and transparent manner with the aim of improving the performance metrics of our tool. Finally, experimental evaluations show the effectiveness of our new optimized version, in which we have achieved a considerable improvement on the execution time. The time has been reduced around 96% for the best case tested, between the original serial version and the automatic parallel version.

Keywords: algorithm optimization, bank failures, OpenMP, parallel techniques, statistical tool

Procedia PDF Downloads 289
6 Evaluating the Impact of Replacement Policies on the Cache Performance and Energy Consumption in Different Multicore Embedded Systems

Authors: Sajjad Rostami-Sani, Mojtaba Valinataj, Amir-Hossein Khojir-Angasi


The cache has an important role in the reduction of access delay between a processor and memory in high-performance embedded systems. In these systems, the energy consumption is one of the most important concerns, and it will become more important with smaller processor feature sizes and higher frequencies. Meanwhile, the cache system dissipates a significant portion of energy compared to the other components of a processor. There are some elements that can affect the energy consumption of the cache such as replacement policy and degree of associativity. Due to these points, it can be inferred that selecting an appropriate configuration for the cache is a crucial part of designing a system. In this paper, we investigate the effect of different cache replacement policies on both cache’s performance and energy consumption. Furthermore, the impact of different Instruction Set Architectures (ISAs) on cache’s performance and energy consumption has been investigated.

Keywords: energy consumption, replacement policy, instruction set architecture, multicore processor

Procedia PDF Downloads 58
5 A Survey on Constraint Solving Approaches Using Parallel Architectures

Authors: Nebras Gharbi, Itebeddine Ghorbel


In the latest years and with the advancements of the multicore computing world, the constraint programming community tried to benefit from the capacity of new machines and make the best use of them through several parallel schemes for constraint solving. In this paper, we propose a survey of the different proposed approaches to solve Constraint Satisfaction Problems using parallel architectures. These approaches use in a different way a parallel architecture: the problem itself could be solved differently by several solvers or could be split over solvers.

Keywords: constraint programming, parallel programming, constraint satisfaction problem, speed-up

Procedia PDF Downloads 211
4 Detecting the Edge of Multiple Images in Parallel

Authors: Prakash K. Aithal, U. Dinesh Acharya, Rajesh Gopakumar


Edge is variation of brightness in an image. Edge detection is useful in many application areas such as finding forests, rivers from a satellite image, detecting broken bone in a medical image etc. The paper discusses about finding edge of multiple aerial images in parallel .The proposed work tested on 38 images 37 colored and one monochrome image. The time taken to process N images in parallel is equivalent to time taken to process 1 image in sequential. The proposed method achieves pixel level parallelism as well as image level parallelism.

Keywords: edge detection, multicore, gpu, opencl, mpi

Procedia PDF Downloads 249
3 A Parallel Cellular Automaton Model of Tumor Growth for Multicore and GPU Programming

Authors: Manuel I. Capel, Antonio Tomeu, Alberto Salguero


Tumor growth from a transformed cancer-cell up to a clinically apparent mass spans through a range of spatial and temporal magnitudes. Through computer simulations, Cellular Automata (CA) can accurately describe the complexity of the development of tumors. Tumor development prognosis can now be made -without making patients undergo through annoying medical examinations or painful invasive procedures- if we develop appropriate CA-based software tools. In silico testing mainly refers to Computational Biology research studies of application to clinical actions in Medicine. To establish sound computer-based models of cellular behavior, certainly reduces costs and saves precious time with respect to carrying out experiments in vitro at labs or in vivo with living cells and organisms. These aim to produce scientifically relevant results compared to traditional in vitro testing, which is slow, expensive, and does not generally have acceptable reproducibility under the same conditions. For speeding up computer simulations of cellular models, specific literature shows recent proposals based on the CA approach that include advanced techniques, such the clever use of supporting efficient data structures when modeling with deterministic stochastic cellular automata. Multiparadigm and multiscale simulation of tumor dynamics is just beginning to be developed by the concerned research community. The use of stochastic cellular automata (SCA), whose parallel programming implementations are open to yield a high computational performance, are of much interest to be explored up to their computational limits. There have been some approaches based on optimizations to advance in multiparadigm models of tumor growth, which mainly pursuit to improve performance of these models through efficient memory accesses guarantee, or considering the dynamic evolution of the memory space (grids, trees,…) that holds crucial data in simulations. In our opinion, the different optimizations mentioned above are not decisive enough to achieve the high performance computing power that cell-behavior simulation programs actually need. The possibility of using multicore and GPU parallelism as a promising multiplatform and framework to develop new programming techniques to speed-up the computation time of simulations is just starting to be explored in the few last years. This paper presents a model that incorporates parallel processing, identifying the synchronization necessary for speeding up tumor growth simulations implemented in Java and C++ programming environments. The speed up improvement that specific parallel syntactic constructs, such as executors (thread pools) in Java, are studied. The new tumor growth parallel model is proved using implementations with Java and C++ languages on two different platforms: chipset Intel core i-X and a HPC cluster of processors at our university. The parallelization of Polesczuk and Enderling model (normally used by researchers in mathematical oncology) proposed here is analyzed with respect to performance gain. We intend to apply the model and overall parallelization technique presented here to solid tumors of specific affiliation such as prostate, breast, or colon. Our final objective is to set up a multiparadigm model capable of modelling angiogenesis, or the growth inhibition induced by chemotaxis, as well as the effect of therapies based on the presence of cytotoxic/cytostatic drugs.

Keywords: cellular automaton, tumor growth model, simulation, multicore and manycore programming, parallel programming, high performance computing, speed up

Procedia PDF Downloads 160
2 Parallel Computation of the Covariance-Matrix

Authors: Claude Tadonki


We address the issues related to the computation of the covariance matrix. This matrix is likely to be ill conditioned following its canonical expression, thus consequently raises serious numerical issues. The underlying linear system, which therefore should be solved by means of iterative approaches, becomes computationally challenging. A huge number of iterations is expected in order to reach an acceptable level of convergence, necessary to meet the required accuracy of the computation. In addition, this linear system needs to be solved at each iteration following the general form of the covariance matrix. Putting all together, its comes that we need to compute as fast as possible the associated matrix-vector product. This is our purpose in the work, where we consider and discuss skillful formulations of the problem, then propose a parallel implementation of the matrix-vector product involved. Numerical and performance oriented discussions are provided based on experimental evaluations.

Keywords: covariance-matrix, multicore, numerical computing, parallel computing

Procedia PDF Downloads 221
1 150 KVA Multifunction Laboratory Test Unit Based on Power-Frequency Converter

Authors: Bartosz Kedra, Robert Malkowski


This paper provides description and presentation of laboratory test unit built basing on 150 kVA power frequency converter and Simulink RealTime platform. Assumptions, based on criteria which load and generator types may be simulated using discussed device, are presented, as well as control algorithm structure. As laboratory setup contains transformer with thyristor controlled tap changer, a wider scope of setup capabilities is presented. Information about used communication interface, data maintenance, and storage solution as well as used Simulink real-time features is presented. List and description of all measurements are provided. Potential of laboratory setup modifications is evaluated. For purposes of Rapid Control Prototyping, a dedicated environment was used Simulink RealTime. Therefore, load model Functional Unit Controller is based on a PC computer with I/O cards and Simulink RealTime software. Simulink RealTime was used to create real-time applications directly from Simulink models. In the next step, applications were loaded on a target computer connected to physical devices that provided opportunity to perform Hardware in the Loop (HIL) tests, as well as the mentioned Rapid Control Prototyping process. With Simulink RealTime, Simulink models were extended with I/O cards driver blocks that made automatic generation of real-time applications and performing interactive or automated runs on a dedicated target computer equipped with a real-time kernel, multicore CPU, and I/O cards possible. Results of performed laboratory tests are presented. Different load configurations are described and experimental results are presented. This includes simulation of under frequency load shedding, frequency and voltage dependent characteristics of groups of load units, time characteristics of group of different load units in a chosen area and arbitrary active and reactive power regulation basing on defined schedule.

Keywords: MATLAB, power converter, Simulink Real-Time, thyristor-controlled tap changer

Procedia PDF Downloads 237