Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 113

Search results for: nMPRA processor

113 Improving the Performances of the nMPRA Architecture by Implementing Specific Functions in Hardware

Authors: Ionel Zagan, Vasile Gheorghita Gaitan

Abstract:

Minimizing the response time to asynchronous events in a real-time system is an important factor in increasing the speed of response and an interesting concept in designing equipment fast enough for the most demanding applications. The present article will present the results regarding the validation of the nMPRA (Multi Pipeline Register Architecture) architecture using the FPGA Virtex-7 circuit. The nMPRA concept is a hardware processor with the scheduler implemented at the processor level; this is done without affecting a possible bus communication, as is the case with the other CPU solutions. The implementation of static or dynamic scheduling operations in hardware and the improvement of handling interrupts and events by the real-time executive described in the present article represent a key solution for eliminating the overhead of the operating system functions. The nMPRA processor is capable of executing a preemptive scheduling, using various algorithms without a software scheduler. Therefore, we have also presented various scheduling methods and algorithms used in scheduling the real-time tasks.

Keywords: nMPRA architecture, pipeline processor, preemptive scheduling, real-time system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 661
112 CPU Architecture Based on Static Hardware Scheduler Engine and Multiple Pipeline Registers

Authors: Ionel Zagan, Vasile Gheorghita Gaitan

Abstract:

The development of CPUs and of real-time systems based on them made it possible to use time at increasingly low resolutions. Together with the scheduling methods and algorithms, time organizing has been improved so as to respond positively to the need for optimization and to the way in which the CPU is used. This presentation contains both a detailed theoretical description and the results obtained from research on improving the performances of the nMPRA (Multi Pipeline Register Architecture) processor by implementing specific functions in hardware. The proposed CPU architecture has been developed, simulated and validated by using the FPGA Virtex-7 circuit, via a SoC project. Although the nMPRA processor hardware structure with five pipeline stages is very complex, the present paper presents and analyzes the tests dedicated to the implementation of the CPU and of the memory on-chip for instructions and data. In order to practically implement and test the entire SoC project, various tests have been performed. These tests have been performed in order to verify the drivers for peripherals and the boot module named Bootloader.

Keywords: Hardware scheduler, nMPRA processor, real-time systems, scheduling methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 688
111 A Novel Implementation of Application Specific Instruction-set Processor (ASIP) using Verilog

Authors: Kamaraju.M, Lal Kishore.K, Tilak.A.V.N

Abstract:

The general purpose processors that are used in embedded systems must support constraints like execution time, power consumption, code size and so on. On the other hand an Application Specific Instruction-set Processor (ASIP) has advantages in terms of power consumption, performance and flexibility. In this paper, a 16-bit Application Specific Instruction-set processor for the sensor data transfer is proposed. The designed processor architecture consists of on-chip transmitter and receiver modules along with the processing and controlling units to enable the data transmission and reception on a single die. The data transfer is accomplished with less number of instructions as compared with the general purpose processor. The ASIP core operates at a maximum clock frequency of 1.132GHz with a delay of 0.883ns and consumes 569.63mW power at an operating voltage of 1.2V. The ASIP is implemented in Verilog HDL using the Xilinx platform on Virtex4.

Keywords: ASIP, Data transfer, Instruction set, Processor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1802
110 Design of Multi-disease Diagnosis Processor using Hypernetworks Technique

Authors: Jae-Yeon Song, Seung-Yerl Lee, Kyu-Yeul Wang, Byung-Soo Kim, Sang-Seol Lee, Seong-Seob Shin, Jae-Young Choi, Chong Ho Lee, Jeahyun Park, Duck-Jin Chung

Abstract:

In this paper, we propose disease diagnosis hardware architecture by using Hypernetworks technique. It can be used to diagnose 3 different diseases (SPECT Heart, Leukemia, Prostate cancer). Generally, the disparate diseases require specified diagnosis hardware model for each disease. Using similarities of three diseases diagnosis processor, we design diagnosis processor that can diagnose three different diseases. Our proposed architecture that is combining three processors to one processor can reduce hardware size without decrease of the accuracy.

Keywords: Diagnosis processor, Hypernetworks, Leukemia, Mask, Prostate cancer, SPECT Heart data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1088
109 A Hyper-Domain Image Watermarking Method based on Macro Edge Block and Wavelet Transform for Digital Signal Processor

Authors: Yi-Pin Hsu, Shin-Yu Lin

Abstract:

In order to protect original data, watermarking is first consideration direction for digital information copyright. In addition, to achieve high quality image, the algorithm maybe can not run on embedded system because the computation is very complexity. However, almost nowadays algorithms need to build on consumer production because integrator circuit has a huge progress and cheap price. In this paper, we propose a novel algorithm which efficient inserts watermarking on digital image and very easy to implement on digital signal processor. In further, we select a general and cheap digital signal processor which is made by analog device company to fit consumer application. The experimental results show that the image quality by watermarking insertion can achieve 46 dB can be accepted in human vision and can real-time execute on digital signal processor.

Keywords: watermarking, digital signal processor, embedded system

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1026
108 Evaluating the Impact of Replacement Policies on the Cache Performance and Energy Consumption in Different Multicore Embedded Systems

Authors: Sajjad Rostami-Sani, Mojtaba Valinataj, Amir-Hossein Khojir-Angasi

Abstract:

The cache has an important role in the reduction of access delay between a processor and memory in high-performance embedded systems. In these systems, the energy consumption is one of the most important concerns, and it will become more important with smaller processor feature sizes and higher frequencies. Meanwhile, the cache system dissipates a significant portion of energy compared to the other components of a processor. There are some elements that can affect the energy consumption of the cache such as replacement policy and degree of associativity. Due to these points, it can be inferred that selecting an appropriate configuration for the cache is a crucial part of designing a system. In this paper, we investigate the effect of different cache replacement policies on both cache’s performance and energy consumption. Furthermore, the impact of different Instruction Set Architectures (ISAs) on cache’s performance and energy consumption has been investigated.

Keywords: L1-cache, energy consumption, replacement policy, Instruction set architecture, multicore processor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 478
107 The Fluid Limit of the Critical Processor Sharing Tandem Queue

Authors: Amal Ezzidani, Abdelghani Ben Tahar, Mohamed Hanini

Abstract:

A sequence of finite tandem queue is considered for this study. Each one has a single server, which operates under the egalitarian processor sharing discipline. External customers arrive at each queue according to a renewal input process and having a general service times distribution. Upon completing service, customers leave the current queue and enter to the next. Under mild assumptions, including critical data, we prove the existence and the uniqueness of the fluid solution. For asymptotic behavior, we provide necessary and sufficient conditions for the invariant state and the convergence to this invariant state. In the end, we establish the convergence of a correctly normalized state process to a fluid limit characterized by a system of algebraic and integral equations.

Keywords: Fluid Limit, fluid model, measure valued process, processor sharing, tandem queue.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 198
106 Implementing High Performance VPN Router using Cavium-s CN2560 Security Processor

Authors: Sang Su Lee, Sang Woo Lee, Yong Sung Jeon, Ki Young Kim

Abstract:

IPsec protocol[1] is a set of security extensions developed by the IETF and it provides privacy and authentication services at the IP layer by using modern cryptography. In this paper, we describe both of H/W and S/W architectures of our router system, SRS-10. The system is designed to support high performance routing and IPsec VPN. Especially, we used Cavium-s CN2560 processor to implement IPsec processing in inline-mode.

Keywords: IP, router, VPN, IPsec.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1825
105 Parallel-computing Approach for FFT Implementation on Digital Signal Processor (DSP)

Authors: Yi-Pin Hsu, Shin-Yu Lin

Abstract:

An efficient parallel form in digital signal processor can improve the algorithm performance. The butterfly structure is an important role in fast Fourier transform (FFT), because its symmetry form is suitable for hardware implementation. Although it can perform a symmetric structure, the performance will be reduced under the data-dependent flow characteristic. Even though recent research which call as novel memory reference reduction methods (NMRRM) for FFT focus on reduce memory reference in twiddle factor, the data-dependent property still exists. In this paper, we propose a parallel-computing approach for FFT implementation on digital signal processor (DSP) which is based on data-independent property and still hold the property of low-memory reference. The proposed method combines final two steps in NMRRM FFT to perform a novel data-independent structure, besides it is very suitable for multi-operation-unit digital signal processor and dual-core system. We have applied the proposed method of radix-2 FFT algorithm in low memory reference on TI TMSC320C64x DSP. Experimental results show the method can reduce 33.8% clock cycles comparing with the NMRRM FFT implementation and keep the low-memory reference property.

Keywords: Parallel-computing, FFT, low-memory reference, TIDSP.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1988
104 Optimization of SAD Algorithm on VLIW DSP

Authors: Hui-Jae You, Sun-Tae Chung, Souhwan Jung

Abstract:

SAD (Sum of Absolute Difference) algorithm is heavily used in motion estimation which is computationally highly demanding process in motion picture encoding. To enhance the performance of motion picture encoding on a VLIW processor, an efficient implementation of SAD algorithm on the VLIW processor is essential. SAD algorithm is programmed as a nested loop with a conditional branch. In VLIW processors, loop is usually optimized by software pipelining, but researches on optimal scheduling of software pipelining for nested loops, especially nested loops with conditional branches are rare. In this paper, we propose an optimal scheduling and implementation of SAD algorithm with conditional branch on a VLIW DSP processor. The proposed optimal scheduling first transforms the nested loop with conditional branch into a single loop with conditional branch with consideration of full utilization of ILP capability of the VLIW processor and realization of earlier escape from the loop. Next, the proposed optimal scheduling applies a modulo scheduling technique developed for single loop. Based on this optimal scheduling strategy, optimal implementation of SAD algorithm on TMS320C67x, a VLIW DSP is presented. Through experiments on TMS320C6713 DSK, it is shown that H.263 encoder with the proposed SAD implementation performs better than other H.263 encoder with other SAD implementations, and that the code size of the optimal SAD implementation is small enough to be appropriate for embedded environments.

Keywords: Optimal implementation, SAD algorithm, VLIW, TMS320C6713.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2048
103 Performance Analysis of Digital Signal Processors Using SMV Benchmark

Authors: Erh-Wen Hu, Cyril S. Ku, Andrew T. Russo, Bogong Su, Jian Wang

Abstract:

Unlike general-purpose processors, digital signal processors (DSP processors) are strongly application-dependent. To meet the needs for diverse applications, a wide variety of DSP processors based on different architectures ranging from the traditional to VLIW have been introduced to the market over the years. The functionality, performance, and cost of these processors vary over a wide range. In order to select a processor that meets the design criteria for an application, processor performance is usually the major concern for digital signal processing (DSP) application developers. Performance data are also essential for the designers of DSP processors to improve their design. Consequently, several DSP performance benchmarks have been proposed over the past decade or so. However, none of these benchmarks seem to have included recent new DSP applications. In this paper, we use a new benchmark that we recently developed to compare the performance of popular DSP processors from Texas Instruments and StarCore. The new benchmark is based on the Selectable Mode Vocoder (SMV), a speech-coding program from the recent third generation (3G) wireless voice applications. All benchmark kernels are compiled by the compilers of the respective DSP processors and run on their simulators. Weighted arithmetic mean of clock cycles and arithmetic mean of code size are used to compare the performance of five DSP processors. In addition, we studied how the performance of a processor is affected by code structure, features of processor architecture and optimization of compiler. The extensive experimental data gathered, analyzed, and presented in this paper should be helpful for DSP processor and compiler designers to meet their specific design goals.

Keywords: digital signal processors, DSP benchmark, instruction level parallelism, modified cyclomatic complexity, performance analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1355
102 Processor Scheduling on Parallel Computers

Authors: Mohammad S. Laghari, Gulzar A. Khuwaja

Abstract:

Many problems in computer vision and image processing present potential for parallel implementations through one of the three major paradigms of geometric parallelism, algorithmic parallelism and processor farming. Static process scheduling techniques are used successfully to exploit geometric and algorithmic parallelism, while dynamic process scheduling is better suited to dealing with the independent processes inherent in the process farming paradigm. This paper considers the application of parallel or multi-computers to a class of problems exhibiting spatial data characteristic of the geometric paradigm. However, by using processor farming paradigm, a dynamic scheduling technique is developed to suit the MIMD structure of the multi-computers. A hybrid scheme of scheduling is also developed and compared with the other schemes. The specific problem chosen for the investigation is the Hough transform for line detection.

Keywords: Hough transforms, parallel computer, parallel paradigms, scheduling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1433
101 Verification and Proposal of Information Processing Model Using EEG-Based Brain Activity Monitoring

Authors: Toshitaka Higashino, Naoki Wakamiya

Abstract:

Human beings perform a task by perceiving information from outside, recognizing them, and responding them. There have been various attempts to analyze and understand internal processes behind the reaction to a given stimulus by conducting psychological experiments and analysis from multiple perspectives. Among these, we focused on Model Human Processor (MHP). However, it was built based on psychological experiments and thus the relation with brain activity was unclear so far. To verify the validity of the MHP and propose our model from a viewpoint of neuroscience, EEG (Electroencephalography) measurements are performed during experiments in this study. More specifically, first, experiments were conducted where Latin alphabet characters were used as visual stimuli. In addition to response time, ERPs (event-related potentials) such as N100 and P300 were measured by using EEG. By comparing cycle time predicted by the MHP and latency of ERPs, it was found that N100, related to perception of stimuli, appeared at the end of the perceptual processor. Furthermore, by conducting an additional experiment, it was revealed that P300, related to decision making, appeared during the response decision process, not at the end. Second, by experiments using Japanese Hiragana characters, i.e. Japan's own phonetic symbols, those findings were confirmed. Finally, Japanese Kanji characters were used as more complicated visual stimuli. A Kanji character usually has several readings and several meanings. Despite the difference, a reading-related task and a meaning-related task exhibited similar results, meaning that they involved similar information processing processes of the brain. Based on those results, our model was proposed which reflects response time and ERP latency. It consists of three processors: the perception processor from an input of a stimulus to appearance of N100, the cognitive processor from N100 to P300, and the decision-action processor from P300 to response. Using our model, an application system which reflects brain activity can be established.

Keywords: Brain activity, EEG, information processing model, model human processor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 234
100 A Processor with Dynamically Reconfigurable Circuit for Floating-Point Arithmetic

Authors: Yukinari Minagi , Akinori Kanasugi

Abstract:

This paper describes about dynamic reconfiguration to miniaturize arithmetic circuits in general-purpose processor. Dynamic reconfiguration is a technique to realize required functions by changing hardware construction during operation. The proposed arithmetic circuit performs floating-point arithmetic which is frequently used in science and technology. The data format is floating-point based on IEEE754. The proposed circuit is designed using VHDL, and verified the correct operation by simulations and experiments.

Keywords: dynamic reconfiguration, floating-point arithmetic, double precision, FPGA

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1269
99 A Multi Cordic Architecture on FPGA Platform

Authors: Ahmed Madian, Muaz Aljarhi

Abstract:

Coordinate Rotation Digital Computer (CORDIC) is a unique digital computing unit intended for the computation of mathematical operations and functions. This paper presents A multi CORDIC processor that integrates different CORDIC architectures on a single FPGA chip and allows the user to select the CORDIC architecture to proceed with based on what he wants to calculate and his needs. Synthesis show that radix 2 CORDIC has the lowest clock delay, radix 8 CORDIC has the highest LUT usage and lowest register usage while Hybrid Radix 4 CORDIC had the highest clock delay.

Keywords: Multi, CORDIC, FPGA, Processor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2482
98 Three-Stage Mining Metals Supply Chain Coordination and Product Quality Improvement with Revenue Sharing Contract

Authors: Hamed Homaei, Iraj Mahdavi, Ali Tajdin

Abstract:

One of the main concerns of miners is to increase the quality level of their products because the mining metals price depends on their quality level; however, increasing the quality level of these products has different costs at different levels of the supply chain. These costs usually increase after extractor level. This paper studies the coordination issue of a decentralized three-level supply chain with one supplier (extractor), one mineral processor and one manufacturer in which the increasing product quality level cost at the processor level is higher than the supplier and at the level of the manufacturer is more than the processor. We identify the optimal product quality level for each supply chain member by designing a revenue sharing contract. Finally, numerical examples show that the designed contract not only increases the final product quality level but also provides a win-win condition for all supply chain members and increases the whole supply chain profit.

Keywords: Three-stage supply chain, product quality improvement, channel coordination, revenue sharing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 782
97 A Virtual Simulation Environment for a Design and Verification of a GPGPU

Authors: Kwang Y. Lee, Tae R. Park, Jae C. Kwak, Yong S. Koo

Abstract:

When a small H/W IP is designed, we can develop an appropriate verification environment by observing the simulated signal waves, or using the serial test vectors for the fixed output. In the case of design and verification of a massive parallel processor with multiple IPs, it-s difficult to make a verification system with existing common verification environment, and to verify each partial IP. A TestDrive verification environment can build easy and reliable verification system that can produce highly intuitive results by applying Modelsim and SystemVerilog-s DPI. It shows many advantages, for example a high-level design of a GPGPU processor design can be migrate to FPGA board immediately.

Keywords: Virtual Simulation, Verification, IP Design, GPGPU

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1441
96 Digital Automatic Gain Control Integrated on WLAN Platform

Authors: Emilija Miletic, Milos Krstic, Maxim Piz, Michael Methfessel

Abstract:

In this work we present a solution for DAGC (Digital Automatic Gain Control) in WLAN receivers compatible to IEEE 802.11a/g standard. Those standards define communication in 5/2.4 GHz band using Orthogonal Frequency Division Multiplexing OFDM modulation scheme. WLAN Transceiver that we have used enables gain control over Low Noise Amplifier (LNA) and a Variable Gain Amplifier (VGA). The control over those signals is performed in our digital baseband processor using dedicated hardware block DAGC. DAGC in this process is used to automatically control the VGA and LNA in order to achieve better signal-to-noise ratio, decrease FER (Frame Error Rate) and hold the average power of the baseband signal close to the desired set point. DAGC function in baseband processor is done in few steps: measuring power levels of baseband samples of an RF signal,accumulating the differences between the measured power level and actual gain setting, adjusting a gain factor of the accumulation, and applying the adjusted gain factor the baseband values. Based on the measurement results of RSSI signal dependence to input power we have concluded that this digital AGC can be implemented applying the simple linearization of the RSSI. This solution is very simple but also effective and reduces complexity and power consumption of the DAGC. This DAGC is implemented and tested both in FPGA and in ASIC as a part of our WLAN baseband processor. Finally, we have integrated this circuit in a compact WLAN PCMCIA board based on MAC and baseband ASIC chips designed from us.

Keywords: WLAN, AGC, RSSI, baseband processor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3740
95 Ec-A: A Task Allocation Algorithm for Energy Minimization in Multiprocessor Systems

Authors: Anju S. Pillai, T.B. Isha

Abstract:

With the necessity of increased processing capacity with less energy consumption; power aware multiprocessor system has gained more attention in the recent future. One of the additional challenges that is to be solved in a multi-processor system when compared to uni-processor system is job allocation. This paper presents a novel task dependent job allocation algorithm: Energy centric- Allocation (Ec-A) and Rate Monotonic (RM) scheduling to minimize energy consumption in a multiprocessor system. A simulation analysis is carried out to verify the performance increase with reduction in energy consumption and required number of processors in the system.

Keywords: Energy consumption, Job allocation, Multiprocessor systems, Task dependent.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1749
94 Designing and Implementing a Novel Scheduler for Multiprocessor System using Genetic Algorithm

Authors: Iman Zangeneh, Mostafa Moradi, Mazyar Baranpouyan

Abstract:

System is using multiple processors for computing and information processing, is increasing rapidly speed operation of these systems compared with single processor systems, very significant impact on system performance is increased .important differences to yield a single multi-processor cpu, the scheduling policies, to reduce the implementation time of all processes. Notwithstanding the famous algorithms such as SPT, LPT, LSPT and RLPT for scheduling and there, but none led to the answer are not optimal.In this paper scheduling using genetic algorithms and innovative way to finish the whole process faster that we do and the result compared with three algorithms we mentioned.

Keywords: Multiprocessor system, genetic algorithms, time implementation process.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1331
93 64 bit Computer Architectures for Space Applications – A study

Authors: Niveditha Domse, Kris Kumar, K. N. Balasubramanya Murthy

Abstract:

The more recent satellite projects/programs makes extensive usage of real – time embedded systems. 16 bit processors which meet the Mil-Std-1750 standard architecture have been used in on-board systems. Most of the Space Applications have been written in ADA. From a futuristic point of view, 32 bit/ 64 bit processors are needed in the area of spacecraft computing and therefore an effort is desirable in the study and survey of 64 bit architectures for space applications. This will also result in significant technology development in terms of VLSI and software tools for ADA (as the legacy code is in ADA). There are several basic requirements for a special processor for this purpose. They include Radiation Hardened (RadHard) devices, very low power dissipation, compatibility with existing operational systems, scalable architectures for higher computational needs, reliability, higher memory and I/O bandwidth, predictability, realtime operating system and manufacturability of such processors. Further on, these may include selection of FPGA devices, selection of EDA tool chains, design flow, partitioning of the design, pin count, performance evaluation, timing analysis etc. This project deals with a brief study of 32 and 64 bit processors readily available in the market and designing/ fabricating a 64 bit RISC processor named RISC MicroProcessor with added functionalities of an extended double precision floating point unit and a 32 bit signal processing unit acting as co-processors. In this paper, we emphasize the ease and importance of using Open Core (OpenSparc T1 Verilog RTL) and Open “Source" EDA tools such as Icarus to develop FPGA based prototypes quickly. Commercial tools such as Xilinx ISE for Synthesis are also used when appropriate.

Keywords: RISC MicroProcessor, RPC – RISC Processor Core, PBX – Processor to Block Interface part of the Interconnection Network, BPX – Block to Processor Interface part of the Interconnection Network, FPU – Floating Point Unit, SPU – Signal Processing Unit, WB – Wishbone Interface, CTU – Clock and Test Unit

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2018
92 Benchmarking: Performance on ALPS and Formosa Clusters

Authors: Chih-Wei Hsieh, Chau-Yi Chou, Sheng-HsiuKuo, Tsung-Che Tsai, I-Chen Wu

Abstract:

This paper presents the benchmarking results and performance evaluation of differentclustersbuilt atthe National Center for High-Performance Computingin Taiwan. Performance of processor, memory subsystem andinterconnect is a critical factor in the overall performance of high performance computing platforms. The evaluation compares different system architecture and software platforms. Most supercomputer used HPL to benchmark their system performance, in accordance with the requirement of the TOP500 List. In this paper we consider system memory access factors that affect benchmark performance, such as processor and memory performance.We hope these works will provide useful information for future development and construct cluster system.

Keywords: Performance Evaluation, Benchmarking and High-Performance Computing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1342
91 A Case Study of Limited Dynamic Voltage Frequency Scaling in Low-Power Processors

Authors: Hwan Su Jung, Ahn Jun Gil, Jong Tae Kim

Abstract:

Power management techniques are necessary to save power in the microprocessor. By changing the frequency and/or operating voltage of processor, DVFS can control power consumption. In this paper, we perform a case study to find optimal power state transition for DVFS. We propose the equation to find the optimal ratio between executions of states while taking into account the deadline of processing time and the power state transition delay overhead. The experiment is performed on the Cortex-M4 processor, and average 6.5% power saving is observed when DVFS is applied under the deadline condition.

Keywords: Deadline, Dynamic Voltage Frequency Scaling, Power State Transition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 713
90 2D and 3D Finite Element Method Packages of CEMTool for Engineering PDE Problems

Authors: Choon Ki Ahn, Jung Hun Park, Wook Hyun Kwon

Abstract:

CEMTool is a command style design and analyzing package for scientific and technological algorithm and a matrix based computation language. In this paper, we present new 2D & 3D finite element method (FEM) packages for CEMTool. We discuss the detailed structures and the important features of pre-processor, solver, and post-processor of CEMTool 2D & 3D FEM packages. In contrast to the existing MATLAB PDE Toolbox, our proposed FEM packages can deal with the combination of the reserved words. Also, we can control the mesh in a very effective way. With the introduction of new mesh generation algorithm and fast solving technique, our FEM packages can guarantee the shorter computational time than MATLAB PDE Toolbox. Consequently, with our new FEM packages, we can overcome some disadvantages or limitations of the existing MATLAB PDE Toolbox.

Keywords: CEMTool, Finite element method (FEM), Numericalanalysis, Partial differential equation (PDE)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3607
89 A Message Passing Implementation of a New Parallel Arrangement Algorithm

Authors: Ezequiel Herruzo, Juan José Cruz, José Ignacio Benavides, Oscar Plata

Abstract:

This paper describes a new algorithm of arrangement in parallel, based on Odd-Even Mergesort, called division and concurrent mixes. The main idea of the algorithm is to achieve that each processor uses a sequential algorithm for ordering a part of the vector, and after that, for making the processors work in pairs in order to mix two of these sections ordered in a greater one, also ordered; after several iterations, the vector will be completely ordered. The paper describes the implementation of the new algorithm on a Message Passing environment (such as MPI). Besides, it compares the obtained experimental results with the quicksort sequential algorithm and with the parallel implementations (also on MPI) of the algorithms quicksort and bitonic sort. The comparison has been realized in an 8 processors cluster under GNU/Linux which is running on a unique PC processor.

Keywords: Parallel algorithm, arrangement, MPI, sorting, parallel program.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1441
88 Performance Evaluation of a Prioritized, Limited Multi-Server Processor-Sharing System That Includes Servers with Various Capacities

Authors: Yoshiaki Shikata, Nobutane Hanayama

Abstract:

We present a prioritized, limited multi-server processor sharing (PS) system where each server has various capacities, and N (≥2) priority classes are allowed in each PS server. In each prioritized, limited server, different service ratio is assigned to each class request, and the number of requests to be processed is limited to less than a certain number. Routing strategies of such prioritized, limited multi-server PS systems that take into account the capacity of each server are also presented, and a performance evaluation procedure for these strategies is discussed. Practical performance measures of these strategies, such as loss probability, mean waiting time, and mean sojourn time, are evaluated via simulation. In the PS server, at the arrival (or departure) of a request, the extension (shortening) of the remaining sojourn time of each request receiving service can be calculated by using the number of requests of each class and the priority ratio. Utilising a simulation program which executes these events and calculations, the performance of the proposed prioritized, limited multi-server PS rule can be analyzed. From the evaluation results, most suitable routing strategy for the loss or waiting system is clarified.

Keywords: Processor sharing, multi-server, various capacity, N priority classes, routing strategy, loss probability, mean sojourn time, mean waiting time, simulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 806
87 Performance Evaluation of a Limited Round-Robin System

Authors: Yoshiaki Shikata

Abstract:

Performance of a limited Round-Robin (RR) rule is studied in order to clarify the characteristics of a realistic sharing model of a processor. Under the limited RR rule, the processor allocates to each request a fixed amount of time, called a quantum, in a fixed order. The sum of the requests being allocated these quanta is kept below a fixed value. Arriving requests that cannot be allocated quanta because of such a restriction are queued or rejected. Practical performance measures, such as the relationship between the mean sojourn time, the mean number of requests, or the loss probability and the quantum size are evaluated via simulation. In the evaluation, the requested service time of an arriving request is converted into a quantum number. One of these quanta is included in an RR cycle, which means a series of quanta allocated to each request in a fixed order. The service time of the arriving request can be evaluated using the number of RR cycles required to complete the service, the number of requests receiving service, and the quantum size. Then an increase or decrease in the number of quanta that are necessary before service is completed is reevaluated at the arrival or departure of other requests. Tracking these events and calculations enables us to analyze the performance of our limited RR rule. In particular, we obtain the most suitable quantum size, which minimizes the mean sojourn time, for the case in which the switching time for each quantum is considered.

Keywords: Limited RR rule, quantum, processor sharing, sojourn time, performance measures, simulation, loss probability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1072
86 QSI Dynamical Fetch Policy for SMT

Authors: Shu-Chiao Yang, Jong-Jiann Shieh

Abstract:

A Simultaneous Multithreading (SMT) Processor is capable of executing instructions from multiple threads in the same cycle. SMT in fact was introduced as a powerful architecture to superscalar to increase the throughput of the processor. Simultaneous Multithreading is a technique that permits multiple instructions from multiple independent applications or threads to compete limited resources each cycle. While the fetch unit has been identified as one of the major bottlenecks of SMT architecture, several fetch schemes were proposed by prior works to enhance the fetching efficiency and overall performance. In this paper, we propose a novel fetch policy called queue situation identifier (QSI) which counts some kind of long latency instructions of each thread each cycle then properly selects which threads to fetch next cycle. Simulation results show that in best case our fetch policy can achieve 30% on speedup and also can reduce the data cache level 1 miss rate.

Keywords: SMT, QSI, DL1 miss rate.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1073
85 Application-Specific Instruction Sets Processor with Implicit Registers to Improve Register Bandwidth

Authors: Ginhsuan Li, Chiuyun Hung, Desheng Chen, Yiwen Wang

Abstract:

Application-Specific Instruction (ASI ) set Processors (ASIP) have become an important design choice for embedded systems due to runtime flexibility, which cannot be provided by custom ASIC solutions. One major bottleneck in maximizing ASIP performance is the limitation on the data bandwidth between the General Purpose Register File (GPRF) and ASIs. This paper presents the Implicit Registers (IRs) to provide the desirable data bandwidth. An ASI Input/Output model is proposed to formulate the overheads of the additional data transfer between the GPRF and IRs, therefore, an IRs allocation algorithm is used to achieve the better performance by minimizing the number of extra data transfer instructions. The experiment results show an up to 3.33x speedup compared to the results without using IRs.

Keywords: Application-Specific Instruction-set Processors, data bandwidth, configurable processor, implicit register.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1314
84 FPGA based Relative Distance Measurement using Stereo Vision Technology

Authors: Manasi Pathade, Prachi Kadam, Renuka Kulkarni, Tejas Teredesai

Abstract:

In this paper, we propose a novel concept of relative distance measurement using Stereo Vision Technology and discuss its implementation on a FPGA based real-time image processor. We capture two images using two CCD cameras and compare them. Disparity is calculated for each pixel using a real time dense disparity calculation algorithm. This algorithm is based on the concept of indexed histogram for matching. Disparity being inversely proportional to distance (Proved Later), we can thus get the relative distances of objects in front of the camera. The output is displayed on a TV screen in the form of a depth image (optionally using pseudo colors). This system works in real time on a full PAL frame rate (720 x 576 active pixels @ 25 fps).

Keywords: Stereo Vision, Relative Distance Measurement, Indexed Histogram, Real time FPGA Image Processor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2661