Search results for: Application-Specific Instruction-set Processors

82 Performance Analysis of Digital Signal Processors Using SMV Benchmark

Authors: Erh-Wen Hu, Cyril S. Ku, Andrew T. Russo, Bogong Su, Jian Wang

Abstract:

Unlike general-purpose processors, digital signal processors (DSP processors) are strongly application-dependent. To meet the needs for diverse applications, a wide variety of DSP processors based on different architectures ranging from the traditional to VLIW have been introduced to the market over the years. The functionality, performance, and cost of these processors vary over a wide range. In order to select a processor that meets the design criteria for an application, processor performance is usually the major concern for digital signal processing (DSP) application developers. Performance data are also essential for the designers of DSP processors to improve their design. Consequently, several DSP performance benchmarks have been proposed over the past decade or so. However, none of these benchmarks seem to have included recent new DSP applications. In this paper, we use a new benchmark that we recently developed to compare the performance of popular DSP processors from Texas Instruments and StarCore. The new benchmark is based on the Selectable Mode Vocoder (SMV), a speech-coding program from the recent third generation (3G) wireless voice applications. All benchmark kernels are compiled by the compilers of the respective DSP processors and run on their simulators. Weighted arithmetic mean of clock cycles and arithmetic mean of code size are used to compare the performance of five DSP processors. In addition, we studied how the performance of a processor is affected by code structure, features of processor architecture and optimization of compiler. The extensive experimental data gathered, analyzed, and presented in this paper should be helpful for DSP processor and compiler designers to meet their specific design goals.

Keywords: digital signal processors, DSP benchmark, instruction level parallelism, modified cyclomatic complexity, performance analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1551

81 Optimal Placement of Processors based on Effective Communication Load

Authors: A. R. Aswatha, T. Basavaraju, N. Bhaskara Rao

Abstract:

This paper presents a new technique for the optimum placement of processors to minimize the total effective communication load under multi-processor communication dominated environment. This is achieved by placing heavily loaded processors near each other and lightly loaded ones far away from one another in the physical grid locations. The results are mathematically proved for the Algorithms are described.

Keywords: Ascending Sort Index Vector, EffectiveCommunication Load, Effective Distance Matrix, OptimalPlacement, Sorting Order.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1297

80 Parallel Vector Processing Using Multi Level Orbital DATA

Authors: Nagi Mekhiel

Abstract:

Many applications use vector operations by applying single instruction to multiple data that map to different locations in conventional memory. Transferring data from memory is limited by access latency and bandwidth affecting the performance gain of vector processing. We present a memory system that makes all of its content available to processors in time so that processors need not to access the memory, we force each location to be available to all processors at a specific time. The data move in different orbits to become available to other processors in higher orbits at different time. We use this memory to apply parallel vector operations to data streams at first orbit level. Data processed in the first level move to upper orbit one data element at a time, allowing a processor in that orbit to apply another vector operation to deal with serial code limitations inherited in all parallel applications and interleaved it with lower level vector operations.

Keywords: Memory organization, parallel processors, serial code, vector processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1014

79 A Methodology for the Synthesis of Multi-Processors

Authors: Hamid Yasinian

Abstract:

Random epistemologies and hash tables have garnered minimal interest from both security experts and experts in the last several years. In fact, few information theorists would disagree with the evaluation of expert systems. In our research, we discover how flip-flop gates can be applied to the study of superpages. Though such a hypothesis at first glance seems perverse, it is derived from known results.

Keywords: Synthesis, Multi-Processors, Interactive Model, Moor’s Law.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2259

78 A New High Speed Neural Model for Fast Character Recognition Using Cross Correlation and Matrix Decomposition

Authors: Hazem M. El-Bakry

Abstract:

Neural processors have shown good results for detecting a certain character in a given input matrix. In this paper, a new idead to speed up the operation of neural processors for character detection is presented. Such processors are designed based on cross correlation in the frequency domain between the input matrix and the weights of neural networks. This approach is developed to reduce the computation steps required by these faster neural networks for the searching process. The principle of divide and conquer strategy is applied through image decomposition. Each image is divided into small in size sub-images and then each one is tested separately by using a single faster neural processor. Furthermore, faster character detection is obtained by using parallel processing techniques to test the resulting sub-images at the same time using the same number of faster neural networks. In contrast to using only faster neural processors, the speed up ratio is increased with the size of the input image when using faster neural processors and image decomposition. Moreover, the problem of local subimage normalization in the frequency domain is solved. The effect of image normalization on the speed up ratio of character detection is discussed. Simulation results show that local subimage normalization through weight normalization is faster than subimage normalization in the spatial domain. The overall speed up ratio of the detection process is increased as the normalization of weights is done off line.

Keywords: Fast Character Detection, Neural Processors, Cross Correlation, Image Normalization, Parallel Processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1488

77 Lattice Boltzmann Simulation of Binary Mixture Diffusion Using Modern Graphics Processors

Authors: Mohammad Amin Safi, Mahmud Ashrafizaadeh, Amir Ali Ashrafizaadeh

Abstract:

A highly optimized implementation of binary mixture diffusion with no initial bulk velocity on graphics processors is presented. The lattice Boltzmann model is employed for simulating the binary diffusion of oxygen and nitrogen into each other with different initial concentration distributions. Simulations have been performed using the latest proposed lattice Boltzmann model that satisfies both the indifferentiability principle and the H-theorem for multi-component gas mixtures. Contemporary numerical optimization techniques such as memory alignment and increasing the multiprocessor occupancy are exploited along with some novel optimization strategies to enhance the computational performance on graphics processors using the C for CUDA programming language. Speedup of more than two orders of magnitude over single-core processors is achieved on a variety of Graphical Processing Unit (GPU) devices ranging from conventional graphics cards to advanced, high-end GPUs, while the numerical results are in excellent agreement with the available analytical and numerical data in the literature.

Keywords: Lattice Boltzmann model, Graphical processing unit, Binary mixture diffusion, 2D flow simulations, Optimized algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1499

76 Digital Predistorter with Pipelined Architecture Using CORDIC Processors

Authors: Kyunghoon Kim, Sungjoon Shim, Jun Tae Kim, Jong Tae Kim

Abstract:

In a wireless communication system, a predistorter(PD) is often employed to alleviate nonlinear distortions due to operating a power amplifier near saturation, thereby improving the system performance and reducing the interference to adjacent channels. This paper presents a new adaptive polynomial digital predistorter(DPD). The proposed DPD uses Coordinate Rotation Digital Computing(CORDIC) processors and PD process by pipelined architecture. It is simpler and faster than conventional adaptive polynomial DPD. The performance of the proposed DPD is proved by MATLAB simulation.

Keywords: DPD, CORDIC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1738

75 Design of Local Interconnect Network Controller for Automotive Applications

Authors: Jong-Bae Lee, Seongsoo Lee

Abstract:

Local interconnect network (LIN) is a communication protocol that combines sensors, actuators, and processors to a functional module in automotive applications. In this paper, a LIN ver. 2.2A controller was designed in Verilog hardware description language (Verilog HDL) and implemented in field-programmable gate array (FPGA). Its operation was verified by making full-scale LIN network with the presented FPGA-implemented LIN controller, commercial LIN transceivers, and commercial processors. When described in Verilog HDL and synthesized in 0.18 μm technology, its gate size was about 2,300 gates.

Keywords: Local interconnect network, controller, transceiver, processor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1518

74 Rural Women’s Skill Acquisition in the Processing of Locust Bean in Ipokia Local Government Area of Ogun State, Nigeria

Authors: A. A. Adekunle, A. M. Omoare, W. O. Oyediran

Abstract:

This study was carried out to assess rural women’s skill acquisition in the processing of locust bean in Ipokia Local Government Area of Ogun State, Nigeria. Simple random sampling technique was used to select 90 women locust bean processors for this study. Data were analyzed with descriptive statistics and Pearson Product Moment Correlation. The result showed that the mean age of respondents was 40.72 years. Most (70.00%) of the respondents were married. The mean processing experience was 8.63 years. 93.30% of the respondents relied on information from fellow locust beans processors and friends. All (100%) the respondents did not acquire improved processing skill through trainings and workshops. It can be concluded that the rural women’s skill acquisition on modernized processing techniques was generally low. It is hereby recommend that the rural women processors should be trained by extension service providers through series of workshops and seminars on improved processing techniques.

Keywords: Locust bean, processing, skill acquisition, rural women.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2770

73 64 bit Computer Architectures for Space Applications – A study

Authors: Niveditha Domse, Kris Kumar, K. N. Balasubramanya Murthy

Abstract:

The more recent satellite projects/programs makes extensive usage of real – time embedded systems. 16 bit processors which meet the Mil-Std-1750 standard architecture have been used in on-board systems. Most of the Space Applications have been written in ADA. From a futuristic point of view, 32 bit/ 64 bit processors are needed in the area of spacecraft computing and therefore an effort is desirable in the study and survey of 64 bit architectures for space applications. This will also result in significant technology development in terms of VLSI and software tools for ADA (as the legacy code is in ADA). There are several basic requirements for a special processor for this purpose. They include Radiation Hardened (RadHard) devices, very low power dissipation, compatibility with existing operational systems, scalable architectures for higher computational needs, reliability, higher memory and I/O bandwidth, predictability, realtime operating system and manufacturability of such processors. Further on, these may include selection of FPGA devices, selection of EDA tool chains, design flow, partitioning of the design, pin count, performance evaluation, timing analysis etc. This project deals with a brief study of 32 and 64 bit processors readily available in the market and designing/ fabricating a 64 bit RISC processor named RISC MicroProcessor with added functionalities of an extended double precision floating point unit and a 32 bit signal processing unit acting as co-processors. In this paper, we emphasize the ease and importance of using Open Core (OpenSparc T1 Verilog RTL) and Open “Source" EDA tools such as Icarus to develop FPGA based prototypes quickly. Commercial tools such as Xilinx ISE for Synthesis are also used when appropriate.

Keywords: RISC MicroProcessor, RPC – RISC Processor Core, PBX – Processor to Block Interface part of the Interconnection Network, BPX – Block to Processor Interface part of the Interconnection Network, FPU – Floating Point Unit, SPU – Signal Processing Unit, WB – Wishbone Interface, CTU – Clock and Test Unit

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2199

72 An Approach to Task Modeling for User Interface Design

Authors: Costin Pribeanu

Abstract:

The model-based approach to user interface design relies on developing separate models capturing various aspects about users, tasks, application domain, presentation and dialog structures. This paper presents a task modeling approach for user interface design and aims at exploring mappings between task, domain and presentation models. The basic idea of our approach is to identify typical configurations in task and domain models and to investigate how they relate each other. A special emphasis is put on applicationspecific functions and mappings between domain objects and operational task structures. In this respect, we will address two layers in task decomposition: a functional (planning) layer and an operational layer.

Keywords: task modeling, user interface design, unit tasks, basic tasks, operational task model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1832

71 A Message Passing Implementation of a New Parallel Arrangement Algorithm

Authors: Ezequiel Herruzo, Juan José Cruz, José Ignacio Benavides, Oscar Plata

Abstract:

This paper describes a new algorithm of arrangement in parallel, based on Odd-Even Mergesort, called division and concurrent mixes. The main idea of the algorithm is to achieve that each processor uses a sequential algorithm for ordering a part of the vector, and after that, for making the processors work in pairs in order to mix two of these sections ordered in a greater one, also ordered; after several iterations, the vector will be completely ordered. The paper describes the implementation of the new algorithm on a Message Passing environment (such as MPI). Besides, it compares the obtained experimental results with the quicksort sequential algorithm and with the parallel implementations (also on MPI) of the algorithms quicksort and bitonic sort. The comparison has been realized in an 8 processors cluster under GNU/Linux which is running on a unique PC processor.

Keywords: Parallel algorithm, arrangement, MPI, sorting, parallel program.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1624

70 Application-Specific Instruction Sets Processor with Implicit Registers to Improve Register Bandwidth

Authors: Ginhsuan Li, Chiuyun Hung, Desheng Chen, Yiwen Wang

Abstract:

Application-Specific Instruction (ASI ) set Processors (ASIP) have become an important design choice for embedded systems due to runtime flexibility, which cannot be provided by custom ASIC solutions. One major bottleneck in maximizing ASIP performance is the limitation on the data bandwidth between the General Purpose Register File (GPRF) and ASIs. This paper presents the Implicit Registers (IRs) to provide the desirable data bandwidth. An ASI Input/Output model is proposed to formulate the overheads of the additional data transfer between the GPRF and IRs, therefore, an IRs allocation algorithm is used to achieve the better performance by minimizing the number of extra data transfer instructions. The experiment results show an up to 3.33x speedup compared to the results without using IRs.

Keywords: Application-Specific Instruction-set Processors, data bandwidth, configurable processor, implicit register.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1500

69 Optimization of Distributed Processors for Power System: Kalman Filters using Petri Net

Authors: Anant Oonsivilai, Kenedy A. Greyson

Abstract:

The growth and interconnection of power networks in many regions has invited complicated techniques for energy management services (EMS). State estimation techniques become a powerful tool in power system control centers, and that more information is required to achieve the objective of EMS. For the online state estimator, assuming the continuous time is equidistantly sampled with period Δt, processing events must be finished within this period. Advantage of Kalman Filtering (KF) algorithm in using system information to improve the estimation precision is utilized. Computational power is a major issue responsible for the achievement of the objective, i.e. estimators- solution at a small sampled period. This paper presents the optimum utilization of processors in a state estimator based on KF. The model used is presented using Petri net (PN) theory.

Keywords: Kalman filters, model, Petri Net, power system, sequential State estimator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1308

68 Parallel Direct Integration Variable Step Block Method for Solving Large System of Higher Order Ordinary Differential Equations

Authors: Zanariah Abdul Majid, Mohamed Suleiman

Abstract:

The aim of this paper is to investigate the performance of the developed two point block method designed for two processors for solving directly non stiff large systems of higher order ordinary differential equations (ODEs). The method calculates the numerical solution at two points simultaneously and produces two new equally spaced solution values within a block and it is possible to assign the computational tasks at each time step to a single processor. The algorithm of the method was developed in C language and the parallel computation was done on a parallel shared memory environment. Numerical results are given to compare the efficiency of the developed method to the sequential timing. For large problems, the parallel implementation produced 1.95 speed-up and 98% efficiency for the two processors.

Keywords: Numerical methods, parallel method, block method, higher order ODEs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1332

67 A Fuzzy Dynamic Load Balancing Algorithm for Homogenous Distributed Systems

Authors: Ali M. Alakeel

Abstract:

Load balancing in distributed computer systems is the process of redistributing the work load among processors in the system to improve system performance. Most of previous research in using fuzzy logic for the purpose of load balancing has only concentrated in utilizing fuzzy logic concepts in describing processors load and tasks execution length. The responsibility of the fuzzy-based load balancing process itself, however, has not been discussed and in most reported work is assumed to be performed in a distributed fashion by all nodes in the network. This paper proposes a new fuzzy dynamic load balancing algorithm for homogenous distributed systems. The proposed algorithm utilizes fuzzy logic in dealing with inaccurate load information, making load distribution decisions, and maintaining overall system stability. In terms of control, we propose a new approach that specifies how, when, and by which node the load balancing is implemented. Our approach is called Centralized-But-Distributed (CBD).

Keywords: Dynamic load balancing, fuzzy logic, distributed systems, algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2389

66 Performance Evaluation of Task Scheduling Algorithm on LCQ Network

Authors: Zaki Ahmad Khan, Jamshed Siddiqui, Abdus Samad

Abstract:

The Scheduling and mapping of tasks on a set of processors is considered as a critical problem in parallel and distributed computing system. This paper deals with the problem of dynamic scheduling on a special type of multiprocessor architecture known as Linear Crossed Cube (LCQ) network. This proposed multiprocessor is a hybrid network which combines the features of both linear types of architectures as well as cube based architectures. Two standard dynamic scheduling schemes namely Minimum Distance Scheduling (MDS) and Two Round Scheduling (TRS) schemes are implemented on the LCQ network. Parallel tasks are mapped and the imbalance of load is evaluated on different set of processors in LCQ network. The simulations results are evaluated and effort is made by means of through analysis of the results to obtain the best solution for the given network in term of load imbalance left and execution time. The other performance matrices like speedup and efficiency are also evaluated with the given dynamic algorithms.

Keywords: Dynamic algorithm, Load imbalance, Mapping, Task scheduling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1968

65 Run-Time Customisation of Soft-Core CPUs on Field Programmable Gate Array

Authors: Rehab Abdullah Shendi

Abstract:

The use of customised soft-core processors in which instructions can be integrated into a system in application hardware is increasing in the Field Programmable Gate Array (FPGA) field. Specifically, the partial run-time reconfiguration of FPGAs in specialised processors for a particular domain can be very beneficial. In this report, the design and implementation for the customisation of a soft-core MIPS processor using an FPGA and partial reconfiguration (PR) of FPGA technology will be addressed to achieve efficient resource use. This can be achieved using a PR design flow that helps the design fit into a smaller device. Moreover, the impact of static power consumption could be reduced due to runtime reconfiguration. This will be done by configurable custom instructions implemented in the hardware as an extension on the MIPS CPU. The aim of this project is to investigate the PR of FPGAs for run-time adaptations of the instruction set of a soft-core CPU, including the integration of custom instructions and the exploration of the potential to use the MultiBoot feature available in Xilinx FPGAs to carry out the PR process. The system will be evaluated and tested on a Nexus 3 development board featuring a Xilinx Spartran-6 FPGA. The system will be able to load reconfigurable custom instructions dynamically into user programs with the help of the trap handler when the custom instruction is called by the MIPS CPU. The results of this experiment demonstrate that custom instructions in hardware can speed up a certain function and many instructions can be saved when compared to a software implementation of the same function. Implementing custom instructions in hardware is perfectly possible and worth exploring.

Keywords: Customisation, FPGA, MIPS, partial reconfiguration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1141

64 Numerical Analysis of All-Optical Microwave Mixing and Bandpass Filtering in an RoF Link

Authors: S. Khosroabadi, M. R. Salehi

Abstract:

In this paper, all-optical signal processors that perform both microwave mixing and bandpass filtering in a radio-over-fiber (RoF) link are presented. The key device is a Mach-Zehnder modulator (MZM) which performs all-optical microwave mixing. An up-converted microwave signal is obtained and other unwanted frequency components are suppressed at the end of the fiber span.

Keywords: Microwave mixing, bandpass filtering, all-optical, signal processing, MZM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1674

63 An Improved Design of Area Efficient Two Bit Comparator

Authors: Shashank Gautam, Pramod Sharma

Abstract:

In present era, development of digital circuits, signal processors and other integrated circuits, magnitude comparators are challenged by large area and more power consumption. Comparator is most basic circuit that performs comparison. This paper presents a technique to design a two bit comparator which consumes less area and power. DSCH and MICROWIND version 3 are used to design the schematic and design the layout of the schematic, observe the performance parameters at different nanometer technologies respectively.

Keywords: Chip design, consumed power, layout area, two bit comparator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1162

62 Enhanced Disk-Based Databases Towards Improved Hybrid In-Memory Systems

Authors: Samuel Kaspi, Sitalakshmi Venkatraman

Abstract:

In-memory database systems are becoming popular due to the availability and affordability of sufficiently large RAM and processors in modern high-end servers with the capacity to manage large in-memory database transactions. While fast and reliable inmemory systems are still being developed to overcome cache misses, CPU/IO bottlenecks and distributed transaction costs, disk-based data stores still serve as the primary persistence. In addition, with the recent growth in multi-tenancy cloud applications and associated security concerns, many organisations consider the trade-offs and continue to require fast and reliable transaction processing of diskbased database systems as an available choice. For these organizations, the only way of increasing throughput is by improving the performance of disk-based concurrency control. This warrants a hybrid database system with the ability to selectively apply an enhanced disk-based data management within the context of inmemory systems that would help improve overall throughput. The general view is that in-memory systems substantially outperform disk-based systems. We question this assumption and examine how a modified variation of access invariance that we call enhanced memory access, (EMA) can be used to allow very high levels of concurrency in the pre-fetching of data in disk-based systems. We demonstrate how this prefetching in disk-based systems can yield close to in-memory performance, which paves the way for improved hybrid database systems. This paper proposes a novel EMA technique and presents a comparative study between disk-based EMA systems and in-memory systems running on hardware configurations of equivalent power in terms of the number of processors and their speeds. The results of the experiments conducted clearly substantiate that when used in conjunction with all concurrency control mechanisms, EMA can increase the throughput of disk-based systems to levels quite close to those achieved by in-memory system. The promising results of this work show that enhanced disk-based systems facilitate in improving hybrid data management within the broader context of in-memory systems.

Keywords: Concurrency control, disk-based databases, inmemory systems, enhanced memory access (EMA).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1985

61 Architecture Based on Dynamic Graphs for the Dynamic Reconfiguration of Farms of Computers

Authors: Carmen Navarrete, Eloy Anguiano

Abstract:

In the last years, the computers have increased their capacity of calculus and networks, for the interconnection of these machines. The networks have been improved until obtaining the actual high rates of data transferring. The programs that nowadays try to take advantage of these new technologies cannot be written using the traditional techniques of programming, since most of the algorithms were designed for being executed in an only processor,in a nonconcurrent form instead of being executed concurrently ina set of processors working and communicating through a network.This paper aims to present the ongoing development of a new system for the reconfiguration of grouping of computers, taking into account these new technologies.

Keywords: Dynamic network topology, resource and task allocation, parallel computing, heterogeneous computing, dynamic reconfiguration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1315

60 Design of Auto Exposure Unit Based On 2-Way Histogram Equalization

Authors: Junghwan Choi, Seongsoo Lee

Abstract:

Histogram equalization is often used in image enhancement, but it can be also used in auto exposure. However, conventional histogram equalization does not work well when many pixels are concentrated in a narrow luminance range.This paper proposes an auto exposure method based on 2-way histogram equalization. Two cumulative distribution functions are used, where one is from dark to bright and the other is from bright to dark. In this paper, the proposed auto exposure method is also designed and implemented for image signal processors with full-HD images.

Keywords: Histogram equalization, Auto exposure, Image signal processor, Low-cost, Full HD Video.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3340

59 An Innovational Intermittent Algorithm in Networks-On-Chip (NOC)

Authors: Ahmad M. Shafiee, Mehrdad Montazeri, Mahdi Nikdast

Abstract:

Every day human life experiences new equipments more automatic and with more abilities. So the need for faster processors doesn-t seem to finish. Despite new architectures and higher frequencies, a single processor is not adequate for many applications. Parallel processing and networks are previous solutions for this problem. The new solution to put a network of resources on a chip is called NOC (network on a chip). The more usual topology for NOC is mesh topology. There are several routing algorithms suitable for this topology such as XY, fully adaptive, etc. In this paper we have suggested a new algorithm named Intermittent X, Y (IX/Y). We have developed the new algorithm in simulation environment to compare delay and power consumption with elders' algorithms.

Keywords: Computer architecture, parallel computing, NOC, routing algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1631

58 Ec-A: A Task Allocation Algorithm for Energy Minimization in Multiprocessor Systems

Authors: Anju S. Pillai, T.B. Isha

Abstract:

With the necessity of increased processing capacity with less energy consumption; power aware multiprocessor system has gained more attention in the recent future. One of the additional challenges that is to be solved in a multi-processor system when compared to uni-processor system is job allocation. This paper presents a novel task dependent job allocation algorithm: Energy centric- Allocation (Ec-A) and Rate Monotonic (RM) scheduling to minimize energy consumption in a multiprocessor system. A simulation analysis is carried out to verify the performance increase with reduction in energy consumption and required number of processors in the system.

Keywords: Energy consumption, Job allocation, Multiprocessor systems, Task dependent.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2125

57 Designing and Implementing a Novel Scheduler for Multiprocessor System using Genetic Algorithm

Authors: Iman Zangeneh, Mostafa Moradi, Mazyar Baranpouyan

Abstract:

System is using multiple processors for computing and information processing, is increasing rapidly speed operation of these systems compared with single processor systems, very significant impact on system performance is increased .important differences to yield a single multi-processor cpu, the scheduling policies, to reduce the implementation time of all processes. Notwithstanding the famous algorithms such as SPT, LPT, LSPT and RLPT for scheduling and there, but none led to the answer are not optimal.In this paper scheduling using genetic algorithms and innovative way to finish the whole process faster that we do and the result compared with three algorithms we mentioned.

Keywords: Multiprocessor system, genetic algorithms, time implementation process.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1518

56 Performance Enhancement of Motion Estimation Using SSE2 Technology

Authors: Trung Hieu Tran, Hyo-Moon Cho, Sang-Bock Cho

Abstract:

Motion estimation is the most computationally intensive part in video processing. Many fast motion estimation algorithms have been proposed to decrease the computational complexity by reducing the number of candidate motion vectors. However, these studies are for fast search algorithms themselves while almost image and video compressions are operated with software based. Therefore, the timing constraints for running these motion estimation algorithms not only challenge for the video codec but also overwhelm for some of processors. In this paper, the performance of motion estimation is enhanced by using Intel's Streaming SIMD Extension 2 (SSE2) technology with Intel Pentium 4 processor.

Keywords: Motion Estimation, Full Search, Three StepSearch, MMX/SSE/SSE2 Technologies, SIMD.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2041

55 GPU-Based Volume Rendering for Medical Imagery

Authors: Hadjira Bentoumi, Pascal Gautron, Kadi Bouatouch

Abstract:

We present a method for fast volume rendering using graphics hardware (GPU). To our knowledge, it is the first implementation on the GPU. Based on the Shear-Warp algorithm, our GPU-based method provides real-time frame rates and outperforms the CPU-based implementation. When the number of slices is not sufficient, we add in-between slices computed by interpolation. This improves then the quality of the rendered images. We have also implemented the ray marching algorithm on the GPU. The results generated by the three algorithms (CPU-based and GPU-based Shear- Warp, GPU-based Ray Marching) for two test models has proved that the ray marching algorithm outperforms the shear-warp methods in terms of speed up and image quality.

Keywords: Volume rendering, graphics processors

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1802

54 A Case Study of Limited Dynamic Voltage Frequency Scaling in Low-Power Processors

Authors: Hwan Su Jung, Ahn Jun Gil, Jong Tae Kim

Abstract:

Power management techniques are necessary to save power in the microprocessor. By changing the frequency and/or operating voltage of processor, DVFS can control power consumption. In this paper, we perform a case study to find optimal power state transition for DVFS. We propose the equation to find the optimal ratio between executions of states while taking into account the deadline of processing time and the power state transition delay overhead. The experiment is performed on the Cortex-M4 processor, and average 6.5% power saving is observed when DVFS is applied under the deadline condition.

Keywords: Deadline, Dynamic Voltage Frequency Scaling, Power State Transition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 910

53 Performance Analysis of Load Balancing Algorithms

Authors: Sandeep Sharma, Sarabjit Singh, Meenakshi Sharma

Abstract:

Load balancing is the process of improving the performance of a parallel and distributed system through a redistribution of load among the processors [1] [5]. In this paper we present the performance analysis of various load balancing algorithms based on different parameters, considering two typical load balancing approaches static and dynamic. The analysis indicates that static and dynamic both types of algorithm can have advancements as well as weaknesses over each other. Deciding type of algorithm to be implemented will be based on type of parallel applications to solve. The main purpose of this paper is to help in design of new algorithms in future by studying the behavior of various existing algorithms.

Keywords: Load balancing (LB), workload, distributed systems, Static Load balancing, Dynamic Load Balancing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5860