Search results for: multicore processor.

92 Performance Evaluation of a Prioritized, Limited Multi-Server Processor-Sharing System That Includes Servers with Various Capacities

Authors: Yoshiaki Shikata, Nobutane Hanayama

Abstract:

We present a prioritized, limited multi-server processor sharing (PS) system where each server has various capacities, and N (≥2) priority classes are allowed in each PS server. In each prioritized, limited server, different service ratio is assigned to each class request, and the number of requests to be processed is limited to less than a certain number. Routing strategies of such prioritized, limited multi-server PS systems that take into account the capacity of each server are also presented, and a performance evaluation procedure for these strategies is discussed. Practical performance measures of these strategies, such as loss probability, mean waiting time, and mean sojourn time, are evaluated via simulation. In the PS server, at the arrival (or departure) of a request, the extension (shortening) of the remaining sojourn time of each request receiving service can be calculated by using the number of requests of each class and the priority ratio. Utilising a simulation program which executes these events and calculations, the performance of the proposed prioritized, limited multi-server PS rule can be analyzed. From the evaluation results, most suitable routing strategy for the loss or waiting system is clarified.

Keywords: Processor sharing, multi-server, various capacity, N priority classes, routing strategy, loss probability, mean sojourn time, mean waiting time, simulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 999

91 Performance Evaluation of a Limited Round-Robin System

Authors: Yoshiaki Shikata

Abstract:

Performance of a limited Round-Robin (RR) rule is studied in order to clarify the characteristics of a realistic sharing model of a processor. Under the limited RR rule, the processor allocates to each request a fixed amount of time, called a quantum, in a fixed order. The sum of the requests being allocated these quanta is kept below a fixed value. Arriving requests that cannot be allocated quanta because of such a restriction are queued or rejected. Practical performance measures, such as the relationship between the mean sojourn time, the mean number of requests, or the loss probability and the quantum size are evaluated via simulation. In the evaluation, the requested service time of an arriving request is converted into a quantum number. One of these quanta is included in an RR cycle, which means a series of quanta allocated to each request in a fixed order. The service time of the arriving request can be evaluated using the number of RR cycles required to complete the service, the number of requests receiving service, and the quantum size. Then an increase or decrease in the number of quanta that are necessary before service is completed is reevaluated at the arrival or departure of other requests. Tracking these events and calculations enables us to analyze the performance of our limited RR rule. In particular, we obtain the most suitable quantum size, which minimizes the mean sojourn time, for the case in which the switching time for each quantum is considered.

Keywords: Limited RR rule, quantum, processor sharing, sojourn time, performance measures, simulation, loss probability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1212

90 QSI Dynamical Fetch Policy for SMT

Authors: Shu-Chiao Yang, Jong-Jiann Shieh

Abstract:

A Simultaneous Multithreading (SMT) Processor is capable of executing instructions from multiple threads in the same cycle. SMT in fact was introduced as a powerful architecture to superscalar to increase the throughput of the processor. Simultaneous Multithreading is a technique that permits multiple instructions from multiple independent applications or threads to compete limited resources each cycle. While the fetch unit has been identified as one of the major bottlenecks of SMT architecture, several fetch schemes were proposed by prior works to enhance the fetching efficiency and overall performance. In this paper, we propose a novel fetch policy called queue situation identifier (QSI) which counts some kind of long latency instructions of each thread each cycle then properly selects which threads to fetch next cycle. Simulation results show that in best case our fetch policy can achieve 30% on speedup and also can reduce the data cache level 1 miss rate.

Keywords: SMT, QSI, DL1 miss rate.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1229

89 Application-Specific Instruction Sets Processor with Implicit Registers to Improve Register Bandwidth

Authors: Ginhsuan Li, Chiuyun Hung, Desheng Chen, Yiwen Wang

Abstract:

Application-Specific Instruction (ASI ) set Processors (ASIP) have become an important design choice for embedded systems due to runtime flexibility, which cannot be provided by custom ASIC solutions. One major bottleneck in maximizing ASIP performance is the limitation on the data bandwidth between the General Purpose Register File (GPRF) and ASIs. This paper presents the Implicit Registers (IRs) to provide the desirable data bandwidth. An ASI Input/Output model is proposed to formulate the overheads of the additional data transfer between the GPRF and IRs, therefore, an IRs allocation algorithm is used to achieve the better performance by minimizing the number of extra data transfer instructions. The experiment results show an up to 3.33x speedup compared to the results without using IRs.

Keywords: Application-Specific Instruction-set Processors, data bandwidth, configurable processor, implicit register.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1503

88 FPGA based Relative Distance Measurement using Stereo Vision Technology

Authors: Manasi Pathade, Prachi Kadam, Renuka Kulkarni, Tejas Teredesai

Abstract:

In this paper, we propose a novel concept of relative distance measurement using Stereo Vision Technology and discuss its implementation on a FPGA based real-time image processor. We capture two images using two CCD cameras and compare them. Disparity is calculated for each pixel using a real time dense disparity calculation algorithm. This algorithm is based on the concept of indexed histogram for matching. Disparity being inversely proportional to distance (Proved Later), we can thus get the relative distances of objects in front of the camera. The output is displayed on a TV screen in the form of a depth image (optionally using pseudo colors). This system works in real time on a full PAL frame rate (720 x 576 active pixels @ 25 fps).

Keywords: Stereo Vision, Relative Distance Measurement, Indexed Histogram, Real time FPGA Image Processor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2959

87 Semi-Lagrangian Method for Advection Equation on GPU in Unstructured R3 Mesh for Fluid Dynamics Application

Authors: Irakli V. Gugushvili, Nickolay M. Evstigneev

Abstract:

Numerical integration of initial boundary problem for advection equation in 3 ℜ is considered. The method used is conditionally stable semi-Lagrangian advection scheme with high order interpolation on unstructured mesh. In order to increase time step integration the BFECC method with limiter TVD correction is used. The method is adopted on parallel graphic processor unit environment using NVIDIA CUDA and applied in Navier-Stokes solver. It is shown that the calculation on NVIDIA GeForce 8800 GPU is 184 times faster than on one processor AMDX2 4800+ CPU. The method is extended to the incompressible fluid dynamics solver. Flow over a Cylinder for 3D case is compared to the experimental data.

Keywords: Advection equations, CUDA technology, Flow overthe 3D Cylinder, Incompressible Pressure Projection Solver, Parallel computation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2787

86 An Efficient Implementation of High Speed Vedic Multiplier Using Compressors for Image Processing Applications

Authors: Shobha Sharma, Amita Dev, Akanksha Kant

Abstract:

Digital signal processor, image signal processor and FIR filters have multipliers as an important part of their design. On the basis of Vedic mathematics, Vedic multipliers have come out to be very fast multipliers. One of the image processing applications is edge detection. This research presents a small area and high speed 8 bit Vedic multiplier system comprising of compressor based adders. This results in faster edge detection. This architecture is tested on Xilinx vertex 4 FPGA board and simulations were carried out using the Xilinx synthesis tool. Comparisons are made and this system is found to be smaller in area with high speed (the lesser propagation delay). This compressor based Vedic multiplier is 1.1 times speedier than a typical Vedic multiplier. Also, this Vedic Multiplier is 2 times speedier than a ‘simple’ multiplier.

Keywords: Detection of edges, Vedic multiplier, image processing, Urdhva Tiryakbhyam sutra.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1767

85 A Simplified Adaptive Decision Feedback Equalization Technique for π/4-DQPSK Signals

Authors: V. Prapulla, A. Mitra, R. Bhattacharjee, S. Nandi

Abstract:

We present a simplified equalization technique for a π/4 differential quadrature phase shift keying ( π/4 -DQPSK) modulated signal in a multipath fading environment. The proposed equalizer is realized as a fractionally spaced adaptive decision feedback equalizer (FS-ADFE), employing exponential step-size least mean square (LMS) algorithm as the adaptation technique. The main advantage of the scheme stems from the usage of exponential step-size LMS algorithm in the equalizer, which achieves similar convergence behavior as that of a recursive least squares (RLS) algorithm with significantly reduced computational complexity. To investigate the finite-precision performance of the proposed equalizer along with the π/4 -DQPSK modem, the entire system is evaluated on a 16-bit fixed point digital signal processor (DSP) environment. The proposed scheme is found to be attractive even for those cases where equalization is to be performed within a restricted number of training samples.

Keywords: Adaptive decision feedback equalizer, Fractionally spaced equalizer, π/4 DQPSK signal, Digital signal processor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5680

84 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that as the data increase the amount of time and resources required to mining the data increases at an exponential rate. In this investigation a new algorithm is proposed which can be uses as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and roughsets to carry out record reduction and feature (attribute) selection respectively. FASTER for frequent itemset mining can produce a speed up of 3.1 times when compared to original algorithm while maintaining an accuracy of 71%.

Keywords: Rough-sets, Classification, Feature Selection, Entropy, Outliers, Frequent itemset mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2394

83 Detecting the Edge of Multiple Images in Parallel

Authors: Prakash K. Aithal, U. Dinesh Acharya, Rajesh Gopakumar

Abstract:

Edge is variation of brightness in an image. Edge detection is useful in many application areas such as finding forests, rivers from a satellite image, detecting broken bone in a medical image etc. The paper discusses about finding edge of multiple aerial images in parallel. The proposed work tested on 38 images 37 colored and one monochrome image. The time taken to process N images in parallel is equivalent to time taken to process 1 image in sequential. Message Passing Interface (MPI) and Open Computing Language (OpenCL) is used to achieve task and pixel level parallelism respectively.

Keywords: Edge detection, multicore, GPU, openCL, MPI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2302

82 Development of Soft-Core System for Heart Rate and Oxygen Saturation

Authors: Caje F. Pinto, Jivan S. Parab, Gourish M. Naik

Abstract:

This paper is about the development of non-invasive heart rate and oxygen saturation in human blood using Altera NIOS II soft-core processor system. In today's world, monitoring oxygen saturation and heart rate is very important in hospitals to keep track of low oxygen levels in blood. We have designed an Embedded System On Peripheral Chip (SOPC) reconfigurable system by interfacing two LED’s of different wavelengths (660 nm/940 nm) with a single photo-detector to measure the absorptions of hemoglobin species at different wavelengths. The implementation of the interface with Finger Probe and Liquid Crystal Display (LCD) was carried out using NIOS II soft-core system running on Altera NANO DE0 board having target as Cyclone IVE. This designed system is used to monitor oxygen saturation in blood and heart rate for different test subjects. The designed NIOS II processor based non-invasive heart rate and oxygen saturation was verified with another Operon Pulse oximeter for 50 measurements on 10 different subjects. It was found that the readings taken were very close to the Operon Pulse oximeter.

Keywords: Heart rate, NIOS II, Oxygen Saturation, photoplethysmography, soft-core, SOPC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1319

81 An Images Monitoring System based on Multi-Format Streaming Grid Architecture

Authors: Yi-Haur Shiau, Sun-In Lin, Shi-Wei Lo, Hsiu-Mei Chou, Yi-Hsuan Chen

Abstract:

This paper proposes a novel multi-format stream grid architecture for real-time image monitoring system. The system, based on a three-tier architecture, includes stream receiving unit, stream processor unit, and presentation unit. It is a distributed computing and a loose coupling architecture. The benefit is the amount of required servers can be adjusted depending on the loading of the image monitoring system. The stream receive unit supports multi capture source devices and multi-format stream compress encoder. Stream processor unit includes three modules; they are stream clipping module, image processing module and image management module. Presentation unit can display image data on several different platforms. We verified the proposed grid architecture with an actual test of image monitoring. We used a fast image matching method with the adjustable parameters for different monitoring situations. Background subtraction method is also implemented in the system. Experimental results showed that the proposed architecture is robust, adaptive, and powerful in the image monitoring system.

Keywords: Motion detection, grid architecture, image monitoring system, and background subtraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1548

80 An Implementation of MacMahon's Partition Analysis in Ordering the Lower Bound of Processing Elements for the Algorithm of LU Decomposition

Authors: Halil Snopce, Ilir Spahiu, Lavdrim Elmazi

Abstract:

A lot of Scientific and Engineering problems require the solution of large systems of linear equations of the form bAx in an effective manner. LU-Decomposition offers good choices for solving this problem. Our approach is to find the lower bound of processing elements needed for this purpose. Here is used the so called Omega calculus, as a computational method for solving problems via their corresponding Diophantine relation. From the corresponding algorithm is formed a system of linear diophantine equalities using the domain of computation which is given by the set of lattice points inside the polyhedron. Then is run the Mathematica program DiophantineGF.m. This program calculates the generating function from which is possible to find the number of solutions to the system of Diophantine equalities, which in fact gives the lower bound for the number of processors needed for the corresponding algorithm. There is given a mathematical explanation of the problem as well. Keywordsgenerating function, lattice points in polyhedron, lower bound of processor elements, system of Diophantine equationsand : calculus.

Keywords: generating function, lattice points in polyhedron, lower bound of processor elements, system of Diophantine equations and calculus.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1434

79 The Data Processing Electronics of the METIS Coronagraph aboard the ESA Solar Orbiter Mission

Authors: M. Focardi, M. Pancrazzi, M. Uslenghi, G. Nicolini, E. Magli, F. Landini, M. Romoli, A. Bemporad, E. Antonucci, S. Fineschi, G. Naletto, P. Nicolosi, D. Spadaro, V. Andretta

Abstract:

METIS is the Multi Element Telescope for Imaging and Spectroscopy, a Coronagraph aboard the European Space Agency-s Solar Orbiter Mission aimed at the observation of the solar corona via both VIS and UV/EUV narrow-band imaging and spectroscopy. METIS, with its multi-wavelength capabilities, will study in detail the physical processes responsible for the corona heating and the origin and properties of the slow and fast solar wind. METIS electronics will collect and process scientific data thanks to its detectors proximity electronics, the digital front-end subsystem electronics and the MPPU, the Main Power and Processing Unit, hosting a space-qualified processor, memories and some rad-hard FPGAs acting as digital controllers.This paper reports on the overall METIS electronics architecture and data processing capabilities conceived to address all the scientific issues as a trade-off solution between requirements and allocated resources, just before the Preliminary Design Review as an ESA milestone in April 2012.

Keywords: Solar Coronagraph, Data Processing Electronics, VIS and UV/EUV Detectors, LEON Processor, Rad-hard FPGAs

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2510

78 A PIM (Processor-In-Memory) for Computer Graphics : Data Partitioning and Placement Schemes

Authors: Jae Chul Cha, Sandeep K. Gupta

Abstract:

The demand for higher performance graphics continues to grow because of the incessant desire towards realism. And, rapid advances in fabrication technology have enabled us to build several processor cores on a single die. Hence, it is important to develop single chip parallel architectures for such data-intensive applications. In this paper, we propose an efficient PIM architectures tailored for computer graphics which requires a large number of memory accesses. We then address the two important tasks necessary for maximally exploiting the parallelism provided by the architecture, namely, partitioning and placement of graphic data, which affect respectively load balances and communication costs. Under the constraints of uniform partitioning, we develop approaches for optimal partitioning and placement, which significantly reduce search space. We also present heuristics for identifying near-optimal placement, since the search space for placement is impractically large despite our optimization. We then demonstrate the effectiveness of our partitioning and placement approaches via analysis of example scenes; simulation results show considerable search space reductions, and our heuristics for placement performs close to optimal – the average ratio of communication overheads between our heuristics and the optimal was 1.05. Our uniform partitioning showed average load-balance ratio of 1.47 for geometry processing and 1.44 for rasterization, which is reasonable.

Keywords: Data Partitioning and Placement, Graphics, PIM, Search Space Reduction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1457

77 Enhancing Cache Performance Based on Improved Average Access Time

Authors: Jasim. A. Ghaeb

Abstract:

A high performance computer includes a fast processor and millions bytes of memory. During the data processing, huge amount of information are shuffled between the memory and processor. Because of its small size and its effectiveness speed, cache has become a common feature of high performance computers. Enhancing cache performance proved to be essential in the speed up of cache-based computers. Most enhancement approaches can be classified as either software based or hardware controlled. The performance of the cache is quantified in terms of hit ratio or miss ratio. In this paper, we are optimizing the cache performance based on enhancing the cache hit ratio. The optimum cache performance is obtained by focusing on the cache hardware modification in the way to make a quick rejection to the missed line's tags from the hit-or miss comparison stage, and thus a low hit time for the wanted line in the cache is achieved. In the proposed technique which we called Even- Odd Tabulation (EOT), the cache lines come from the main memory into cache are classified in two types; even line's tags and odd line's tags depending on their Least Significant Bit (LSB). This division is exploited by EOT technique to reject the miss match line's tags in very low time compared to the time spent by the main comparator in the cache, giving an optimum hitting time for the wanted cache line. The high performance of EOT technique against the familiar mapping technique FAM is shown in the simulated results.

Keywords: Caches, Cache performance, Hit time, Cache hit ratio, Cache mapping, Cache memory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1629

76 A 10 Giga VPN Accelerator Board for Trust Channel Security System

Authors: Ki Hyun Kim, Jang-Hee Yoo, Kyo Il Chung

Abstract:

This paper proposes a VPN Accelerator Board (VPN-AB), a virtual private network (VPN) protocol designed for trust channel security system (TCSS). TCSS supports safety communication channel between security nodes in internet. It furnishes authentication, confidentiality, integrity, and access control to security node to transmit data packets with IPsec protocol. TCSS consists of internet key exchange block, security association block, and IPsec engine block. The internet key exchange block negotiates crypto algorithm and key used in IPsec engine block. Security Association blocks setting-up and manages security association information. IPsec engine block treats IPsec packets and consists of networking functions for communication. The IPsec engine block should be embodied by H/W and in-line mode transaction for high speed IPsec processing. Our VPN-AB is implemented with high speed security processor that supports many cryptographic algorithms and in-line mode. We evaluate a small TCSS communication environment, and measure a performance of VPN-AB in the environment. The experiment results show that VPN-AB gets a performance throughput of maximum 15.645Gbps when we set the IPsec protocol with 3DES-HMAC-MD5 tunnel mode.

Keywords: TCSS(Trust Channel Security System), VPN(VirtualPrivate Network), IPsec, SSL, Security Processor, Securitycommunication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2058

75 Parallel Explicit Group Domain Decomposition Methods for the Telegraph Equation

Authors: Kew Lee Ming, Norhashidah Hj. Mohd. Ali

Abstract:

In a previous work, we presented the numerical solution of the two dimensional second order telegraph partial differential equation discretized by the centred and rotated five-point finite difference discretizations, namely the explicit group (EG) and explicit decoupled group (EDG) iterative methods, respectively. In this paper, we utilize a domain decomposition algorithm on these group schemes to divide the tasks involved in solving the same equation. The objective of this study is to describe the development of the parallel group iterative schemes under OpenMP programming environment as a way to reduce the computational costs of the solution processes using multicore technologies. A detailed performance analysis of the parallel implementations of points and group iterative schemes will be reported and discussed.

Keywords: Telegraph equation, explicit group iterative scheme, domain decomposition algorithm, parallelization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1480

74 Performance Evaluation of Prioritized Limited Processor-Sharing System

Authors: Yoshiaki Shikata, Wataru Katagiri, Yoshitaka Takahashi

Abstract:

We propose a novel prioritized limited processor-sharing (PS) rule and a simulation algorithm for the performance evaluation of this rule. The performance measures of practical interest are evaluated using this algorithm. Suppose that there are two classes and that an arriving (class-1 or class-2) request encounters n1 class-1 and n2 class-2 requests (including the arriving one) in a single-server system. According to the proposed rule, class-1 requests individually and simultaneously receive m / (m * n1+ n2) of the service-facility capacity, whereas class-2 requests receive 1 / (m *n1 + n2) of it, if m * n1 + n2 ≤ C. Otherwise (m * n1 + n2 > C), the arriving request will be queued in the corresponding class waiting room or rejected. Here, m (1) denotes the priority ratio, and C ( ∞), the service-facility capacity. In this rule, when a request arrives at [or departs from] the system, the extension [shortening] of the remaining sojourn time of each request receiving service can be calculated using the number of requests of each class and the priority ratio. Employing a simulation program to execute these events and calculations enables us to analyze the performance of the proposed prioritized limited PS rule, which is realistic in a time-sharing system (TSS) with a sufficiently small time slot. Moreover, this simulation algorithm is expanded for the evaluation of the prioritized limited PS system with N 3 priority classes.

Keywords: PS rule, priority ratio, service-facility capacity, simulation algorithm, sojourn time, performance measures

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1145

73 A Design of Elliptic Curve Cryptography Processor Based on SM2 over GF(p)

Authors: Shiji Hu, Lei Li, Wanting Zhou, Daohong Yang

Abstract:

The data encryption is the foundation of today’s communication. On this basis, to improve the speed of data encryption and decryption is always an important goal for high-speed applications. This paper proposed an elliptic curve crypto processor architecture based on SM2 prime field. Regarding hardware implementation, we optimized the algorithms in different stages of the structure. For modulo operation on finite field, we proposed an optimized improvement of the Karatsuba-Ofman multiplication algorithm and shortened the critical path through the pipeline structure in the algorithm implementation. Based on SM2 recommended prime field, a fast modular reduction algorithm is used to reduce 512-bit data obtained from the multiplication unit. The radix-4 extended Euclidean algorithm was used to realize the conversion between the affine coordinate system and the Jacobi projective coordinate system. In the parallel scheduling point operations on elliptic curves, we proposed a three-level parallel structure of point addition and point double based on the Jacobian projective coordinate system. Combined with the scalar multiplication algorithm, we added mutual pre-operation to the point addition and double point operation to improve the efficiency of the scalar point multiplication. The proposed ECC hardware architecture was verified and implemented on Xilinx Virtex-7 and ZYNQ-7 platforms, and each 256-bit scalar multiplication operation took 0.275ms. The performance for handling scalar multiplication is 32 times that of CPU (dual-core ARM Cortex-A9).

Keywords: Elliptic curve cryptosystems, SM2, modular multiplication, point multiplication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 153

72 A Survey on Performance Tools for OpenMP

Authors: Mubarak S. Mohsen, Rosni Abdullah, Yong M. Teo

Abstract:

Advances in processors architecture, such as multicore, increase the size of complexity of parallel computer systems. With multi-core architecture there are different parallel languages that can be used to run parallel programs. One of these languages is OpenMP which embedded in C/Cµ or FORTRAN. Because of this new architecture and the complexity, it is very important to evaluate the performance of OpenMP constructs, kernels, and application program on multi-core systems. Performance is the activity of collecting the information about the execution characteristics of a program. Performance tools consists of at least three interfacing software layers, including instrumentation, measurement, and analysis. The instrumentation layer defines the measured performance events. The measurement layer determines what performance event is actually captured and how it is measured by the tool. The analysis layer processes the performance data and summarizes it into a form that can be displayed in performance tools. In this paper, a number of OpenMP performance tools are surveyed, explaining how each is used to collect, analyse, and display data collection.

Keywords: Parallel performance tools, OpenMP, multi-core.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1858

71 Soft Real-Time Fuzzy Task Scheduling for Multiprocessor Systems

Authors: Mahdi Hamzeh, Sied Mehdi Fakhraie, Caro Lucas

Abstract:

All practical real-time scheduling algorithms in multiprocessor systems present a trade-off between their computational complexity and performance. In real-time systems, tasks have to be performed correctly and timely. Finding minimal schedule in multiprocessor systems with real-time constraints is shown to be NP-hard. Although some optimal algorithms have been employed in uni-processor systems, they fail when they are applied in multiprocessor systems. The practical scheduling algorithms in real-time systems have not deterministic response time. Deterministic timing behavior is an important parameter for system robustness analysis. The intrinsic uncertainty in dynamic real-time systems increases the difficulties of scheduling problem. To alleviate these difficulties, we have proposed a fuzzy scheduling approach to arrange real-time periodic and non-periodic tasks in multiprocessor systems. Static and dynamic optimal scheduling algorithms fail with non-critical overload. In contrast, our approach balances task loads of the processors successfully while consider starvation prevention and fairness which cause higher priority tasks have higher running probability. A simulation is conducted to evaluate the performance of the proposed approach. Experimental results have shown that the proposed fuzzy scheduler creates feasible schedules for homogeneous and heterogeneous tasks. It also and considers tasks priorities which cause higher system utilization and lowers deadline miss time. According to the results, it performs very close to optimal schedule of uni-processor systems.

Keywords: Computational complexity, Deadline, Feasible scheduling, Fuzzy scheduling, Priority, Real-time multiprocessor systems, Robustness, System utilization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2082

70 A Survey of Baseband Architecture for Software Defined Radio

Authors: M. A. Fodha, H. Benfradj, A. Ghazel

Abstract:

This paper is a survey of recent works that proposes a baseband processor architecture for software defined radio. A classification of different approaches is proposed. The performance of each architecture is also discussed in order to clarify the suitable approaches that meet software-defined radio constraints.

Keywords: Multi-core architectures, reconfigurable architecture, software defined radio.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1411

69 Optimal Placement of Processors based on Effective Communication Load

Authors: A. R. Aswatha, T. Basavaraju, N. Bhaskara Rao

Abstract:

This paper presents a new technique for the optimum placement of processors to minimize the total effective communication load under multi-processor communication dominated environment. This is achieved by placing heavily loaded processors near each other and lightly loaded ones far away from one another in the physical grid locations. The results are mathematically proved for the Algorithms are described.

Keywords: Ascending Sort Index Vector, EffectiveCommunication Load, Effective Distance Matrix, OptimalPlacement, Sorting Order.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1303

68 Effects of the Purpose Expropriation of Land Consolidation to Landholding

Authors: Turgut Ayten, Tayfun Çay

Abstract:

In the current expropriation of Turkey, the state acquires necessary lands for its investment without permission of the owners and not searching for alternative solutions, so it is determined that neither processor nor processed is not happy. In this study, interactions of enterprises in Turkey are analysed in case the necessary land for public investments are acquired by expropriation purposed land consolidation. Legal basis, positive and negative sides, financial effects to enterprises of this method is evaluated according to Konya Kadınhanı, Kolukısa avenue which is on the Konya-Ankara High-Speed Train Route.

Keywords: Land consolidation, expropriation purposed land consolidation, sustainable rural development, cost.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 765

67 Design of Local Interconnect Network Controller for Automotive Applications

Authors: Jong-Bae Lee, Seongsoo Lee

Abstract:

Local interconnect network (LIN) is a communication protocol that combines sensors, actuators, and processors to a functional module in automotive applications. In this paper, a LIN ver. 2.2A controller was designed in Verilog hardware description language (Verilog HDL) and implemented in field-programmable gate array (FPGA). Its operation was verified by making full-scale LIN network with the presented FPGA-implemented LIN controller, commercial LIN transceivers, and commercial processors. When described in Verilog HDL and synthesized in 0.18 μm technology, its gate size was about 2,300 gates.

Keywords: Local interconnect network, controller, transceiver, processor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1523

66 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting

Authors: Yiannis G. Smirlis

Abstract:

The classification and the prediction of efficiencies in Data Envelopment Analysis (DEA) is an important issue, especially in large scale problems or when new units frequently enter the under-assessment set. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the range of values for the inputs and outputs. Such intervals combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by Interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores, without applying any DEA models.

Keywords: Data envelopment analysis, interval DEA, efficiency classification, efficiency prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 891

65 Consistency Model and Synchronization Primitives in SDSMS

Authors: Dalvinder Singh Dhaliwal, Parvinder S. Sandhu, S. N. Panda

Abstract:

This paper is on the general discussion of memory consistency model like Strict Consistency, Sequential Consistency, Processor Consistency, Weak Consistency etc. Then the techniques for implementing distributed shared memory Systems and Synchronization Primitives in Software Distributed Shared Memory Systems are discussed. The analysis involves the performance measurement of the protocol concerned that is Multiple Writer Protocol. Each protocol has pros and cons. So, the problems that are associated with each protocol is discussed and other related things are explored.

Keywords: Distributed System, Single owner protocol, Multiple owner protocol

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1347

64 Architecture Based on Dynamic Graphs for the Dynamic Reconfiguration of Farms of Computers

Authors: Carmen Navarrete, Eloy Anguiano

Abstract:

In the last years, the computers have increased their capacity of calculus and networks, for the interconnection of these machines. The networks have been improved until obtaining the actual high rates of data transferring. The programs that nowadays try to take advantage of these new technologies cannot be written using the traditional techniques of programming, since most of the algorithms were designed for being executed in an only processor,in a nonconcurrent form instead of being executed concurrently ina set of processors working and communicating through a network.This paper aims to present the ongoing development of a new system for the reconfiguration of grouping of computers, taking into account these new technologies.

Keywords: Dynamic network topology, resource and task allocation, parallel computing, heterogeneous computing, dynamic reconfiguration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1319

63 Heuristic for Accelerating Run-Time Task Mapping in NoC-Based Heterogeneous MPSoCs

Authors: M. K. Benhaoua, A. K. Singh, A. E. H. Benyamina, A. Kumar, P. Boulet

Abstract:

In this paper, we propose a new packing strategy to find a free resource for run-time mapping of application tasks to NoC-based Heterogeneous MPSoC. The proposed strategy minimizes the task mapping time in addition to placing the communicating tasks close to each other. To evaluate our approach, a comparative study is carried out for a platform containing single task supported PEs. Experiments show that our strategy provides better results when compared to latest dynamic mapping strategies reported in the literature.

Keywords: Multi-Processor Systems-on-Chip (MPSoCs), Network-on-Chip (NoC), Heterogeneous architectures, Dynamic mapping heuristics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2216