Search results for: Application-Specific Instruction-set Processors
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 82

Search results for: Application-Specific Instruction-set Processors

52 Aspect based Reusable Synchronization Schemes

Authors: Nathar Shah

Abstract:

Concurrency and synchronization are becoming big issues as every new PC comes with multi-core processors. A major reason for Object-Oriented Programming originally was to enable easier reuse: encode your algorithm into a class and thoroughly debug it, then you can reuse the class again and again. However, when we get to concurrency and synchronization, this is often not possible. Thread-safety issues means that synchronization constructs need to be entangled into every class involved. We contributed a detailed literature review of issues and challenges in concurrent programming and present a methodology that uses the Aspect- Oriented paradigm to address this problem. Aspects will allow us to extract the synchronization concerns as schemes to be “weaved in" later into the main code. This allows the aspects to be separately tested and verified. Hence, the functional components can be weaved with reusable synchronization schemes that are robust and scalable.

Keywords: Aspect-orientation, development methodologysoftware concurrency, synchronization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1212
51 Parallel 2-Opt Local Search on GPU

Authors: Wen-Bao Qiao, Jean-Charles Créput

Abstract:

To accelerate the solution for large scale traveling salesman problems (TSP), a parallel 2-opt local search algorithm with simple implementation based on Graphics Processing Unit (GPU) is presented and tested in this paper. The parallel scheme is based on technique of data decomposition by dynamically assigning multiple K processors on the integral tour to treat K edges’ 2-opt local optimization simultaneously on independent sub-tours, where K can be user-defined or have a function relationship with input size N. We implement this algorithm with doubly linked list on GPU. The implementation only requires O(N) memory. We compare this parallel 2-opt local optimization against sequential exhaustive 2-opt search along integral tour on TSP instances from TSPLIB with more than 10000 cities.

Keywords: Doubly linked list, parallel 2-opt, tour division, GPU.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1191
50 Design of Multi-disease Diagnosis Processor using Hypernetworks Technique

Authors: Jae-Yeon Song, Seung-Yerl Lee, Kyu-Yeul Wang, Byung-Soo Kim, Sang-Seol Lee, Seong-Seob Shin, Jae-Young Choi, Chong Ho Lee, Jeahyun Park, Duck-Jin Chung

Abstract:

In this paper, we propose disease diagnosis hardware architecture by using Hypernetworks technique. It can be used to diagnose 3 different diseases (SPECT Heart, Leukemia, Prostate cancer). Generally, the disparate diseases require specified diagnosis hardware model for each disease. Using similarities of three diseases diagnosis processor, we design diagnosis processor that can diagnose three different diseases. Our proposed architecture that is combining three processors to one processor can reduce hardware size without decrease of the accuracy.

Keywords: Diagnosis processor, Hypernetworks, Leukemia, Mask, Prostate cancer, SPECT Heart data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1326
49 A Parallel Algorithm for 2-D Cylindrical Geometry Transport Equation with Interface Corrections

Authors: Wei Jun-xia, Yuan Guang-wei, Yang Shu-lin, Shen Wei-dong

Abstract:

In order to make conventional implicit algorithm to be applicable in large scale parallel computers , an interface prediction and correction of discontinuous finite element method is presented to solve time-dependent neutron transport equations under 2-D cylindrical geometry. Domain decomposition is adopted in the computational domain.The numerical experiments show that our parallel algorithm with explicit prediction and implicit correction has good precision, parallelism and simplicity. Especially, it can reach perfect speedup even on hundreds of processors for large-scale problems.

Keywords: Transport Equation, Discontinuous Finite Element, Domain Decomposition, Interface Prediction And Correction

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1626
48 Some Preconditioners for Block Pentadiagonal Linear Systems Based on New Approximate Factorization Methods

Authors: Xian Ming Gu, Ting Zhu Huang, Hou Biao Li

Abstract:

In this paper, getting an high-efficiency parallel algorithm to solve sparse block pentadiagonal linear systems suitable for vectors and parallel processors, stair matrices are used to construct some parallel polynomial approximate inverse preconditioners. These preconditioners are appropriate when the desired target is to maximize parallelism. Moreover, some theoretical results about these preconditioners are presented and how to construct preconditioners effectively for any nonsingular block pentadiagonal H-matrices is also described. In addition, the availability of these preconditioners is illustrated with some numerical experiments arising from two dimensional biharmonic equation.

Keywords: Parallel algorithm, Pentadiagonal matrix, Polynomial approximate inverse, Preconditioners, Stair matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2204
47 Self-Organization-Based Approach for Embedded Real-Time System Design

Authors: S. S. Bendib, L. W. Mouss, S. Kalla

Abstract:

This paper proposes a self-organization-based approach for real-time systems design. The addressed issue is the mapping of an application onto an architecture of heterogeneous processors while optimizing both makespan and reliability. Since this problem is NP-hard, a heuristic algorithm is used to obtain efficiently approximate solutions. The proposed approach takes into consideration the quality as well as the diversity of solutions. Indeed, an alternate treatment of the two objectives allows to produce solutions of good quality while a self-organization approach based on the neighborhood structure is used to reorganize solutions and consequently to enhance their diversity. Produced solutions make different compromises between the makespan and the reliability giving the user the possibility to select the solution suited to his (her) needs.

Keywords: Embedded real-time systems design, makespan, reliability, self-organization, compromises.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 410
46 A Practical Distributed String Matching Algorithm Architecture and Implementation

Authors: Bi Kun, Gu Nai-jie, Tu Kun, Liu Xiao-hu, Liu Gang

Abstract:

Traditional parallel single string matching algorithms are always based on PRAM computation model. Those algorithms concentrate on the cost optimal design and the theoretical speed. Based on the distributed string matching algorithm proposed by CHEN, a practical distributed string matching algorithm architecture is proposed in this paper. And also an improved single string matching algorithm based on a variant Boyer-Moore algorithm is presented. We implement our algorithm on the above architecture and the experiments prove that it is really practical and efficient on distributed memory machine. Its computation complexity is O(n/p + m), where n is the length of the text, and m is the length of the pattern, and p is the number of the processors.

Keywords: Boyer-Moore algorithm, distributed algorithm, parallel string matching, string matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2153
45 Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern

Authors: Mahdi Esmaeili, Mansour Tarafdar

Abstract:

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.

Keywords: Sequential Patterns, Data Mining, ParallelAlgorithm, Multidimensional Sequence Data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1445
44 Extended Arithmetic Precision in Meshfree Calculations

Authors: Edward J. Kansa, Pavel Holoborodko

Abstract:

Continuously differentiable radial basis functions (RBFs) are meshfree, converge faster as the dimensionality increases, and is theoretically spectrally convergent. When implemented on current single and double precision computers, such RBFs can suffer from ill-conditioning because the systems of equations needed to be solved to find the expansion coefficients are full. However, the Advanpix extended precision software package allows computer mathematics to resemble asymptotically ideal Platonic mathematics. Additionally, full systems with extended precision execute faster graphical processors units and field-programmable gate arrays because no branching is needed. Sparse equation systems are fast for iterative solvers in a very limited number of cases.

Keywords: Meshless spectrally convergent, partial differential equations, extended arithmetic precision, no branching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 578
43 Dynamic Data Partition Algorithm for a Parallel H.264 Encoder

Authors: Juntae Kim, Jaeyoung Park, Kyoungkun Lee, Jong Tae Kim

Abstract:

The H.264/AVC standard is a highly efficient video codec providing high-quality videos at low bit-rates. As employing advanced techniques, the computational complexity has been increased. The complexity brings about the major problem in the implementation of a real-time encoder and decoder. Parallelism is the one of approaches which can be implemented by multi-core system. We analyze macroblock-level parallelism which ensures the same bit rate with high concurrency of processors. In order to reduce the encoding time, dynamic data partition based on macroblock region is proposed. The data partition has the advantages in load balancing and data communication overhead. Using the data partition, the encoder obtains more than 3.59x speed-up on a four-processor system. This work can be applied to other multimedia processing applications.

Keywords: H.264/AVC, video coding, thread-level parallelism, OpenMP, multimedia

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1768
42 Performance Comparison of Parallel Sorting Algorithms on the Cluster of Workstations

Authors: Lai Lai Win Kyi, Nay Min Tun

Abstract:

Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.

Keywords: Cluster of Workstations, Parallel sorting algorithms, performance analysis, parallel computing and MPI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1452
41 New VLSI Architecture for Motion Estimation Algorithm

Authors: V. S. K. Reddy, S. Sengupta, Y. M. Latha

Abstract:

This paper presents an efficient VLSI architecture design to achieve real time video processing using Full-Search Block Matching (FSBM) algorithm. The design employs parallel bank architecture with minimum latency, maximum throughput, and full hardware utilization. We use nine parallel processors in our architecture and each controlled by a state machine. State machine control implementation makes the design very simple and cost effective. The design is implemented using VHDL and the programming techniques we incorporated makes the design completely programmable in the sense that the search ranges and the block sizes can be varied to suit any given requirements. The design can operate at frequencies up to 36 MHz and it can function in QCIF and CIF video resolution at 1.46 MHz and 5.86 MHz, respectively.

Keywords: Video Coding, Motion Estimation, Full-Search, Block-Matching, VLSI Architecture.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1773
40 A Novel Implementation of Application Specific Instruction-set Processor (ASIP) using Verilog

Authors: Kamaraju.M, Lal Kishore.K, Tilak.A.V.N

Abstract:

The general purpose processors that are used in embedded systems must support constraints like execution time, power consumption, code size and so on. On the other hand an Application Specific Instruction-set Processor (ASIP) has advantages in terms of power consumption, performance and flexibility. In this paper, a 16-bit Application Specific Instruction-set processor for the sensor data transfer is proposed. The designed processor architecture consists of on-chip transmitter and receiver modules along with the processing and controlling units to enable the data transmission and reception on a single die. The data transfer is accomplished with less number of instructions as compared with the general purpose processor. The ASIP core operates at a maximum clock frequency of 1.132GHz with a delay of 0.883ns and consumes 569.63mW power at an operating voltage of 1.2V. The ASIP is implemented in Verilog HDL using the Xilinx platform on Virtex4.

Keywords: ASIP, Data transfer, Instruction set, Processor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2040
39 Classification Based on Deep Neural Cellular Automata Model

Authors: Yasser F. Hassan

Abstract:

Deep learning structure is a branch of machine learning science and greet achievement in research and applications. Cellular neural networks are regarded as array of nonlinear analog processors called cells connected in a way allowing parallel computations. The paper discusses how to use deep learning structure for representing neural cellular automata model. The proposed learning technique in cellular automata model will be examined from structure of deep learning. A deep automata neural cellular system modifies each neuron based on the behavior of the individual and its decision as a result of multi-level deep structure learning. The paper will present the architecture of the model and the results of simulation of approach are given. Results from the implementation enrich deep neural cellular automata system and shed a light on concept formulation of the model and the learning in it.

Keywords: Cellular automata, neural cellular automata, deep learning, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 827
38 Iterative Joint Power Control and Partial Crosstalk Cancellation in Upstream VDSL

Authors: H. Bagheri, H. Emami, M. R. Pakravan

Abstract:

Crosstalk is the major limiting issue in very high bit-rate digital subscriber line (VDSL) systems in terms of bit-rate or service coverage. At the central office side, joint signal processing accompanied by appropriate power allocation enables complex multiuser processors to provide near capacity rates. Unfortunately complexity grows with the square of the number of lines within a binder, so by taking into account that there are only a few dominant crosstalkers who contribute to main part of crosstalk power, the canceller structure can be simplified which resulted in a much lower run-time complexity. In this paper, a multiuser power control scheme, namely iterative waterfilling, is combined with previously proposed partial crosstalk cancellation approaches to demonstrate the best ever achieved performance which is verified by simulation results.

Keywords: iterative waterfilling, partial crosstalk cancellation, run-time complexity, VDSL.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1370
37 A Robust Software for Advanced Analysis of Space Steel Frames

Authors: Viet-Hung Truong, Seung-Eock Kim

Abstract:

This paper presents a robust software package for practical advanced analysis of space steel framed structures. The pre- and post-processors of the presented software package are coded in the C++ programming language while the solver is written by using the FORTRAN programming language. A user-friendly graphical interface of the presented software is developed to facilitate the modeling process and result interpretation of the problem. The solver employs the stability functions for capturing the second-order effects to minimize modeling and computational time. Both the plastic-hinge and fiber-hinge beam-column elements are available in the presented software. The generalized displacement control method is adopted to solve the nonlinear equilibrium equations.

Keywords: Advanced analysis, beam-column, fiber-hinge, plastic hinge, steel frame.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1424
36 A Distributed Group Mutual Exclusion Algorithm for Soft Real Time Systems

Authors: Abhishek Swaroop, Awadhesh Kumar Singh

Abstract:

The group mutual exclusion (GME) problem is an interesting generalization of the mutual exclusion problem. Several solutions of the GME problem have been proposed for message passing distributed systems. However, none of these solutions is suitable for real time distributed systems. In this paper, we propose a token-based distributed algorithms for the GME problem in soft real time distributed systems. The algorithm uses the concepts of priority queue, dynamic request set and the process state. The algorithm uses first come first serve approach in selecting the next session type between the same priority levels and satisfies the concurrent occupancy property. The algorithm allows all n processors to be inside their CS provided they request for the same session. The performance analysis and correctness proof of the algorithm has also been included in the paper.

Keywords: Concurrency, Group mutual exclusion, Priority, Request set, Token.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1678
35 An Approach for Blind Source Separation using the Sliding DFT and Time Domain Independent Component Analysis

Authors: Koji Yamanouchi, Masaru Fujieda, Takahiro Murakami, Yoshihisa Ishida

Abstract:

''Cocktail party problem'' is well known as one of the human auditory abilities. We can recognize the specific sound that we want to listen by this ability even if a lot of undesirable sounds or noises are mixed. Blind source separation (BSS) based on independent component analysis (ICA) is one of the methods by which we can separate only a special signal from their mixed signals with simple hypothesis. In this paper, we propose an online approach for blind source separation using the sliding DFT and the time domain independent component analysis. The proposed method can reduce calculation complexity in comparison with conventional methods, and can be applied to parallel processing by using digital signal processors (DSPs) and so on. We evaluate this method and show its availability.

Keywords: Cocktail party problem, blind Source Separation(BSS), independent component analysis, sliding DFT, onlineprocessing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1602
34 Accelerating Integer Neural Networks On Low Cost DSPs

Authors: Thomas Behan, Zaiyi Liao, Lian Zhao, Chunting Yang

Abstract:

In this paper, low end Digital Signal Processors (DSPs) are applied to accelerate integer neural networks. The use of DSPs to accelerate neural networks has been a topic of study for some time, and has demonstrated significant performance improvements. Recently, work has been done on integer only neural networks, which greatly reduces hardware requirements, and thus allows for cheaper hardware implementation. DSPs with Arithmetic Logic Units (ALUs) that support floating or fixed point arithmetic are generally more expensive than their integer only counterparts due to increased circuit complexity. However if the need for floating or fixed point math operation can be removed, then simpler, lower cost DSPs can be used. To achieve this, an integer only neural network is created in this paper, which is then accelerated by using DSP instructions to improve performance.

Keywords: Digital Signal Processor (DSP), Integer Neural Network(INN), Low Cost Neural Network, Integer Neural Network DSPImplementation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1761
33 Sub-Image Detection Using Fast Neural Processors and Image Decomposition

Authors: Hazem M. El-Bakry, Qiangfu Zhao

Abstract:

In this paper, an approach to reduce the computation steps required by fast neural networksfor the searching process is presented. The principle ofdivide and conquer strategy is applied through imagedecomposition. Each image is divided into small in sizesub-images and then each one is tested separately usinga fast neural network. The operation of fast neuralnetworks based on applying cross correlation in thefrequency domain between the input image and theweights of the hidden neurons. Compared toconventional and fast neural networks, experimentalresults show that a speed up ratio is achieved whenapplying this technique to locate human facesautomatically in cluttered scenes. Furthermore, fasterface detection is obtained by using parallel processingtechniques to test the resulting sub-images at the sametime using the same number of fast neural networks. Incontrast to using only fast neural networks, the speed upratio is increased with the size of the input image whenusing fast neural networks and image decomposition.

Keywords: Fast Neural Networks, 2D-FFT, CrossCorrelation, Image decomposition, Parallel Processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2141
32 A Survey on Performance Tools for OpenMP

Authors: Mubarak S. Mohsen, Rosni Abdullah, Yong M. Teo

Abstract:

Advances in processors architecture, such as multicore, increase the size of complexity of parallel computer systems. With multi-core architecture there are different parallel languages that can be used to run parallel programs. One of these languages is OpenMP which embedded in C/Cµ or FORTRAN. Because of this new architecture and the complexity, it is very important to evaluate the performance of OpenMP constructs, kernels, and application program on multi-core systems. Performance is the activity of collecting the information about the execution characteristics of a program. Performance tools consists of at least three interfacing software layers, including instrumentation, measurement, and analysis. The instrumentation layer defines the measured performance events. The measurement layer determines what performance event is actually captured and how it is measured by the tool. The analysis layer processes the performance data and summarizes it into a form that can be displayed in performance tools. In this paper, a number of OpenMP performance tools are surveyed, explaining how each is used to collect, analyse, and display data collection.

Keywords: Parallel performance tools, OpenMP, multi-core.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1871
31 Model Transformation with a Visual Control Flow Language

Authors: László Lengyel, Tihamér Levendovszky, Gergely Mezei, Hassan Charaf

Abstract:

Graph rewriting-based visual model processing is a widely used technique for model transformation. Visual model transformations often need to follow an algorithm that requires a strict control over the execution sequence of the transformation steps. Therefore, in Visual Model Processors (VMPs) the execution order of the transformation steps is crucial. This paper presents the visual control flow support of Visual Modeling and Transformation System (VMTS), which facilitates composing complex model transformations of simple transformation steps and executing them. The VMTS Visual Control Flow Language (VCFL) uses stereotyped activity diagrams to specify control flow structures and OCL constraints to choose between different control flow branches. This paper introduces VCFL, discusses its termination properties and provides an algorithm to support the termination analysis of VCFL transformations.

Keywords: Control Flow, Metamodel-Based Visual ModelTransformation, OCL, Termination Properties, UML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1649
30 Parallezation Protein Sequence Similarity Algorithms using Remote Method Interface

Authors: Mubarak Saif Mohsen, Zurinahni Zainol, Rosalina Abdul Salam, Wahidah Husain

Abstract:

One of the major problems in genomic field is to perform sequence comparison on DNA and protein sequences. Executing sequence comparison on the DNA and protein data is a computationally intensive task. Sequence comparison is the basic step for all algorithms in protein sequences similarity. Parallel computing is an attractive solution to provide the computational power needed to speedup the lengthy process of the sequence comparison. Our main research is to enhance the protein sequence algorithm using dynamic programming method. In our approach, we parallelize the dynamic programming algorithm using multithreaded program to perform the sequence comparison and also developed a distributed protein database among many PCs using Remote Method Interface (RMI). As a result, we showed how different sizes of protein sequences data and computation of scoring matrix of these protein sequence on different number of processors affected the processing time and speed, as oppose to sequential processing.

Keywords: Protein sequence algorithm, dynamic programming algorithm, multithread

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1868
29 An Analysis of Real-Time Distributed System under Different Priority Policies

Authors: Y. Jayanta Singh, Suresh C. Mehrotra

Abstract:

A real time distributed computing has heterogeneously networked computers to solve a single problem. So coordination of activities among computers is a complex task and deadlines make more complex. The performances depend on many factors such as traffic workloads, database system architecture, underlying processors, disks speeds, etc. Simulation study have been performed to analyze the performance under different transaction scheduling: different workloads, arrival rate, priority policies, altering slack factors and Preemptive Policy. The performance metric of the experiments is missed percent that is the percentage of transaction that the system is unable to complete. The throughput of the system is depends on the arrival rate of transaction. The performance can be enhanced with altering the slack factor value. Working on slack value for the transaction can helps to avoid some of transactions from killing or aborts. Under the Preemptive Policy, many extra executions of new transactions can be carried out.

Keywords: Real distributed systems, slack factors, transaction scheduling, priority policies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1597
28 3D Network-on-Chip with on-Chip DRAM: An Empirical Analysis for Future Chip Multiprocessor

Authors: Thomas Canhao Xu, Bo Yang, Alexander Wei Yin, Pasi Liljeberg, Hannu Tenhunen

Abstract:

With the increasing number of on-chip components and the critical requirement for processing power, Chip Multiprocessor (CMP) has gained wide acceptance in both academia and industry during the last decade. However, the conventional bus-based onchip communication schemes suffer from very high communication delay and low scalability in large scale systems. Network-on-Chip (NoC) has been proposed to solve the bottleneck of parallel onchip communications by applying different network topologies which separate the communication phase from the computation phase. Observing that the memory bandwidth of the communication between on-chip components and off-chip memory has become a critical problem even in NoC based systems, in this paper, we propose a novel 3D NoC with on-chip Dynamic Random Access Memory (DRAM) in which different layers are dedicated to different functionalities such as processors, cache or memory. Results show that, by using our proposed architecture, average link utilization has reduced by 10.25% for SPLASH-2 workloads. Our proposed design costs 1.12% less execution cycles than the traditional design on average.

Keywords: 3D integration, network-on-chip, memory-on-chip, DRAM, chip multiprocessor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2407
27 The Most Secure Smartphone Operating System: A Survey

Authors: Sundus Ayyaz, Saad Rehman

Abstract:

In the recent years, a fundamental revolution in the Mobile Phone technology from just being able to provide voice and short message services to becoming the most essential part of our lives by connecting to network and various app stores for downloading software apps of almost every activity related to our life from finding location to banking from getting news updates to downloading HD videos and so on. This progress in Smart Phone industry has modernized and transformed our way of living into a trouble-free world. The smart phone has become our personal computers with the addition of significant features such as multi core processors, multi-tasking, large storage space, bluetooth, WiFi, including large screen and cameras. With this evolution, the rise in the security threats have also been amplified. In Literature, different threats related to smart phones have been highlighted and various precautions and solutions have been proposed to keep the smart phone safe which carries all the private data of a user. In this paper, a survey has been carried out to find out the most secure and the most unsecure smart phone operating system among the most popular smart phones in use today.

Keywords: Smart phone, operating system, security threats, Android, iOS, Balckberry, Windows.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4146
26 Multiple Job Shop-Scheduling using Hybrid Heuristic Algorithm

Authors: R.A.Mahdavinejad

Abstract:

In this paper, multi-processors job shop scheduling problems are solved by a heuristic algorithm based on the hybrid of priority dispatching rules according to an ant colony optimization algorithm. The objective function is to minimize the makespan, i.e. total completion time, in which a simultanous presence of various kinds of ferons is allowed. By using the suitable hybrid of priority dispatching rules, the process of finding the best solution will be improved. Ant colony optimization algorithm, not only promote the ability of this proposed algorithm, but also decreases the total working time because of decreasing in setup times and modifying the working production line. Thus, the similar work has the same production lines. Other advantage of this algorithm is that the similar machines (not the same) can be considered. So, these machines are able to process a job with different processing and setup times. According to this capability and from this algorithm evaluation point of view, a number of test problems are solved and the associated results are analyzed. The results show a significant decrease in throughput time. It also shows that, this algorithm is able to recognize the bottleneck machine and to schedule jobs in an efficient way.

Keywords: Job shops scheduling, Priority dispatching rules, Makespan, Hybrid heuristic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1633
25 An Enhanced Distributed System to improve theTime Complexity of Binary Indexed Trees

Authors: Ahmed M. Elhabashy, A. Baes Mohamed, Abou El Nasr Mohamad

Abstract:

Distributed Computing Systems are usually considered the most suitable model for practical solutions of many parallel algorithms. In this paper an enhanced distributed system is presented to improve the time complexity of Binary Indexed Trees (BIT). The proposed system uses multi-uniform processors with identical architectures and a specially designed distributed memory system. The analysis of this system has shown that it has reduced the time complexity of the read query to O(Log(Log(N))), and the update query to constant complexity, while the naive solution has a time complexity of O(Log(N)) for both queries. The system was implemented and simulated using VHDL and Verilog Hardware Description Languages, with xilinx ISE 10.1, as the development environment and ModelSim 6.1c, similarly as the simulation tool. The simulation has shown that the overhead resulting by the wiring and communication between the system fragments could be fairly neglected, which makes it applicable to practically reach the maximum speed up offered by the proposed model.

Keywords: Binary Index Tree (BIT), Least Significant Bit (LSB), Parallel Adder (PA), Very High Speed Integrated Circuits HardwareDescription Language (VHDL), Distributed Parallel Computing System(DPCS).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1735
24 Measuring Risk Levels and Efficacy of Risk Management Strategies in Vietnamese Catfish Farming

Authors: Tru C. Le, France Cheong

Abstract:

Although the Vietnamese catfish farming has grown at very high rates in recent years, the industry has also faced many problems affecting its sustainability. This paper studies the perceptions of catfish farmers regarding risk and risk management strategies in their production activities. Specifically, the study aims to measure the consequences, likelihoods, and levels of risks as well as the efficacy of risk management in Vietnamese catfish farming. Data for the study were collected through a sample of 261 catfish farmers in the Mekong Delta, Vietnam using a questionnaire survey in 2008. Results show that, in general, price and production risks were perceived as the most important risks. Farm management and technical measures were perceived more effective than other kinds of risk management strategies in risk reduction. Although price risks were rated as important risks, price risk management strategies were not perceived as important measures for risk mitigation. The results of the study are discussed to provide implications for various industry stakeholders, including policy makers, processors, advisors, and developers of new risk management strategies.

Keywords: Aquaculture, catfish farming, sources of risk, riskmanagement, risk strategies, risk mitigation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1944
23 Dynamic Power Reduction in Sequential Circuits Using Look Ahead Clock Gating Technique

Authors: R. Manjith, C. Muthukumari

Abstract:

In this paper, a novel Linear Feedback Shift Register (LFSR) with Look Ahead Clock Gating (LACG) technique is presented to reduce the power consumption in modern processors and System-on-Chip. Clock gating is a predominant technique used to reduce unwanted switching of clock signals. Several clock gating techniques to reduce the dynamic power have been developed, of which LACG is predominant. LACG computes the clock enabling signals of each flip-flop (FF) one cycle ahead of time, based on the present cycle data of the flip-flops on which it depends. It overcomes the timing problems in the existing clock gating methods like datadriven clock gating and Auto-Gated flip-flops (AGFF) by allotting a full clock cycle for the determination of the clock enabling signals. Further to reduce the power consumption in LACG technique, FFs can be grouped so that they share a common clock enabling signal. Simulation results show that the novel grouped LFSR with LACG achieves 15.03% power savings than conventional LFSR with LACG and 44.87% than data-driven clock gating.

Keywords: AGFF, data-driven, LACG, LFSR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1711