Search results for: hardware accelerator
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 606

Search results for: hardware accelerator

606 Providing Reliability, Availability and Scalability Support for Quick Assist Technology Cryptography on the Cloud

Authors: Songwu Shen, Garrett Drysdale, Veerendranath Mannepalli, Qihua Dai, Yuan Wang, Yuli Chen, David Qian, Utkarsh Kakaiya

Abstract:

Hardware accelerator has been a promising solution to reduce the cost of cloud data centers. This paper investigates the QoS enhancement of the acceleration of an important datacenter workload: the webserver (or proxy) that faces high computational consumption originated from secure sockets layer (SSL) or transport layer security (TLS) procession in the cloud environment. Our study reveals that for the accelerator maintenance cases—need to upgrade driver/firmware or hardware reset due to hardware hang; we still can provide cryptography services by switching to software during maintenance phase and then switching back to accelerator after maintenance. The switching is seamless to server application such as Nginx that runs inside a VM on top of the server. To achieve this high availability goal, we propose a comprehensive fallback solution based on Intel® QuickAssist Technology (QAT). This approach introduces an architecture that involves the collaboration between physical function (PF) and virtual function (VF), and collaboration among VF, OpenSSL, and web application Nginx. The evaluation shows that our solution could provide high reliability, availability, and scalability (RAS) of hardware cryptography service in a 7x24x365 manner in the cloud environment.

Keywords: accelerator, cryptography service, RAS, secure sockets layer/transport layer security, SSL/TLS, virtualization fallback architecture

Procedia PDF Downloads 108
605 A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators

Authors: Wei Zhang

Abstract:

With the rapid development of deep learning, neural network and deep learning algorithms play a significant role in various practical applications. Due to the high accuracy and good performance, Convolutional Neural Networks (CNNs) especially have become a research hot spot in the past few years. However, the size of the networks becomes increasingly large scale due to the demands of the practical applications, which poses a significant challenge to construct a high-performance implementation of deep learning neural networks. Meanwhile, many of these application scenarios also have strict requirements on the performance and low-power consumption of hardware devices. Therefore, it is particularly critical to choose a moderate computing platform for hardware acceleration of CNNs. This article aimed to survey the recent advance in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various designs and implementations of the accelerator based on FPGA under different devices and network models are overviewed, and the versions of Graphic Processing Units (GPUs), Application Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs) are compared to present our own critical analysis and comments. Finally, we give a discussion on different perspectives of these acceleration and optimization methods on FPGA platforms to further explore the opportunities and challenges for future research. More helpfully, we give a prospect for future development of the FPGA-based accelerator.

Keywords: deep learning, field programmable gate array, FPGA, hardware accelerator, convolutional neural networks, CNN

Procedia PDF Downloads 84
604 Operator Optimization Based on Hardware Architecture Alignment Requirements

Authors: Qingqing Gai, Junxing Shen, Yu Luo

Abstract:

Due to the hardware architecture characteristics, some operators tend to acquire better performance if the input/output tensor dimensions are aligned to a certain minimum granularity, such as convolution and deconvolution commonly used in deep learning. Furthermore, if the requirements are not met, the general strategy is to pad with 0 to satisfy the requirements, potentially leading to the under-utilization of the hardware resources. Therefore, for the convolution and deconvolution whose input and output channels do not meet the minimum granularity alignment, we propose to transfer the W-dimensional data to the C-dimension for computation (W2C) to enable the C-dimension to meet the hardware requirements. This scheme also reduces the number of computations in the W-dimension. Although this scheme substantially increases computation, the operator’s speed can improve significantly. It achieves remarkable speedups on multiple hardware accelerators, including Nvidia Tensor cores, Qualcomm digital signal processors (DSPs), and Huawei neural processing units (NPUs). All you need to do is modify the network structure and rearrange the operator weights offline without retraining. At the same time, for some operators, such as the Reducemax, we observe that transferring the Cdimensional data to the W-dimension(C2W) and replacing the Reducemax with the Maxpool can accomplish acceleration under certain circumstances.

Keywords: convolution, deconvolution, W2C, C2W, alignment, hardware accelerator

Procedia PDF Downloads 65
603 Embedded Test Framework: A Solution Accelerator for Embedded Hardware Testing

Authors: Arjun Kumar Rath, Titus Dhanasingh

Abstract:

Embedded product development requires software to test hardware functionality during development and finding issues during manufacturing in larger quantities. As the components are getting integrated, the devices are tested for their full functionality using advanced software tools. Benchmarking tools are used to measure and compare the performance of product features. At present, these tests are based on a variety of methods involving varying hardware and software platforms. Typically, these tests are custom built for every product and remain unusable for other variants. A majority of the tests goes undocumented, not updated, unusable when the product is released. To bridge this gap, a solution accelerator in the form of a framework can address these issues for running all these tests from one place, using an off-the-shelf tests library in a continuous integration environment. There are many open-source test frameworks or tools (fuego. LAVA, AutoTest, KernelCI, etc.) designed for testing embedded system devices, with each one having several unique good features, but one single tool and framework may not satisfy all of the testing needs for embedded systems, thus an extensible framework with the multitude of tools. Embedded product testing includes board bring-up testing, test during manufacturing, firmware testing, application testing, and assembly testing. Traditional test methods include developing test libraries and support components for every new hardware platform that belongs to the same domain with identical hardware architecture. This approach will have drawbacks like non-reusability where platform-specific libraries cannot be reused, need to maintain source infrastructure for individual hardware platforms, and most importantly, time is taken to re-develop test cases for new hardware platforms. These limitations create challenges like environment set up for testing, scalability, and maintenance. A desirable strategy is certainly one that is focused on maximizing reusability, continuous integration, and leveraging artifacts across the complete development cycle during phases of testing and across family of products. To get over the stated challenges with the conventional method and offers benefits of embedded testing, an embedded test framework (ETF), a solution accelerator, is designed, which can be deployed in embedded system-related products with minimal customizations and maintenance to accelerate the hardware testing. Embedded test framework supports testing different hardwares including microprocessor and microcontroller. It offers benefits such as (1) Time-to-Market: Accelerates board brings up time with prepacked test suites supporting all necessary peripherals which can speed up the design and development stage(board bring up, manufacturing and device driver) (2) Reusability-framework components isolated from the platform-specific HW initialization and configuration makes the adaptability of test cases across various platform quick and simple (3) Effective build and test infrastructure with multiple test interface options and preintegrated with FUEGO framework (4) Continuos integration - pre-integrated with Jenkins which enabled continuous testing and automated software update feature. Applying the embedded test framework accelerator throughout the design and development phase enables to development of the well-tested systems before functional verification and improves time to market to a large extent.

Keywords: board diagnostics software, embedded system, hardware testing, test frameworks

Procedia PDF Downloads 103
602 High Level Synthesis of Canny Edge Detection Algorithm on Zynq Platform

Authors: Hanaa M. Abdelgawad, Mona Safar, Ayman M. Wahba

Abstract:

Real-time image and video processing is a demand in many computer vision applications, e.g. video surveillance, traffic management and medical imaging. The processing of those video applications requires high computational power. Therefore, the optimal solution is the collaboration of CPU and hardware accelerators. In this paper, a Canny edge detection hardware accelerator is proposed. Canny edge detection is one of the common blocks in the pre-processing phase of image and video processing pipeline. Our presented approach targets offloading the Canny edge detection algorithm from processing system (PS) to programmable logic (PL) taking the advantage of High Level Synthesis (HLS) tool flow to accelerate the implementation on Zynq platform. The resulting implementation enables up to a 100x performance improvement through hardware acceleration. The CPU utilization drops down and the frame rate jumps to 60 fps of 1080p full HD input video stream.

Keywords: high level synthesis, canny edge detection, hardware accelerators, computer vision

Procedia PDF Downloads 440
601 Ion Thruster Grid Lifetime Assessment Based on Its Structural Failure

Authors: Juan Li, Jiawen Qiu, Yuchuan Chu, Tianping Zhang, Wei Meng, Yanhui Jia, Xiaohui Liu

Abstract:

This article developed an ion thruster optic system sputter erosion depth numerical 3D model by IFE-PIC (Immersed Finite Element-Particle-in-Cell) and Mont Carlo method, and calculated the downstream surface sputter erosion rate of accelerator grid; Compared with LIPS-200 life test data, the results of the numerical model are in reasonable agreement with the measured data. Finally, we predict the lifetime of the 20cm diameter ion thruster via the erosion data obtained with the model. The ultimate result demonstrates that under normal operating condition, the erosion rate of the grooves wears on the downstream surface of the accelerator grid is 34.6μm⁄1000h, which means the conservative lifetime until structural failure occurring on the accelerator grid is 11500 hours.

Keywords: ion thruster, accelerator gird, sputter erosion, lifetime assessment

Procedia PDF Downloads 521
600 Optimal Beam for Accelerator Driven Systems

Authors: M. Paraipan, V. M. Javadova, S. I. Tyutyunnikov

Abstract:

The concept of energy amplifier or accelerator driven system (ADS) involves the use of a particle accelerator coupled with a nuclear reactor. The accelerated particle beam generates a supplementary source of neutrons, which allows the subcritical functioning of the reactor, and consequently a safe exploitation. The harder neutron spectrum realized ensures a better incineration of the actinides. The almost generalized opinion is that the optimal beam for ADS is represented by protons with energy around 1 GeV (gigaelectronvolt). In the present work, a systematic analysis of the energy gain for proton beams with energy from 0.5 to 3 GeV and ion beams from deuteron to neon with energies between 0.25 and 2 AGeV is performed. The target is an assembly of metallic U-Pu-Zr fuel rods in a bath of lead-bismuth eutectic coolant. The rods length is 150 cm. A beryllium converter with length 110 cm is used in order to maximize the energy released in the target. The case of a linear accelerator is considered, with a beam intensity of 1.25‧10¹⁶ p/s, and a total accelerator efficiency of 0.18 for proton beam. These values are planned to be achieved in the European Spallation Source project. The energy gain G is calculated as the ratio between the energy released in the target to the energy spent to accelerate the beam. The energy released is obtained through simulation with the code Geant4. The energy spent is calculating by scaling from the data about the accelerator efficiency for the reference particle (proton). The analysis concerns the G values, the net power produce, the accelerator length, and the period between refueling. The optimal energy for proton is 1.5 GeV. At this energy, G reaches a plateau around a value of 8 and a net power production of 120 MW (megawatt). Starting with alpha, ion beams have a higher G than 1.5 GeV protons. A beam of 0.25 AGeV(gigaelectronvolt per nucleon) ⁷Li realizes the same net power production as 1.5 GeV protons, has a G of 15, and needs an accelerator length 2.6 times lower than for protons, representing the best solution for ADS. Beams of ¹⁶O or ²⁰Ne with energy 0.75 AGeV, accelerated in an accelerator with the same length as 1.5 GeV protons produce approximately 900 MW net power, with a gain of 23-25. The study of the evolution of the isotopes composition during irradiation shows that the increase in power production diminishes the period between refueling. For a net power produced of 120 MW, the target can be irradiated approximately 5000 days without refueling, but only 600 days when the net power reaches 1 GW (gigawatt).

Keywords: accelerator driven system, ion beam, electrical power, energy gain

Procedia PDF Downloads 105
599 Application Research of Stilbene Crystal for the Measurement of Accelerator Neutron Sources

Authors: Zhao Kuo, Chen Liang, Zhang Zhongbing, Ruan Jinlu. He Shiyi, Xu Mengxuan

Abstract:

Stilbene, C₁₄H₁₂, is well known as one of the most useful organic scintillators for pulse shape discrimination (PSD) technique for its good scintillation properties. An on-line acquisition system and an off-line acquisition system were developed with several CAMAC standard plug-ins, NIM plug-ins, neutron/γ discriminating plug-in named 2160A and a digital oscilloscope with high sampling rate respectively for which stilbene crystals and photomultiplier tube detectors (PMT) as detector for accelerator neutron sources measurement carried out in China Institute of Atomic Energy. Pulse amplitude spectrums and charge amplitude spectrums were real-time recorded after good neutron/γ discrimination whose best PSD figure-of-merits (FoMs) are 1.756 for D-D accelerator neutron source and 1.393 for D-T accelerator neutron source. The probability of neutron events in total events was 80%, and neutron detection efficiency was 5.21% for D-D accelerator neutron sources, which were 50% and 1.44% for D-T accelerator neutron sources after subtracting the background of scattering observed by the on-line acquisition system. Pulse waveform signals were acquired by the off-line acquisition system randomly while the on-line acquisition system working. The PSD FoMs obtained by the off-line acquisition system were 2.158 for D-D accelerator neutron sources and 1.802 for D-T accelerator neutron sources after waveform digitization off-line processing named charge integration method for just 1000 pulses. In addition, the probabilities of neutron events in total events obtained by the off-line acquisition system matched very well with the probabilities of the on-line acquisition system. The pulse information recorded by the off-line acquisition system could be repetitively used to adjust the parameters or methods of PSD research and obtain neutron charge amplitude spectrums or pulse amplitude spectrums after digital analysis with a limited number of pulses. The off-line acquisition system showed equivalent or better measurement effects compared with the online system with a limited number of pulses which indicated a feasible method based on stilbene crystals detectors for the measurement of prompt neutrons neutron sources like prompt accelerator neutron sources emit a number of neutrons in a short time.

Keywords: stilbene crystal, accelerator neutron source, neutron / γ discrimination, figure-of-merits, CAMAC, waveform digitization

Procedia PDF Downloads 143
598 On-Chip Sensor Ellipse Distribution Method and Equivalent Mapping Technique for Real-Time Hardware Trojan Detection and Location

Authors: Longfei Wang, Selçuk Köse

Abstract:

Hardware Trojan becomes great concern as integrated circuit (IC) technology advances and not all manufacturing steps of an IC are accomplished within one company. Real-time hardware Trojan detection is proven to be a feasible way to detect randomly activated Trojans that cannot be detected at testing stage. On-chip sensors serve as a great candidate to implement real-time hardware Trojan detection, however, the optimization of on-chip sensors has not been thoroughly investigated and the location of Trojan has not been carefully explored. On-chip sensor ellipse distribution method and equivalent mapping technique are proposed based on the characteristics of on-chip power delivery network in this paper to address the optimization and distribution of on-chip sensors for real-time hardware Trojan detection as well as to estimate the location and current consumption of hardware Trojan. Simulation results verify that hardware Trojan activation can be effectively detected and the location of a hardware Trojan can be efficiently estimated with less than 5% error for a realistic power grid using our proposed methods. The proposed techniques therefore lay a solid foundation for isolation and even deactivation of hardware Trojans through accurate location of Trojans.

Keywords: hardware trojan, on-chip sensor, power distribution network, power/ground noise

Procedia PDF Downloads 342
597 Numerical Solution Speedup of the Laplace Equation Using FPGA Hardware

Authors: Abbas Ebrahimi, Mohammad Zandsalimy

Abstract:

The main purpose of this study is to investigate the feasibility of using FPGA (Field Programmable Gate Arrays) chips as alternatives for the conventional CPUs to accelerate the numerical solution of the Laplace equation. FPGA is an integrated circuit that contains an array of logic blocks, and its architecture can be reprogrammed and reconfigured after manufacturing. Complex circuits for various applications can be designed and implemented using FPGA hardware. The reconfigurable hardware used in this paper is an SoC (System on a Chip) FPGA type that integrates both microprocessor and FPGA architectures into a single device. In the present study the Laplace equation is implemented and solved numerically on both reconfigurable hardware and CPU. The precision of results and speedups of the calculations are compared together. The computational process on FPGA, is up to 20 times faster than a conventional CPU, with the same data precision. An analytical solution is used to validate the results.

Keywords: accelerating numerical solutions, CFD, FPGA, hardware definition language, numerical solutions, reconfigurable hardware

Procedia PDF Downloads 349
596 Cortex-M3 Based Virtual Platform Implementation for Software Development

Authors: Jun Young Moon, Hyeonggeon Lee, Jong Tae Kim

Abstract:

In this paper, we present Cortex-M3 based virtual platform which can virtualize wearable hardware platform and evaluate hardware performance. Cortex-M3 is very popular microcontroller in wearable devices, hardware sensors and display devices. This platform can be used to implement software layer for specific hardware architecture. By using the proposed platform the software development process can be parallelized with hardware development process. We present internal mechanism to implement the proposed virtual platform and describe how to use the proposed platform to develop software by using case study which is low cost wearable device that uses Cortex-M3.

Keywords: electronic system level design, software development, virtual platform, wearable device

Procedia PDF Downloads 336
595 Performances of the Double-Crystal Setup at CERN SPS Accelerator for Physics beyond Colliders Experiments

Authors: Andrii Natochii

Abstract:

We are currently presenting the recent results from the CERN accelerator facilities obtained in the frame of the UA9 Collaboration. The UA9 experiment investigates how a tiny silicon bent crystal (few millimeters long) can be used for various high-energy physics applications. Due to the huge electrostatic field (tens of GV/cm) between crystalline planes, there is a probability for charged particles, impinging the crystal, to be trapped in the channeling regime. It gives a possibility to steer a high intensity and momentum beam by bending the crystal: channeled particles will follow the crystal curvature and deflect on the certain angle (from tens microradians for LHC to few milliradians for SPS energy ranges). The measurements at SPS, performed in 2017 and 2018, confirmed that the protons deflected by the first crystal, inserted in the primary beam halo, can be caught and channeled by the second crystal. In this configuration, we measure the single pass deflection efficiency of the second crystal and prove our opportunity to perform the fixed target experiment at SPS accelerator (LHC in the future).

Keywords: channeling, double-crystal setup, fixed target experiment, Timepix detector

Procedia PDF Downloads 111
594 Analysis of Lightweight Register Hardware Threat

Authors: Yang Luo, Beibei Wang

Abstract:

In this paper, we present a design methodology of lightweight register transfer level (RTL) hardware threat implemented based on a MAX II FPGA platform. The dynamic power consumed by the toggling of the various bit of registers as well as the dynamic power consumed per unit of logic circuits were analyzed. The hardware threat was designed taking advantage of the differences in dynamic power consumed per unit of logic circuits to hide the transfer information. The experiment result shows that the register hardware threat was successfully implemented by using different dynamic power consumed per unit of logic circuits to hide the key information of DES encryption module. It needs more than 100000 sample curves to reduce the background noise by comparing the sample space when it completely meets the time alignment requirement. In additional, an external trigger signal is playing a very important role to detect the hardware threat in this experiment.

Keywords: side-channel analysis, hardware Trojan, register transfer level, dynamic power

Procedia PDF Downloads 245
593 Hardware for Genetic Algorithm

Authors: Fariborz Ahmadi, Reza Tati

Abstract:

Genetic algorithm is a soft computing method that works on set of solutions. These solutions are called chromosome and the best one is the absolute solution of the problem. The main problem of this algorithm is that after passing through some generations, it may be produced some chromosomes that had been produced in some generations ago that causes reducing the convergence speed. From another respective, most of the genetic algorithms are implemented in software and less works have been done on hardware implementation. Our work implements genetic algorithm in hardware that doesn’t produce chromosome that have been produced in previous generations. In this work, most of genetic operators are implemented without producing iterative chromosomes and genetic diversity is preserved. Genetic diversity causes that not only do not this algorithm converge to local optimum but also reaching to global optimum. Without any doubts, proposed approach is so faster than software implementations. Evaluation results also show the proposed approach is faster than hardware ones.

Keywords: hardware, genetic algorithm, computer science, engineering

Procedia PDF Downloads 456
592 A Low-Area Fully-Reconfigurable Hardware Design of Fast Fourier Transform System for 3GPP-LTE Standard

Authors: Xin-Yu Shih, Yue-Qu Liu, Hong-Ru Chou

Abstract:

This paper presents a low-area and fully-reconfigurable Fast Fourier Transform (FFT) hardware design for 3GPP-LTE communication standard. It can fully support 32 different FFT sizes, up to 2048 FFT points. Besides, a special processing element is developed for making reconfigurable computing characteristics possible, while first-in first-out (FIFO) scheduling scheme design technique is proposed for hardware-friendly FIFO resource arranging. In a synthesis chip realization via TSMC 40 nm CMOS technology, the hardware circuit only occupies core area of 0.2325 mm2 and dissipates 233.5 mW at maximal operating frequency of 250 MHz.

Keywords: reconfigurable, fast Fourier transform (FFT), single-path delay feedback (SDF), 3GPP-LTE

Procedia PDF Downloads 237
591 Development of the Accelerator Applied to an Early Stage High-Strength Shotcrete

Authors: Ayanori Sugiyama, Takahisa Hanei, Yasuhide Higo

Abstract:

Domestic demand for the construction of tunnels has been increasing in recent years in Japan. To meet this demand, various construction materials and construction methods have been developed to attain higher strength, reduction of negative impact on the environment and improvement for working conditions. In this report, we would like to introduce the newly developed shotcrete with superior hardening properties which were tested through the actual machine scale and its workability and strength development were evaluated. As a result, this new tunnel construction method was found to achieve higher workability and quicker strength development in only a couple of minutes.

Keywords: accelerator, shotcrete, tunnel, high-strength

Procedia PDF Downloads 268
590 Design and Thermal Simulation Analysis of the Chinese Accelerator Driven Sub-Critical System Injector-I Cryomodule

Authors: Rui-Xiong Han, Rui Ge, Shao-Peng Li, Lin Bian, Liang-Rui Sun, Min-Jing Sang, Rui Ye, Ya-Ping Liu, Xiang-Zhen Zhang, Jie-Hao Zhang, Zhuo Zhang, Jian-Qing Zhang, Miao-Fu Xu

Abstract:

The Chinese Accelerator Driven Sub-critical system (C-ADS) uses a high-energy proton beam to bombard the metal target and generate neutrons to deal with the nuclear waste. The Chinese ADS proton linear has two 0~10 MeV injectors and one 10~1500 MeV superconducting linac. Injector-I is studied by the Institute of High Energy Physics (IHEP) under construction in the Beijing, China. The linear accelerator consists of two accelerating cryomodules operating at the temperature of 2 Kelvin. This paper describes the structure and thermal performances analysis of the cryomodule. The analysis takes into account all the main contributors (support posts, multilayer insulation, current leads, power couplers, and cavities) to the static and dynamic heat load at various cryogenic temperature levels. The thermal simulation analysis of the cryomodule is important theory foundation of optimization and commissioning.

Keywords: C-ADS, cryomodule, structure, thermal simulation, static heat load, dynamic heat load

Procedia PDF Downloads 357
589 Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Authors: Jaeyoung Lee

Abstract:

Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perceived performance in driving environments that vary from time to season. The image segmentation method using deep learning, which has recently evolved rapidly, provides high recognition performance in various road environments stably. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, which degrade performance in embedded processor environments equipped simple hardware accelerators. In this paper, a semantic segmentation network, matrix multiplication accelerator network (MMANet), optimized for matrix multiplication accelerator (MMA) on Texas instrument digital signal processors (TI DSP) is proposed to improve the recognition performance of autonomous driving system. The proposed method is designed to maximize the number of layers that can be performed in a limited time to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of MMA. By increasing the number of parallel branches, the lack of information caused by fixing the number of channels is resolved. Second, an efficient convolution is selected depending on the size of the activation. Since MMA is a fixed, it may be more efficient for normal convolution than depthwise separable convolution depending on memory access overhead. Thus, a convolution type is decided according to output stride to increase network depth. In addition, memory access time is minimized by processing operations only in L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP). The suggested method gets stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high quality contexts using the restored shape without global average pooling paths since the layer uses MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation sets. The proposed network can process an image with 640 x 480 resolution for 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle as 20 frame per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU) which is the highest recognition rate among embedded networks on the Cityscapes validation set.

Keywords: edge network, embedded network, MMA, matrix multiplication accelerator, semantic segmentation network

Procedia PDF Downloads 88
588 Lightweight Hardware Firewall for Embedded System Based on Bus Transactions

Authors: Ziyuan Wu, Yulong Jia, Xiang Zhang, Wanting Zhou, Lei Li

Abstract:

The Internet of Things (IoT) is a rapidly evolving field involving a large number of interconnected embedded devices. In the design of embedded System-on-Chip (SoC), the key issues are power consumption, performance, and security. However, the easy-to-implement software and untrustworthy third-party IP cores may threaten the safety of hardware assets. Considering that illegal access and malicious attacks against SoC resources pass through the bus that integrates IPs, we propose a Lightweight Hardware Firewall (LHF) to protect SoC, which monitors and disallows the offending bus transactions based on physical addresses. Furthermore, under the LHF architecture, this paper refines two types of firewalls: Destination Hardware Firewall (DHF) and Source Hardware Firewall (SHF). The former is oriented to fine-grained detection and configuration, whose core technology is based on the method of dynamic grading units. In addition, we design the SHF based on static entries to achieve lightweight. Finally, we evaluate the hardware consumption of the proposed method by both Field-Programmable Gate Array (FPGA) and IC. Compared with the exciting efforts, LHF introduces a bus latency of zero clock cycles for every read or write transaction implemented on Xilinx Kintex-7 FPGAs. Meanwhile, the DC synthesis results based on TSMC 90nm show that the area is reduced by about 25% compared with the previous method.

Keywords: IoT, security, SoC, bus architecture, lightweight hardware firewall, FPGA

Procedia PDF Downloads 17
587 Individual Actuators of a Car-Like Robot with Back Trailer

Authors: Tarek El-Derini, Ahmed El-Shenawy

Abstract:

This paper presents the hardware implemented and validation for a special system to assist the unprofessional users of car with back trailers. The system consists of two platforms; the front car platform (C) and the trailer platform (T). The main objective is to control the Trailer platform using the actuators found in the front platform (c). The mobility of the platform (C) is investigated and inverse and forward kinematics model is obtained for both platforms (C) and (T). The system is simulated using Matlab M-file and the simulation examples results illustrated the system performance. The system is constructed with a hardware setup for the front and trailer platform. The hardware experimental results and the simulated examples outputs showed the validation of the hardware setup.

Keywords: kinematics, modeling, robot, MATLAB

Procedia PDF Downloads 403
586 Hardware Error Analysis and Severity Characterization in Linux-Based Server Systems

Authors: Nikolaos Georgoulopoulos, Alkis Hatzopoulos, Konstantinos Karamitsios, Konstantinos Kotrotsios, Alexandros I. Metsai

Abstract:

In modern server systems, business critical applications run in different types of infrastructure, such as cloud systems, physical machines and virtualization. Often, due to high load and over time, various hardware faults occur in servers that translate to errors, resulting to malfunction or even server breakdown. CPU, RAM and hard drive (HDD) are the hardware parts that concern server administrators the most regarding errors. In this work, selected RAM, HDD and CPU errors, that have been observed or can be simulated in kernel ring buffer log files from two groups of Linux servers, are investigated. Moreover, a severity characterization is given for each error type. Better understanding of such errors can lead to more efficient analysis of kernel logs that are usually exploited for fault diagnosis and prediction. In addition, this work summarizes ways of simulating hardware errors in RAM and HDD, in order to test the error detection and correction mechanisms of a Linux server.

Keywords: hardware errors, Kernel logs, Linux servers, RAM, hard disk, CPU

Procedia PDF Downloads 113
585 Neutron Contamination in 18 MV Medical Linear Accelerator

Authors: Onur Karaman, A. Gunes Tanir

Abstract:

Photon radiation therapy used to treat cancer is one of the most important methods. However, photon beam collimator materials in Linear Accelerator (LINAC) head generally contains heavy elements is used and the interaction of bremsstrahlung photon with such heavy nuclei, the neutron can be produced inside the treatment rooms. In radiation therapy, neutron contamination contributes to the risk of secondary malignancies in patients, also physicians working in this field. Since the neutron is more dangerous than photon, it is important to determine neutron dose during radiotherapy treatment. In this study, it is aimed to analyze the effect of field size, distance from axis and depth on the amount of in-field and out-field neutron contamination for ElektaVmat accelerator with 18 MV nominal energy. The photon spectra at the distance of 75, 150, 225, 300 cm from target and on the isocenter of beam were scored for 5x5, 10x10, 20x20, 30x30 and 40x40 cm2 fields. Results demonstrated that the neutron spectra and dose are dependent on field size and distances. Beyond 225 cm of isocenter, the dependence of the neutron dose on field size is minimal. As a result, it is concluded that as the open field increases, neutron dose determined decreases. It is important to remember that when treating with high energy photons, the dose from contamination neutrons must be considered as it is much greater than the photon dose.

Keywords: radiotherapy, neutron contamination, linear accelerators, photon

Procedia PDF Downloads 308
584 Hardware-in-the-Loop Test for Automatic Voltage Regulator of Synchronous Condenser

Authors: Ha Thi Nguyen, Guangya Yang, Arne Hejde Nielsen, Peter Højgaard Jensen

Abstract:

Automatic voltage regulator (AVR) plays an important role in volt/var control of synchronous condenser (SC) in power systems. Test AVR performance in steady-state and dynamic conditions in real grid is expensive, low efficiency, and hard to achieve. To address this issue, we implement hardware-in-the-loop (HiL) test for the AVR of SC to test the steady-state and dynamic performances of AVR in different operating conditions. Startup procedure of the system and voltage set point changes are studied to evaluate the AVR hardware response. Overexcitation, underexcitation, and AVR set point loss are tested to compare the performance of SC with the AVR hardware and that of simulation. The comparative results demonstrate how AVR will work in a real system. The results show HiL test is an effective approach for testing devices before deployment and is able to parameterize the controller with lower cost, higher efficiency, and more flexibility.

Keywords: automatic voltage regulator, hardware-in-the-loop, synchronous condenser, real time digital simulator

Procedia PDF Downloads 213
583 Hardware Implementation of Local Binary Pattern Based Two-Bit Transform Motion Estimation

Authors: Seda Yavuz, Anıl Çelebi, Aysun Taşyapı Çelebi, Oğuzhan Urhan

Abstract:

Nowadays, demand for using real-time video transmission capable devices is ever-increasing. So, high resolution videos have made efficient video compression techniques an essential component for capturing and transmitting video data. Motion estimation has a critical role in encoding raw video. Hence, various motion estimation methods are introduced to efficiently compress the video. Low bit‑depth representation based motion estimation methods facilitate computation of matching criteria and thus, provide small hardware footprint. In this paper, a hardware implementation of a two-bit transformation based low-complexity motion estimation method using local binary pattern approach is proposed. Image frames are represented in two-bit depth instead of full-depth by making use of the local binary pattern as a binarization approach and the binarization part of the hardware architecture is explained in detail. Experimental results demonstrate the difference between the proposed hardware architecture and the architectures of well-known low-complexity motion estimation methods in terms of important aspects such as resource utilization, energy and power consumption.

Keywords: binarization, hardware architecture, local binary pattern, motion estimation, two-bit transform

Procedia PDF Downloads 268
582 Adaptive Data Approximations Codec (ADAC) for AI/ML-based Cyber-Physical Systems

Authors: Yong-Kyu Jung

Abstract:

The fast growth in information technology has led to de-mands to access/process data. CPSs heavily depend on the time of hardware/software operations and communication over the network (i.e., real-time/parallel operations in CPSs (e.g., autonomous vehicles). Since data processing is an im-portant means to overcome the issue confronting data management, reducing the gap between the technological-growth and the data-complexity and channel-bandwidth. An adaptive perpetual data approximation method is intro-duced to manage the actual entropy of the digital spectrum. An ADAC implemented as an accelerator and/or apps for servers/smart-connected devices adaptively rescales digital contents (avg.62.8%), data processing/access time/energy, encryption/decryption overheads in AI/ML applications (facial ID/recognition).

Keywords: adaptive codec, AI, ML, HPC, cyber-physical, cybersecurity

Procedia PDF Downloads 40
581 Absorbed Dose Measurements for Teletherapy Prediction of Superficial Dose Using Halcyon Linear Accelerator

Authors: Raymond Limen Njinga, Adeneye Samuel Olaolu, Akinyode Ojumoola Ajimo

Abstract:

Introduction: Measurement of entrance dose and dose at different depths is essential to avoid overdose and underdose of patients. The aim of this study is to verify the variation in the absorbed dose using a water-equivalent material. Materials and Methods: The plastic phantom was arranged on the couch of the halcyon linear accelerator by Varian, with the farmer ionization chamber inserted and connected to the electrometer. The image of the setup was taken using the High-Quality Single 1280x1280x16 higher on the service mode to check the alignment with the isocenter. The beam quality TPR₂₀,₁₀ (Tissue phantom ratio) was done to check the beam quality of the machine at a field size of 10 cm x 10 cm. The calibration was done using SAD type set-up at a depth of 5 cm. This process was repeated for ten consecutive weeks, and the values were recorded. Results: The results of the beam output for the teletherapy machine were satisfactory and accepted in comparison with the commissioned measurement of 0.62. The beam quality TPR₂₀,₁₀ (Tissue phantom ratio) was reasonable with respect to the beam quality of the machine at a field size of 10 cm x 10 cm. Conclusion: The results of the beam quality and the absorbed dose rate showed a good consistency over the period of ten weeks with the commissioned measurement value.

Keywords: linear accelerator, absorbed dose rate, isocenter, phantom, ionization chamber

Procedia PDF Downloads 9
580 Design and Development of Permanent Magnet Quadrupoles for Low Energy High Intensity Proton Accelerator

Authors: Vikas Teotia, Sanjay Malhotra, Elina Mishra, Prashant Kumar, R. R. Singh, Priti Ukarde, P. P. Marathe, Y. S. Mayya

Abstract:

Bhabha Atomic Research Centre, Trombay is developing low energy high intensity Proton Accelerator (LEHIPA) as pre-injector for 1 GeV proton accelerator for accelerator driven sub-critical reactor system (ADSS). LEHIPA consists of RFQ (Radio Frequency Quadrupole) and DTL (Drift Tube Linac) as major accelerating structures. DTL is RF resonator operating in TM010 mode and provides longitudinal E-field for acceleration of charged particles. The RF design of drift tubes of DTL was carried out to maximize the shunt impedance; this demands the diameter of drift tubes (DTs) to be as low as possible. The width of the DT is however determined by the particle β and trade-off between a transit time factor and effective accelerating voltage in the DT gap. The array of Drift Tubes inside DTL shields the accelerating particle from decelerating RF phase and provides transverse focusing to the charged particles which otherwise tends to diverge due to Columbic repulsions and due to transverse e-field at entry of DTs. The magnetic lenses housed inside DTS controls the transverse emittance of the beam. Quadrupole magnets are preferred over solenoid magnets due to relative high focusing strength of former over later. The availability of small volume inside DTs for housing magnetic quadrupoles has motivated the usage of permanent magnet quadrupoles rather than Electromagnetic Quadrupoles (EMQ). This provides another advantage as joule heating is avoided which would have added thermal loaded in the continuous cycle accelerator. The beam dynamics requires uniformity of integral magnetic gradient to be better than ±0.5% with the nominal value of 2.05 tesla. The paper describes the magnetic design of the PMQ using Sm2Co17 rare earth permanent magnets. The paper discusses the results of five pre-series prototype fabrications and qualification of their prototype permanent magnet quadrupoles and a full scale DT developed with embedded PMQs. The paper discusses the magnetic pole design for optimizing integral Gdl uniformity and the value of higher order multipoles. A novel but simple method of tuning the integral Gdl is discussed.

Keywords: DTL, focusing, PMQ, proton, rate earth magnets

Procedia PDF Downloads 427
579 On the Design of Electronic Control Unitsfor the Safety-Critical Vehicle Applications

Authors: Kyung-Jung Lee, Hyun-Sik Ahn

Abstract:

This paper suggests a design methodology for the hardware and software of the Electronic Control Unit (ECU) of safety-critical vehicle applications such as braking and steering. The architecture of the hardware is a high integrity system such that it incorporates a high performance 32-bit CPU and a separate Peripheral Control-Processor (PCP) together with an external watchdog CPU. Communication between the main CPU and the PCP is executed via a common area of RAM and events on either processor which are invoked by interrupts. Safety-related software is also implemented to provide a reliable, self-testing computing environment for safety critical and high integrity applications. The validity of the design approach is shown by using the Hardware-in-the-Loop Simulation (HILS) for Electric Power Steering (EPS) systems which consists of the EPS mechanism, the designed ECU, and monitoring tools.

Keywords: electronic control unit, electric power steering, functional safety, hardware-in-the-loop simulation

Procedia PDF Downloads 259
578 Adaptive Multiple Transforms Hardware Architecture for Versatile Video Coding

Authors: T. Damak, S. Houidi, M. A. Ben Ayed, N. Masmoudi

Abstract:

The Versatile Video Coding standard (VVC) is actually under development by the Joint Video Exploration Team (or JVET). An Adaptive Multiple Transforms (AMT) approach was announced. It is based on different transform modules that provided an efficient coding. However, the AMT solution raises several issues especially regarding the complexity of the selected set of transforms. This can be an important issue, particularly for a future industrial adoption. This paper proposed an efficient hardware implementation of the most used transform in AMT approach: the DCT II. The developed circuit is adapted to different block sizes and can reach a minimum frequency of 192 MHz allowing an optimized execution time.

Keywords: adaptive multiple transforms, AMT, DCT II, hardware, transform, versatile video coding, VVC

Procedia PDF Downloads 110
577 Advanced Mechatronic Design of Robot Manipulator Using Hardware-In-The-Loop Simulation

Authors: Reza Karami, Ali Akbar Ebrahimi

Abstract:

This paper discusses concurrent engineering of robot manipulators, based on the Holistic Concurrent Design (HCD) methodology and by using a hardware-in-the-loop simulation platform. The methodology allows for considering numerous design variables with different natures concurrently. It redefines the ultimate goal of design based on the notion of satisfaction, resulting in the simplification of the multi-objective constrained optimization process. It also formalizes the effect of designer’s subjective attitude in the process. To enhance modeling efficiency for both computation and accuracy, a hardware-in-the-loop simulation platform is used, which involves physical joint modules and the control unit in addition to the software modules. This platform is implemented in the HCD design architecture to reliably evaluate the design attributes and performance super criterion during the design process. The resulting overall architecture is applied to redesigning kinematic, dynamic and control parameters of an industrial robot manipulator.

Keywords: concurrent engineering, hardware-in-the-loop simulation, robot manipulator, multidisciplinary systems, mechatronics

Procedia PDF Downloads 408