Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 1497

Search results for: carry save adder Karatsuba multiplication

1497 Very Large Scale Integration Architecture of Finite Impulse Response Filter Implementation Using Retiming Technique

Authors: S. Jalaja, A. M. Vijaya Prakash

Abstract:

A recursive combination of algorithms based on Karatsuba multiplication is exploited to design generalized transpose and parallel Finite Impulse Response (FIR) filters. Mid-range Karatsuba multiplication and carry-save-adder-based Karatsuba multiplication reduce the time complexity of higher-order multiplication, implemented for up to n bits. As a result, we design a modified N-tap transpose and parallel symmetric FIR filter structure using the Karatsuba algorithm. The mathematical formulation of the FFA filter is derived. The proposed architecture has a significantly lower area-delay product (ADP) than the existing block implementation. Adopting the retiming technique reduces hardware cost further. The filter architecture is designed using a 90 nm technology library and implemented using Cadence EDA tools. The synthesis results show better performance for different word lengths and block sizes. The design reduces switching activity and power consumption, evaluated with and without retiming for different combinations of the circuit. The proposed structure achieves more than a 50% power reduction, with and without retiming, compared to the earlier design structure. As a proof of concept, for block size 16 and filter length 64, the CKA method achieves 51% as well as 70% less power by applying the retiming technique, and the CSA method achieves 57% as well as 77% less power by applying the retiming technique, compared to the previously proposed design.
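For readers unfamiliar with the underlying technique, the following is a minimal software sketch of recursive Karatsuba multiplication on integers. It is not the paper's hardware architecture: the function name `karatsuba` and the `cutoff_bits` parameter are assumptions made for illustration, and the built-in multiply stands in for the base multiplier.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 32) -> int:
    """Multiply two non-negative integers with recursive Karatsuba splitting."""
    # Small operands: fall back to the built-in multiply (the base case).
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y

    # Split both operands at the same bit position h.
    h = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << h) - 1
    x_hi, x_lo = x >> h, x & mask
    y_hi, y_lo = y >> h, y & mask

    # Three half-size products instead of the four used by schoolbook splitting.
    p_hi = karatsuba(x_hi, y_hi, cutoff_bits)
    p_lo = karatsuba(x_lo, y_lo, cutoff_bits)
    p_mid = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits)

    # The cross term x_hi*y_lo + x_lo*y_hi is recovered by subtraction.
    cross = p_mid - p_hi - p_lo
    return (p_hi << (2 * h)) + (cross << h) + p_lo
```

The saving comes from replacing one multiplication per recursion level with a few additions and subtractions, which is where a fast adder structure matters in hardware.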

Keywords: carry save adder Karatsuba multiplication, mid range Karatsuba multiplication, modified FFA and transposed filter, retiming

Procedia PDF Downloads 202

1496 Area Efficient Carry Select Adder Using XOR Gate Design

Authors: Mahendrapal Singh Pachlaniya, Laxmi Kumre

Abstract:

This paper proposes an AND-OR-INVERTER (AOI) based XOR gate design that uses fewer gates. The new XOR gate requires four basic gates, where the basic gates are limited to AND, OR, and inverter (AOI); a conventional XOR gate requires five. A Ripple Carry Adder (RCA) is used for parallel addition, but its propagation delay is large. Replacing the RCA with a Carry Select Adder (CSLA) reduces the propagation delay. However, a CSLA uses dual RCAs, one assuming carry = ‘0’ and one assuming carry = ‘1’, so it is not an area-efficient adder. To improve area efficiency, the modified CSLA keeps a single RCA for carry = ‘0’ and replaces the RCA for carry = ‘1’ with a Binary to Excess-1 Converter (BEC). Replacing the conventional XOR gate with the new XOR design in the modified CSLA reduces area further compared to both the regular CSLA and the modified CSLA.
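The structure described above can be sketched as a bit-level simulation. This is an illustrative model only: the four-gate XOR shown, (A OR B) AND NOT (A AND B), is one possible AOI form and is an assumption, since the abstract does not give the paper's gate netlist. All function names and the block size are likewise hypothetical.

```python
def xor_aoi(a: int, b: int) -> int:
    """XOR from four basic gates: OR, AND, inverter, AND (one possible form)."""
    return (a | b) & (1 - (a & b))


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Ripple-carry adder on little-endian bit lists; returns (sum_bits, carry_out)."""
    sum_bits, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        p = xor_aoi(a, b)
        sum_bits.append(xor_aoi(p, carry))
        carry = (a & b) | (p & carry)
    return sum_bits, carry


def bec(bits):
    """Binary to Excess-1 Converter: adds 1 to a little-endian bit list."""
    out, carry = [], 1
    for bit in bits:
        out.append(xor_aoi(bit, carry))
        carry = bit & carry
    return out


def csla_bec_block(a_bits, b_bits, carry_in):
    """One modified-CSLA block: a single RCA (carry-in 0), a BEC, and a mux."""
    s0, c0 = ripple_carry_add(a_bits, b_bits, 0)
    # The BEC sees the sum bits and the carry-out together, so it produces
    # the carry-in = 1 result without a second RCA.
    plus_one = bec(s0 + [c0])
    s1, c1 = plus_one[:-1], plus_one[-1]
    return (s1, c1) if carry_in else (s0, c0)


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def csla_add(a, b, width=16, block=4):
    """Add two unsigned integers by chaining modified-CSLA blocks."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    result, carry = [], 0
    for i in range(0, width, block):
        s, carry = csla_bec_block(a_bits[i:i + block], b_bits[i:i + block], carry)
        result += s
    return from_bits(result + [carry])
```

The BEC never overflows here because the carry-in = 0 result of an n-bit block is at most 2^(n+1) - 2, so adding 1 still fits in n + 1 bits.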

Keywords: CSLA, BEC, XOR gate, area efficient

Procedia PDF Downloads 329

1495 Design of Speedy, Scanty Adder for Lossy Application Using QCA

Authors: T. Angeline Priyanka, R. Ganesan

Abstract:

Recent trends in microelectronics technology have gradually changed the strategies used in very large scale integration (VLSI) circuits. Complementary Metal Oxide Semiconductor (CMOS) technology has been the industry standard for implementing VLSI devices for the past two decades, but scaling issues have so far prevented it from reaching ultra-low dimensions. This paved the way for Quantum Cellular Automata (QCA), one of the many alternative technologies proposed as a replacement to address the fundamental limits that CMOS technology will impose in the years to come. This brief proposes a new adder that operates at high speed while occupying less area. The adder is designed especially for error-tolerant applications. In the proposed adder, the overall area (cell count) and simulation time are reduced by 88 and 73 percent respectively. Various results of the proposed adder are shown and described.
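QCA logic is commonly built from three-input majority gates and inverters, and the keywords below list the majority gate. As background, the sketch below shows an exact full adder built from three majority gates and two inversions. It is a well-known construction shown for orientation only, not the error-tolerant adder this paper proposes, and the function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority gate on single bits."""
    return (a & b) | (b & c) | (a & c)


def full_adder_majority(a: int, b: int, cin: int):
    """Exact full adder from three majority gates and two inversions."""
    cout = majority(a, b, cin)
    # sum = M(NOT cout, cin, M(a, b, NOT cin))
    s = majority(1 - cout, cin, majority(a, b, 1 - cin))
    return s, cout
```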

Keywords: quantum cellular automata, carry look ahead adder, ripple carry adder, lossy application, majority gate, crossover

Procedia PDF Downloads 528

1494 Scalable Systolic Multiplier over Binary Extension Fields Based on Two-Level Karatsuba Decomposition

Authors: Chiou-Yng Lee, Wen-Yo Lee, Chieh-Tsai Wu, Cheng-Chen Yang

Abstract:

Shifted polynomial basis (SPB) is a variation of polynomial basis representation. SPB has potential for efficient bit-level and digit-level implementations of multiplication over binary extension fields with subquadratic space complexity. For efficient implementation of pairing computation with large finite fields, this paper presents a new SPB multiplication algorithm based on Karatsuba schemes and uses it to derive a novel scalable multiplier architecture. Analytical results show that the proposed multiplier provides a trade-off between space and time complexities. The proposed multiplier is modular, regular, and suitable for very-large-scale integration (VLSI) implementations. It has lower area complexity than multipliers based on traditional decomposition methods. It is therefore more suitable for efficient hardware implementation of pairing-based cryptography and elliptic curve cryptography (ECC) in constraint-driven applications.
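To illustrate Karatsuba decomposition over binary fields, the sketch below multiplies polynomials over GF(2) packed into Python integers (bit i is the coefficient of x^i). It uses the ordinary polynomial basis and omits modular reduction, so it is not the paper's SPB algorithm or its two-level scheme. The function names and the `cutoff_bits` parameter are assumptions.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) multiplication by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, cutoff_bits: int = 8) -> int:
    """Carry-less multiplication with recursive Karatsuba splitting."""
    n = max(a.bit_length(), b.bit_length())
    if n <= cutoff_bits:
        return clmul_schoolbook(a, b)

    h = n // 2
    mask = (1 << h) - 1
    a_hi, a_lo = a >> h, a & mask
    b_hi, b_lo = b >> h, b & mask

    p_hi = clmul_karatsuba(a_hi, b_hi, cutoff_bits)
    p_lo = clmul_karatsuba(a_lo, b_lo, cutoff_bits)
    # Over GF(2), addition and subtraction are both XOR.
    p_mid = clmul_karatsuba(a_hi ^ a_lo, b_hi ^ b_lo, cutoff_bits)

    return (p_hi << (2 * h)) ^ ((p_mid ^ p_hi ^ p_lo) << h) ^ p_lo
```

Because there are no carries, the operand sums in the middle product never grow in width, which is one reason Karatsuba maps well to binary-field hardware.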

Keywords: digit-serial systolic multiplier, elliptic curve cryptography (ECC), Karatsuba algorithm (KA), shifted polynomial basis (SPB), pairing computation

Procedia PDF Downloads 331

1493 Performance Analysis of Arithmetic Units for IoT Applications

Authors: Nithiya C., Komathi B. J., Praveena N. G., Samuda Prathima

Abstract:

At present, power optimization is the primary aim in digital system design, especially at the gate level and lower levels of design abstraction. Adders are a nearly universal component of today's integrated circuits. Most prior research has focused on designing high-speed adders based on various adder structures. This paper discusses how to select an arithmetic unit for IoT applications. Based on an analysis of eight types of 16-bit adders, we found that the Carry Look-ahead (CLA) adder consumes low power. Additionally, a multiplier and accumulator (MAC) unit is implemented with the Booth multiplier, using the low-power adders in order of preference. The design is synthesized and verified using Synopsys Design Compiler and VCS, and then implemented using Cadence Encounter. The total power consumed by the CLA-based Booth multiplier is 0.03527 mW, the total area occupied is 11260 µm², and the delay is 2034 ps.

Keywords: carry look-ahead, carry select adder, CSA, internet of things, ripple carry adder, design rule check, power delay product, multiplier and accumulator

Procedia PDF Downloads 93

1492 An Embedded High Speed Adder for Arithmetic Computations

Authors: Kala Bharathan, R. Seshasayanan

Abstract:

In this paper, a 1-bit Embedded Logic Full Adder (EFA) circuit at the transistor level is proposed, which reduces logic complexity and offers low power and high speed. The design is further extended to 64 bits. To evaluate the performance of the EFA, 16-, 32-, and 64-bit Linear and Square-root Carry Select Adder/Subtractor (CSLAS) structures are also proposed. The proposed circuits are tested on an 8 × 8 Modified Booth multiplier and compared in terms of power and delay. The EFA is implemented in different multiplier architectures to compare performance parameters. The overall delay of the CSLAS is reduced to 78% when compared to the conventional one. The circuits are implemented in TSMC 28 nm CMOS technology using the Cadence Virtuoso tool. The EFA offers power savings of up to 14% compared to the conventional adder. The present implementation was found to offer significant improvement in power and speed in comparison to other full adder circuits.

Keywords: embedded logic, full adder, pdp, xor gate

Procedia PDF Downloads 420

1491 A Design of Elliptic Curve Cryptography Processor based on SM2 over GF(p)

Authors: Shiji Hu, Lei Li, Wanting Zhou, DaoHong Yang

Abstract:

Data encryption is the foundation of today’s communication. Improving the speed of data encryption and decryption is therefore a long-standing problem for researchers. In this paper, we propose an elliptic curve crypto processor architecture based on the SM2 prime field. In the hardware implementation, we optimized the algorithms in different stages of the structure. For finite field modular operations, we propose an optimized Karatsuba-Ofman multiplication algorithm and shorten the critical path through a pipeline structure in the algorithm implementation. Based on the SM2 recommended prime field, a fast modular reduction algorithm is used to reduce the 512-bit-wide data obtained from the multiplication unit. The radix-4 extended Euclidean algorithm is used to convert between the affine coordinate system and the Jacobian projective coordinate system. For the parallel scheduling of point operations on elliptic curves, we propose a three-level parallel structure of point addition and point doubling based on the Jacobian projective coordinate system. Combined with the scalar multiplication algorithm, we add mutual pre-operation to the point addition and point doubling operations to improve the efficiency of scalar point multiplication. The proposed ECC hardware architecture was verified and implemented on Xilinx Virtex-7 and ZYNQ-7 platforms, and each 256-bit scalar multiplication operation took 0.275 ms. The scalar multiplication performance is 32 times that of a CPU (dual-core ARM Cortex-A9).

Keywords: Elliptic curve cryptosystems, SM2, modular multiplication, point multiplication.

Procedia PDF Downloads 54

1490 Design and Study of a Low Power High Speed Full Adder Using GDI Multiplexer

Authors: Biswarup Mukherjee, Aniruddha Ghosal

Abstract:

In this paper, we propose a new technique for implementing a low-power full adder using a set of GDI multiplexers. Full adder circuits are used extensively in Application Specific Integrated Circuits (ASICs), so low-power operation of these sub-components is desirable. The explored method of implementation achieves a low-power design for the full adder. Simulation results obtained with the state-of-the-art Tanner tool indicate the superior performance of the proposed technique over the conventional CMOS full adder. A detailed comparison of simulated results for the conventional and present methods of implementation is presented.

Keywords: low power full adder, 2-T GDI MUX, ASIC (application specific integrated circuit), 12-T FA, CMOS (complementary metal oxide semiconductor)

Procedia PDF Downloads 318

1489 Design and Study of a Low Power High Speed 8 Transistor Based Full Adder Using Multiplexer and XOR Gates

Authors: Biswarup Mukherjee, Aniruddha Ghoshal

Abstract:

In this paper, we propose a new technique for implementing a low-power, high-speed full adder using 8 transistors. Full adder circuits are used extensively in Application Specific Integrated Circuits (ASICs), so high-speed operation of these sub-components is desirable. The explored method of implementation achieves a high-speed, low-power design for the full adder. Simulated results indicate the superior performance of the proposed technique over the conventional 28-transistor CMOS full adder. A detailed comparison of simulated results for the conventional and present methods of implementation is presented.

Keywords: high speed low power full adder, 2-T MUX, 3-T XOR, 8-T FA, pass transistor logic, CMOS (complementary metal oxide semiconductor)

Procedia PDF Downloads 314

1488 Optimization of SWL Algorithms Using Alternative Adder Module in FPGA

Authors: Tayab D. Memon, Shahji Farooque, Marvi Deshi, Imtiaz Hussain Kalwar, B. S. Chowdhry

Abstract:

Recently, hardware synthesis of a single-bit ternary FIR-like filter (SBTFF) in FPGA was reported and compared with a multi-bit FIR filter with similar spectral characteristics. The results show that the SBTFF outperforms the multi-bit filter overall. In this paper, an optimized adder module for ternary quantized sigma-delta modulated signals is presented. The adder is simulated using ModelSim for functional verification. The area-performance of the proposed adder was obtained through synthesis in Xilinx and compared to conventional adder trees. The synthesis results show that the proposed adder tree achieves higher clock rates and lower chip area when the adder block has a large number of inputs, whereas the conventional adder tree achieves better performance and lower chip area when the same adder block has a small number of inputs. These results enhance the usefulness of existing short word length DSP algorithms for fast and efficient mobile communication.

Keywords: short word length (SWL), DSP algorithms, FPGA, SBTFF, VHDL

Procedia PDF Downloads 309

1487 The Fallacy around Inserting Brackets to Evaluate Expressions Involving Multiplication and Division

Authors: Manduth Ramchander

Abstract:

Evaluating expressions involving multiplication and division can give rise to the fallacy that brackets can be arbitrarily inserted into expressions involving multiplication and division. The aim of this article was to draw upon mathematical theory to prove that brackets cannot be arbitrarily inserted into expressions involving multiplication and division, particularly in expressions where division precedes multiplication. In doing so, it demonstrates that the notion that two different answers are possible, when evaluating expressions involving multiplication and division, is indeed a false one. Searches conducted in a number of scholarly databases unearthed the rules to be applied when removing brackets from expressions, which revealed that consideration needs to be given to sign changes when brackets are removed. The rule pertaining to expressions involving multiplication and division was then extended, in its reverse form, to prove that brackets cannot be arbitrarily inserted into expressions involving multiplication and division. The application of the rule demonstrates that an expression involving multiplication and division can have only one correct answer. It is recommended that both the rule and its reverse be included in the curriculum, preferably at the juncture when manipulation with brackets is introduced.
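A small numeric check of this claim, using exact fractions. The values 12, 3, and 2 are illustrative, and the identity a ÷ b × c = a ÷ (b ÷ c) is stated here as ordinary arithmetic rather than as the article's exact formulation of its rule.

```python
from fractions import Fraction

a, b, c = Fraction(12), Fraction(3), Fraction(2)

# Evaluated left to right, as written: (12 ÷ 3) × 2.
left_to_right = a / b * c

# Brackets inserted arbitrarily after the division sign: 12 ÷ (3 × 2).
bracketed = a / (b * c)

# Brackets inserted with the operation inside changed: 12 ÷ (3 ÷ 2).
bracketed_with_change = a / (b / c)

print(left_to_right, bracketed, bracketed_with_change)
```

Inserting brackets after a division sign preserves the value only if the multiplication inside the brackets becomes a division, which mirrors the sign change needed when brackets follow a minus sign.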

Keywords: brackets, multiplications and division, operations, order

Procedia PDF Downloads 124

1486 An Adder with Novel PMOS and NMOS for Ultra Low Power Applications in Deep Submicron Technology

Authors: Ch. Ashok Babu, J. V. R. Ravindra, K. Lalkishore

Abstract:

Power has become a burning issue in modern VLSI design. As technology advances, especially below 45 nm, leakage power has become a major problem in addition to dynamic power. This paper presents a full adder with novel PMOS and NMOS transistors that consumes less power than the conventional full adder and the DTMOS full adder. The paper compares different types of adders in terms of power consumption, area, and delay. All the experiments were carried out using the Cadence® Virtuoso® layout editor, which reports the power consumption of the different types of adders.

Keywords: average power, leakage power, delay, DTMOS, PDP

Procedia PDF Downloads 361

1485 Analytical Comparison of Conventional Algorithms with Vedic Algorithm for Digital Multiplier

Authors: Akhilesh G. Naik, Dipankar Pal

Abstract:

Today, the complexity of digital signal processing (DSP) applications and of various microcontroller architectures has increased to such an extent that the traditional approaches to multiplier design in most processors are becoming outdated because they are comparatively slow. Modern processing applications require suitable pipelined approaches and, therefore, algorithms that are friendlier to pipelined architectures. Traditional algorithms such as Wallace Tree, Radix-4 Booth, Radix-8 Booth, and Dadda architectures have proven to be comparatively slow for pipelined architectures. These architectures therefore need to be optimized or combined with one another to enhance their performance and make them suitable for pipelined hardware. Recently, the Vedic algorithm has been shown mathematically to be efficient, being less complex and requiring fewer steps to produce its output, and has assumed renewed importance. This paper describes how the Vedic algorithm can be better suited to pipelined architectures and how it can be combined with traditional architectures and algorithms to enhance its ability even further. In this paper, we also establish that for complex applications on DSP and other microcontroller architectures, the Vedic approach to multiplication proves to be the best available and most efficient option.

Keywords: Wallace Tree, Radix-4 Booth, Radix-8 Booth, Dadda, Vedic, Single-Stage Karatsuba (SSK), Looped Karatsuba (LK)

Procedia PDF Downloads 139

1484 Low-Complexity Multiplication Using Complement and Signed-Digit Recoding Methods

Authors: Te-Jen Chang, I-Hui Pan, Ping-Sheng Huang, Shan-Jen Cheng

Abstract:

In this paper, a fast multiplication method that combines the complement representation method with the canonical recoding technique is proposed. Applying complements and canonical recoding reduces the number of partial products. Based on these techniques, we propose an algorithm that provides an efficient multiplication method. On average, the proposed algorithm reduces the number of k-bit additions from (0.25k + log k/k + 2.5) to (k/6 + log k/k + 2.5), where k is the bit-length of the multiplicand A and multiplier B. We can therefore efficiently speed up the overall performance of the multiplication. Moreover, if the proposed method is used to compute common-multiplicand multiplication, the computational complexity can be reduced from (0.5k + 2 log k/k + 5) to (k/3 + 2 log k/k + 5) k-bit additions.
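The canonical recoding step can be sketched as follows. The code converts a multiplier to canonical signed-digit form (also called non-adjacent form), in which no two adjacent digits are non-zero, and then multiplies with one addition or subtraction per non-zero digit. It shows canonical recoding only. The complement step and the paper's operation counts are not reproduced, and the function names are hypothetical.

```python
def canonical_recode(n: int) -> list:
    """Canonical signed-digit digits of n >= 0, least significant first.

    Each digit is -1, 0, or 1, and no two adjacent digits are non-zero.
    """
    digits = []
    while n > 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def multiply_recoded(a: int, b: int) -> int:
    """Shift-and-add multiply: one add or subtract per non-zero digit of b."""
    acc = 0
    for i, d in enumerate(canonical_recode(b)):
        if d == 1:
            acc += a << i
        elif d == -1:
            acc -= a << i
    return acc
```

For example, 7 is recoded as 8 - 1, so multiplying by 7 needs two operations instead of three.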

Keywords: algorithm design, complexity analysis, canonical recoding, public key cryptography, common-multiplicand multiplication

Procedia PDF Downloads 401

1483 Design of Reconfigurable Fixed-Point LMS Adaptive FIR Filter

Authors: S. Padmapriya, V. Lakshmi Prabha

Abstract:

In this paper, an efficient reconfigurable fixed-point Least Mean Square (LMS) adaptive FIR filter is proposed. The proposed architecture has two modes of operation: one is an area-efficient design and the other is optimized for power. Pipelining of the adder blocks and the partial product generator is used to achieve low area, and reversible logic is used to obtain a low-power design. One of the techniques is chosen depending on the input samples and filter coefficients. Least-Mean-Square adaptation is performed to update the weights. The architecture is coded in Verilog and synthesized with Cadence Encounter in 0.18 μm technology. The synthesis results show that the area reduction ratio of the proposed design compared with the conventional technique is about 1.2%.
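The weight update at the core of an LMS adaptive FIR filter can be sketched in a few lines. This is a floating-point software illustration under assumed parameters (step size `mu`, three taps, a noise-free system-identification setup), not the paper's fixed-point reconfigurable architecture. All names are hypothetical.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Run LMS over input x and desired signal d; return the final weights."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y_n = sum(wi * xi for wi, xi in zip(w, buf))  # FIR filter output
        e_n = d_n - y_n                               # estimation error
        # LMS update: move each weight along its input, scaled by the error.
        w = [wi + mu * e_n * xi for wi, xi in zip(w, buf)]
    return w


def fir(h, x):
    """Reference FIR filter used to generate the desired signal."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


random.seed(1)
unknown = [0.5, -0.25, 0.125]  # illustrative system to identify
x = [random.uniform(-1.0, 1.0) for _ in range(3000)]
w = lms_identify(x, fir(unknown, x), num_taps=3, mu=0.1)
print([round(wi, 6) for wi in w])
```

Each iteration costs one FIR evaluation and one multiply-accumulate per tap for the update, which is why the adder and partial-product blocks dominate a hardware implementation.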

Keywords: adaptive filter, carry select adder, least mean square algorithm, reversible logic

Procedia PDF Downloads 296

1482 Design and Construction of an Intelligent Multiplication Table for Enhanced Education and Increased Student Engagement

Authors: Zahra Alikhani Koopaei

Abstract:

In the fifth lesson of the third-grade mathematics book, students are introduced to the concept of multiplication. However, some students showed a lack of interest in learning this topic. To address this, a simple electronic multiplication table was designed with the aim of making the concept of multiplication entertaining and engaging for students. It provides them with moments of excitement during the learning process. To achieve this goal, a device was created that produced a bell sound when two wire ends were connected. Each wire end was connected to a specific number in the multiplication table, and the other end was linked to the corresponding answer. Consequently, if the answer is correct, the bell will ring. This study employs interactive and engaging methods to teach mathematics, particularly to students who have previously shown little interest in the subject. By integrating game-based learning and critical thinking, we observed an increase in understanding and interest in learning multiplication compared to before using this method. This further motivated the students. As a result, the intelligent multiplication table was successfully designed. Students, under the instructor's supervision, could easily construct the device during the lesson. Through the implementation of these operations, the concept of multiplication was firmly established in the students' minds. Engaging multiple intelligences in each student enhances a more stable and improved understanding of the concept of multiplication.

Keywords: intelligent multiplication table, design, construction, education, increased interest, students

Procedia PDF Downloads 32

1481 A Fault-Tolerant Full Adder in Double Pass CMOS Transistor

Authors: Abdelmonaem Ayachi, Belgacem Hamdi

Abstract:

This paper presents a fault-tolerant implementation for adder schemes using the dual duplication code. To prove the efficiency of the proposed method, the circuit is simulated in double pass transistor CMOS 32 nm technology, and some transient faults are deliberately injected into the layout of the circuit. This fully differential implementation requires only 20 transistors, which means that the proposed design saves 28.57% in transistor count compared to standard CMOS technology.

Keywords: digital electronics, integrated circuits, full adder, 32nm CMOS technology, double pass transistor technology, fault tolerance, self-checking

Procedia PDF Downloads 312

1480 Modified Montgomery for RSA Cryptosystem

Authors: Rupali Verma, Maitreyee Dutta, Renu Vig

Abstract:

Encryption and decryption in RSA are done by modular exponentiation, which is achieved by repeated modular multiplication. Hence, the efficiency of modular multiplication directly determines the efficiency of the RSA cryptosystem. This paper designs a Modified Montgomery modular multiplication in which the addition of operands is computed by a 4:2 compressor. The basic logic operations in the addition are partitioned over two iterations so that parallel computations are performed. This reduces the critical path delay of the proposed Montgomery design. The proposed design and RSA are implemented on Virtex 2 and Virtex 5 FPGAs. The two factors, partitioning and parallelism, have improved the frequency and throughput of the proposed design.
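As background, a bit-serial (radix-2) Montgomery multiplication can be sketched as below. It uses ordinary integer additions where the paper uses a 4:2 compressor, and it does not model the two-iteration partitioning. The function names and the choice k = bit length of n are assumptions made for illustration.

```python
def montgomery_multiply(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2**(-k) mod n, for odd n < 2**k and 0 <= a, b < n."""
    s = 0
    for i in range(k):
        if (a >> i) & 1:   # add b when bit i of a is set
            s += b
        if s & 1:          # make s even by adding the odd modulus
            s += n
        s >>= 1            # exact division by 2
    return s - n if s >= n else s


def mod_mul(a: int, b: int, n: int) -> int:
    """Compute (a * b) mod n via Montgomery form (n must be odd)."""
    k = n.bit_length()
    r2 = (1 << (2 * k)) % n                       # R^2 mod n, with R = 2**k
    a_m = montgomery_multiply(a % n, r2, n, k)    # a * R mod n
    b_m = montgomery_multiply(b % n, r2, n, k)    # b * R mod n
    prod_m = montgomery_multiply(a_m, b_m, n, k)  # a * b * R mod n
    return montgomery_multiply(prod_m, 1, n, k)   # leave Montgomery form
```

The loop needs only additions and right shifts, so no trial division by the modulus is required. That property is what makes the addition step the critical path in hardware.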

Keywords: RSA, Montgomery modular multiplication, 4:2 compressor, FPGA

Procedia PDF Downloads 380

1479 Parallel Computing: Offloading Matrix Multiplication to GPU

Authors: Bharath R., Tharun Sai N., Bhuvan G.

Abstract:

This project focuses on developing a Parallel Computing method aimed at optimizing matrix multiplication through GPU acceleration. Addressing algorithmic challenges, GPU programming intricacies, and integration issues, the project aims to enhance efficiency and scalability. The methodology involves algorithm design, GPU programming, and optimization techniques. Future plans include advanced optimizations, extended functionality, and integration with high-level frameworks. User engagement is emphasized through user-friendly interfaces, open-source collaboration, and continuous refinement based on feedback. The project's impact extends to significantly improving matrix multiplication performance in scientific computing and machine learning applications.

Keywords: matrix multiplication, parallel processing, CUDA, performance boost, neural networks

Procedia PDF Downloads 10

1478 Efficient Semi-Systolic Finite Field Multiplier Using Redundant Basis

Authors: Hyun-Ho Lee, Kee-Won Kim

Abstract:

Arithmetic operations over GF(2^m) have been extensively used in error correcting codes and public-key cryptography schemes. Finite field arithmetic includes addition, multiplication, division, and inversion operations. Addition is very simple and can be implemented with an extremely simple circuit. The other operations are much more complex. Multiplication is the most important operation for cryptosystems such as the elliptic curve cryptosystem, since exponentiation, division, and multiplicative inversion can be performed by computing multiplication iteratively. In this paper, we present a parallel computation algorithm that performs Montgomery multiplication over a finite field using a redundant basis. Based on this multiplication algorithm, we also present an efficient semi-systolic multiplier over a finite field. The multiplier has lower space and time complexities than related multipliers. Compared to the corresponding existing structures, the multiplier saves at least 5% area, 50% time, and 53% area-time (AT) complexity. Accordingly, it is well suited for VLSI implementation and can be easily applied as a basic component for computing complex operations over a finite field, such as inversion and division.

Keywords: finite field, Montgomery multiplication, systolic array, cryptography

Procedia PDF Downloads 251

1477 Integrating Indigenous Students’ Funds of Knowledge to Introduce Multiplication with a Picture Storybook

Authors: Murni Sianturi, Andreas Au Hurit

Abstract:

The low level of literacy and numeracy among Indigenous Papuan students in Merauke Regency, Indonesia, needs attention. The development of a learnable storybook with pictures related to their lives might raise their curiosity to read. This study aimed to design a storybook as a complementary resource for third graders using Indigenous Malind cultural approaches by employing research and development methods. The product developed was a thematic-integrative picture storybook using funds of knowledge from Indigenous students. All the book contents depicted Indigenous students’ lives and were in line with the national curriculum syllabus, specifically representing one sub-theme, the multiplication topic. The grade 3 multiplication material was adapted into a story, and at the end of the reading, students were given several multiplication exercises. The expert team's evaluation gave an average score in the excellent category. The students’ and teacher’s responses to the storybook were very positive. Students were thrilled when reading this book and also effortlessly understood the concept of multiplication. Therefore, this book might be used as a companion to the main book and serve as introductory reading material for students prior to discussing multiplication material.

Keywords: a picture storybook, funds of knowledge, Indigenous elementary students, literacy, numeracy

Procedia PDF Downloads 157

1476 Magnification Factor Based Seismic Response of Moment Resisting Frames with Open Ground Storey

Authors: Subzar Ahmad Bhat, Saraswati Setia, V. K. Sehgal

Abstract:

During past earthquakes, open ground storey buildings have performed poorly due to the soft storey defect. Indian Standard IS 1893:2002 allows analysis of open ground storey buildings without considering infill stiffness but with a multiplication factor of 2.5 to compensate for the stiffness discontinuity. The aim of this paper is therefore to check the applicability of the multiplication factor of 2.5 and to study the behaviour of the structure after the multiplication factor is applied. For this purpose, models that consider infill stiffness are studied using SAP 2000 (Version 14) by linear static analysis and response spectrum analysis. A total of seven models are analysed and designed for multiplication factors ranging from 1.25 to 2.5. The multiplication factor of 2.5 is found to be on the higher side, resulting in increased dimensions and percentage of reinforcement without significant enhancement beyond a certain multiplication factor. When a building with an open ground storey (OGS) is designed for values of the multiplication factor (MF) higher than 1.25 while considering infill stiffness, the soft storey effect shifts from the ground storey to the first storey. The best way to analyse an OGS structure is as a frame with the stiffness and strength of the infill taken into account. The provision of infill walls in the upper storeys enhances the performance of the structure in terms of displacement and storey drift control.

Keywords: open ground storey, multiplication factor, IS 1893:2002 provisions, static analysis, response spectrum analysis, infill stiffness, equivalent strut

Procedia PDF Downloads 358

1475 A New Full Adder Cell for High Performance Low Power Applications

Authors: Mahdiar Hosseighadiry, Farnaz Fotovatikhah, Razali Ismail, Mohsen Khaledian, Mehdi Saeidemanesh

Abstract:

In this paper, a new low-power high-performance full adder is presented based on a new design method. The proposed method relies on pass gate design and provides full-swing circuits with a minimum number of transistors. The method has been applied to the SUM, COUT and XOR-XNOR modules, resulting in rail-to-rail intermediate and output signals with no feedback transistors. The presented full adder cell has been simulated in 45 and 32 nm CMOS technologies using HSPICE, considering parasitic capacitance, and compared to several well-known designs from the literature. In addition, the proposed cell has been extensively evaluated with different output loads, supply voltages, temperatures, threshold voltages, and operating frequencies. Results show that it functions properly under all mentioned conditions and exhibits a lower PDP than other design styles.

Keywords: full adders, low-power, high-performance, VLSI design

Procedia PDF Downloads 354

1474 Symmetry Properties of Linear Algebraic Systems with Non-Canonical Scalar Multiplication

Authors: Krish Jhurani

Abstract:

The research paper presents an in-depth analysis of symmetry properties in linear algebraic systems under the operation of non-canonical scalar multiplication structures, specifically semirings, and near-rings. The objective is to unveil the profound alterations that occur in traditional linear algebraic structures when we replace conventional field multiplication with these non-canonical operations. In the methodology, we first establish the theoretical foundations of non-canonical scalar multiplication, followed by a meticulous investigation into the resulting symmetry properties, focusing on eigenvectors, eigenspaces, and invariant subspaces. The methodology involves a combination of rigorous mathematical proofs and derivations, supplemented by illustrative examples that exhibit these discovered symmetry properties in tangible mathematical scenarios. The core findings uncover unique symmetry attributes. For linear algebraic systems with semiring scalar multiplication, we reveal eigenvectors and eigenvalues. Systems operating under near-ring scalar multiplication disclose unique invariant subspaces. These discoveries drastically broaden the traditional landscape of symmetry properties in linear algebraic systems. With the application of these findings, potential practical implications span across various fields such as physics, coding theory, and cryptography. They could enhance error detection and correction codes, devise more secure cryptographic algorithms, and even influence theoretical physics. This expansion of applicability accentuates the significance of the presented research. The research paper thus contributes to the mathematical community by bringing forth perspectives on linear algebraic systems and their symmetry properties through the lens of non-canonical scalar multiplication, coupled with an exploration of practical applications.

Keywords: eigenspaces, eigenvectors, invariant subspaces, near-rings, non-canonical scalar multiplication, semirings, symmetry properties

Procedia PDF Downloads 77

1473 Performance Analysis and Optimization for Diagonal Sparse Matrix-Vector Multiplication on Machine Learning Unit

Authors: Qiuyu Dai, Haochong Zhang, Xiangrong Liu

Abstract:

Diagonal sparse matrix-vector multiplication (SpMV) is a well-studied topic in the fields of scientific computing and big data processing. However, when diagonal sparse matrices are stored in DIA format, there can be a significant number of padded zero elements and scattered points, which degrades the performance of the current DIA kernel and leads to excessive consumption of computational and memory resources. To address these issues, the authors propose the DIA-Adaptive scheme and its kernel, which leverages the parallel instruction sets of the Machine Learning Unit (MLU). The authors analyze how varying the number of threads and clusters, as well as the hardware architecture, affects the performance of SpMV using different formats. The experimental results indicate that the proposed DIA-Adaptive scheme performs well and offers excellent parallelism.
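To make the padding problem concrete, the following is a minimal sketch of plain DIA storage and SpMV in pure Python. It is not the DIA-Adaptive kernel from the paper and does not target the MLU; the row-aligned layout (`data[d][i]` holds `A[i][i + offsets[d]]`) and the helper names `dense_to_dia`, `dia_spmv`, and `stored_zero_fraction` are assumptions chosen for illustration.

```python
def dense_to_dia(a):
    """Convert a square dense matrix (list of lists) to a row-aligned DIA layout.

    Assumed layout: data[d][i] holds a[i][i + offsets[d]]; slots that fall
    outside the matrix are padded with 0.
    """
    n = len(a)
    # Every diagonal that contains at least one nonzero gets stored in full.
    offsets = sorted({j - i for i in range(n) for j in range(n) if a[i][j] != 0})
    data = [[a[i][i + k] if 0 <= i + k < n else 0 for i in range(n)]
            for k in offsets]
    return offsets, data


def dia_spmv(offsets, data, x):
    """Compute y = A @ x from the DIA representation."""
    n = len(x)
    y = [0] * n
    for k, diag in zip(offsets, data):
        # Only rows i with 0 <= i + k < n have an element on diagonal k.
        for i in range(max(0, -k), min(n, n - k)):
            y[i] += diag[i] * x[i + k]
    return y


def stored_zero_fraction(data):
    """Fraction of stored DIA slots that hold a zero (padding or in-diagonal)."""
    total = sum(len(d) for d in data)
    zeros = sum(1 for d in data for v in d if v == 0)
    return zeros / total


# A main diagonal plus two isolated "scattered" entries in the corners.
A = [[1, 0, 0, 7],
     [0, 2, 0, 0],
     [0, 0, 3, 0],
     [5, 0, 0, 4]]
offsets, data = dense_to_dia(A)
y = dia_spmv(offsets, data, [1, 1, 1, 1])
print(offsets, y, stored_zero_fraction(data))
```

In this toy matrix each scattered entry forces a whole extra diagonal to be stored, so half of the stored slots are zeros; that is the kind of overhead an adaptive scheme would aim to avoid.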

Keywords: adaptive method, DIA, diagonal sparse matrices, MLU, sparse matrix-vector multiplication

Procedia PDF Downloads 78

1472 Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Authors: Jaeyoung Lee

Abstract:

Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to maintain high perception performance in driving environments that vary with time of day and season. Image segmentation using deep learning, which has evolved rapidly in recent years, provides stable, high recognition performance in various road environments. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, which degrades their performance in embedded processor environments equipped with simple hardware accelerators. In this paper, a semantic segmentation network, the matrix multiplication accelerator network (MMANet), optimized for the matrix multiplication accelerator (MMA) on Texas Instruments digital signal processors (TI DSP), is proposed to improve the recognition performance of autonomous driving systems. The proposed method is designed to maximize the number of layers that can be executed in a limited time to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of the MMA. The lack of information caused by fixing the number of channels is resolved by increasing the number of parallel branches. Second, an efficient convolution is selected depending on the size of the activation. Since the structure of the MMA is fixed, normal convolution may be more efficient than depthwise separable convolution, depending on memory access overhead. Thus, the convolution type is decided according to the output stride in order to increase network depth. In addition, memory access time is minimized by processing operations only in L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP). The suggested method obtains stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high-quality contexts using the restored shape without global average pooling paths, since the layer uses the MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation set. The proposed network can process an image with 640 x 480 resolution in 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle at 20 frames per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU), which is the highest recognition rate among embedded networks on the Cityscapes validation set.
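The six-camera claim can be checked with simple arithmetic. The sketch below assumes the six images are processed one after another on a single accelerator, which the abstract does not state; the function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available per frame period at the given frame rate, in milliseconds."""
    return 1000.0 / fps


def sequential_latency_ms(per_image_ms, cameras):
    """Total time to process one image from each camera, one after another."""
    return per_image_ms * cameras


per_image_ms = 6.67   # per-image time reported in the abstract
cameras = 6           # number of cameras reported in the abstract
fps = 20              # target frame rate reported in the abstract

total = sequential_latency_ms(per_image_ms, cameras)
budget = frame_budget_ms(fps)
print(f"total = {total:.2f} ms, budget = {budget:.2f} ms, fits = {total <= budget}")
```

Under that assumption, the six images take about 40 ms per cycle, which fits inside the 50 ms frame period of a 20 FPS system.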

Keywords: edge network, embedded network, MMA, matrix multiplication accelerator, semantic segmentation network

Procedia PDF Downloads 95

1471 Implementation of Integer Sub-Decomposition Method on Elliptic Curves with J-Invariant 1728

Authors: Siti Noor Farwina Anwar, Hailiza Kamarulhaili

Abstract:

In this paper, we present the idea of implementing the Integer Sub-Decomposition (ISD) method on elliptic curves with j-invariant 1728. The ISD method was proposed in 2013 to compute scalar multiplication on elliptic curves, which remains the most expensive operation in Elliptic Curve Cryptography (ECC). However, the original ISD method works only over the integers and handles only integer scalar multiplication. By extending the method to the complex quadratic field, we are able to handle complex multiplication and implement the ISD method on elliptic curves with j-invariant 1728. The curve with j-invariant 1728 has a unique discriminant of the imaginary quadratic field. This unique discriminant yields a unique efficiently computable endomorphism, which can then be used to speed up computations on this curve. However, the ISD method requires three endomorphisms. Hence, we choose all three endomorphisms from the same imaginary quadratic field as the curve itself, where the first endomorphism is the unique endomorphism obtained from the discriminant of the imaginary quadratic field.
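As background for the efficiently computable endomorphism mentioned above, the following toy sketch shows the standard map φ(x, y) = (−x, i·y) on a curve y² = x³ + ax (which has j-invariant 1728), where i² ≡ −1 mod p. It is not the ISD method and does not construct the three endomorphisms the paper uses; the prime 13, the coefficient a = 1, the base point (2, 6), and all function names are assumptions chosen so that the example can be followed by hand.

```python
P_MOD = 13   # toy prime with P_MOD % 4 == 1, so -1 is a square mod P_MOD
A = 1        # curve E: y^2 = x^3 + A*x (no constant term, hence j = 1728)
I = 5        # a square root of -1 mod 13, since 5 * 5 = 25 = -1 (mod 13)


def on_curve(pt):
    """True if pt is the point at infinity (None) or satisfies the curve equation."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x)) % P_MOD == 0


def add(p1, p2):
    """Affine point addition on E; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p2 == -p1 (also covers doubling a point with y == 0)
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def mul(k, pt):
    """Scalar multiplication k * pt by double-and-add (k >= 0)."""
    result, addend = None, pt
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def phi(pt):
    """The endomorphism (x, y) -> (-x, I*y); costs one field multiplication."""
    if pt is None:
        return None
    x, y = pt
    return ((-x) % P_MOD, (I * y) % P_MOD)


Q = mul(2, (2, 6))   # a point of order 5 on this toy curve
n = 5
# On the cyclic subgroup generated by Q, phi acts as multiplication by some lam.
lam = next(k for k in range(1, n) if mul(k, Q) == phi(Q))
print(Q, phi(Q), lam, (lam * lam) % n)
```

Because φ(Q) = [λ]Q with λ² ≡ −1 (mod n), a scalar k can be split into smaller pieces that are combined using φ instead of extra doublings; decomposition methods in this family build on that idea.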

Keywords: efficiently computable endomorphism, elliptic scalar multiplication, j-invariant 1728, quadratic field

Procedia PDF Downloads 165

1470 In vitro Clonal Multiplication and Acclimatization of Large Cardamom (Amomum subulatum Roxb.)

Authors: Krishna Poudel, Tahar Katuwal, Sujan Karki

Abstract:

A rapid propagation and acclimatization method for large cardamom was optimized in this study. Sprouted rhizome buds were collected, and the excised rhizome bud explants were cultured on semi-solid culture media. The explants were cultured on Murashige and Skoog’s (MS) medium supplemented with different concentrations and combinations of BAP (6-Benzyl-amino-purine) and IBA (Indole-3-butyric acid) for shoot and root induction. Explants cultured on MS basal medium supplemented with 1.0 mg/l BAP + 0.5 gm/l IBA showed the highest rate of shoot multiplication. In vitro shoots were rooted on half-strength MS basal medium supplemented with 0.5 mg/l IBA. Rooted shoots were transplanted to the screen house for hardening. The hardened plants were subsequently shifted to the netted nursery for further multiplication.

Keywords: concentration, explants, hardening, rhizome

Procedia PDF Downloads 210

1469 Functional Instruction Set Simulator (ISS) of a Neural Network (NN) IP with Native BF-16 Generator

Authors: Debajyoti Mukherjee, Arathy B. S., Arpita Sahu, Saranga P. Pogula

Abstract:

A Functional Model that mimics the functional behavior of a Neural Network Compute Accelerator IP is crucial for design validation. Neural network workloads are based on the Brain Floating Point (BF-16) data type. The major challenge we faced was the incompatibility of gcc compilers with the BF-16 data type, which we addressed with a native BF-16 generator integrated into our functional model. Moreover, working with large GEMM (General Matrix Multiplication) or SpMM (Sparse Matrix Multiplication) workloads (dense or sparse) and debugging failures related to data integrity is highly painstaking. In this paper, we address the quality challenge of such a complex Neural Network Accelerator design by proposing a Functional Model-based scoreboard, or software model, using SystemC. The proposed Functional Model executes assembly code based on the ISA of the processor IP: it decodes all instructions and executes them as the DUT is expected to. The model gives considerable visibility and debug capability into the DUT by exposing the micro-steps of execution.
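The abstract does not describe how its BF-16 generator works. As a rough illustration of what such a generator has to do, the sketch below converts a 32-bit float to a BF-16 bit pattern by keeping the upper 16 bits with round-to-nearest-even; the rounding mode, the NaN handling, and the function names are assumptions, not details from the paper.

```python
import struct


def float_to_bf16_bits(x):
    """Return the 16-bit BF-16 pattern for x (assumes round-to-nearest-even)."""
    # Reinterpret x as an IEEE 754 single-precision bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: truncate and force a mantissa bit so the result stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0:
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to an even result.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b):
    """Expand a 16-bit BF-16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


for value in (1.0, -2.0, 3.14159):
    b = float_to_bf16_bits(value)
    print(f"{value!r} -> 0x{b:04X} -> {bf16_bits_to_float(b)!r}")
```

BF-16 keeps the sign bit and the full 8-bit exponent of a single-precision float but only 7 mantissa bits, so a value such as 3.14159 comes back as 3.140625.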

Keywords: ISA (instruction set architecture), NN (neural network), TLM (transaction-level modeling), GEMM (general matrix multiplication)

Procedia PDF Downloads 46

1468 Low-Power Digital Filters Design Using a Bypassing Technique

Authors: Thiago Brito Bezerra

Abstract:

This paper presents a novel approach to reducing the power consumption of digital filters based on dynamic bypassing of partial products in their multipliers. The bypassing elements incorporated into the multiplier hardware eliminate the redundant signal transitions that appear within the carry-save adders when a partial product is zero. This technique reduces power consumption by around 20%. The circuits were implemented in AMS 0.18 µm technology. The bypassing technique applied to the circuits is outlined.
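The idea of skipping zero partial products can be illustrated with a small behavioral model. This is only a software sketch of a shift-and-add multiplier, not the paper's circuit: it says nothing about carry-save adder structure, switching activity, or the reported power saving, and the function name and 8-bit default width are assumptions.

```python
def multiply_with_bypass(a, b, width=8):
    """Shift-and-add multiply of unsigned a by the low `width` bits of b.

    Returns (product, bypassed_rows), where bypassed_rows counts the
    partial products that were zero and therefore skipped.
    """
    acc = 0
    bypassed_rows = 0
    for i in range(width):
        if (b >> i) & 1:
            acc += a << i          # nonzero partial product: adder row is active
        else:
            bypassed_rows += 1     # zero partial product: adder row is bypassed
    return acc, bypassed_rows


product, skipped = multiply_with_bypass(13, 0b00010010)
print(product, skipped)
```

In the model, an adder row only does work when the corresponding multiplier bit is 1, so coefficients with many zero bits leave most rows idle; in hardware, the bypass logic is what keeps those idle rows from toggling.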

Keywords: digital filter, low-power, bypassing technique, low-pass filter

Procedia PDF Downloads 351