Search results for: GEMM (general matrix multiplication)
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7079

7079 Functional Instruction Set Simulator (ISS) of a Neural Network (NN) IP with Native BF-16 Generator

Authors: Debajyoti Mukherjee, Arathy B. S., Arpita Sahu, Saranga P. Pogula

Abstract:

A functional model that mimics the functional behavior of a neural network compute accelerator IP is crucial for design validation. Neural network workloads are based on the Brain Floating Point (BF-16) data type. The major challenge we faced was that GCC compilers did not support the BF-16 data type, which we addressed with a native BF-16 generator integrated into our functional model. Moreover, working with large GEMM (General Matrix Multiplication) or SpMM (Sparse Matrix Multiplication) workloads (dense or sparse) and debugging failures related to data integrity is highly painstaking. In this paper, we address the quality challenge of such a complex neural network accelerator design by proposing a functional model-based scoreboard, or software model, written in SystemC. The proposed functional model executes assembly code based on the ISA of the processor IP, decodes all instructions, and executes them as the DUT is expected to. The model gives considerable visibility and debug capability into the DUT by exposing the micro-steps of execution.

Keywords: ISA (instruction set architecture), NN (neural network), TLM (transaction-level modeling), GEMM (general matrix multiplication)

Procedia PDF Downloads 46
7078 Functional Instruction Set Simulator of a Neural Network IP with Native Brain Float-16 Generator

Authors: Debajyoti Mukherjee, Arathy B. S., Arpita Sahu, Saranga P. Pogula

Abstract:

A functional model that mimics the functional behavior of a neural network compute accelerator IP is crucial for design validation. Neural network workloads are based on the Brain Floating Point (BF-16) data type. The major challenge we faced was that GCC compilers did not support the BF-16 data type, which we addressed with a native BF-16 generator integrated into our functional model. Moreover, working with large GEMM (General Matrix Multiplication) or SpMM (Sparse Matrix Multiplication) workloads (dense or sparse) and debugging failures related to data integrity is highly painstaking. In this paper, we address the quality challenge of such a complex neural network accelerator design by proposing a functional model-based scoreboard, or software model, written in SystemC. The proposed functional model executes assembly code based on the ISA of the processor IP, decodes all instructions, and executes them as the DUT is expected to. The model gives considerable visibility and debug capability into the DUT by exposing the micro-steps of execution.

Keywords: ISA, neural network, Brain Float-16, DUT

Procedia PDF Downloads 60
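
Both ISS entries above mention a native BF-16 generator but do not describe how it works. As a minimal sketch of the general idea only, BF-16 can be treated as the upper 16 bits of an IEEE 754 float32, with round-to-nearest-even applied to the dropped bits. The function names `float_to_bf16_bits` and `bf16_bits_to_float` are hypothetical and are not taken from the paper.

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Return a 16-bit BF-16 pattern for x (illustrative, not the paper's generator)."""
    # Reinterpret x as a float32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: exponent all ones and non-zero mantissa. Keep it a NaN after truncation.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, on the 16 bits that are dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Expand a BF-16 pattern back to a Python float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

one = float_to_bf16_bits(1.0)            # 0x3F80
pi_ish = float_to_bf16_bits(3.140625)    # exactly representable in BF-16
tie = float_to_bf16_bits(1.0 + 2 ** -8)  # halfway case, rounds to the even pattern
```

A scoreboard could compare such bit patterns against DUT outputs exactly, which avoids relying on compiler support for the data type.
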
7077 Parallel Computing: Offloading Matrix Multiplication to GPU

Authors: Bharath R., Tharun Sai N., Bhuvan G.

Abstract:

This project focuses on developing a parallel computing method aimed at optimizing matrix multiplication through GPU acceleration. Addressing algorithmic challenges, GPU programming intricacies, and integration issues, the project aims to enhance efficiency and scalability. The methodology involves algorithm design, GPU programming, and optimization techniques. Future plans include advanced optimizations, extended functionality, and integration with high-level frameworks. User engagement is emphasized through user-friendly interfaces, open-source collaboration, and continuous refinement based on feedback. The project's impact extends to significantly improving matrix multiplication performance in scientific computing and machine learning applications.

Keywords: matrix multiplication, parallel processing, CUDA, performance boost, neural networks

Procedia PDF Downloads 10
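
The GPU entry above does not state which algorithm it uses. One common way to structure matrix multiplication for a GPU is to split the output into tiles, so that each tile can be handled by an independent group of threads. The sketch below shows that tiling on the CPU in plain Python. It is an assumption about the general technique, not the authors' kernel, and `tiled_matmul` and `tile` are hypothetical names.

```python
def tiled_matmul(A, B, tile=2):
    """Multiply A (n x k) by B (k x m) one output tile at a time."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row (would map to a block index)
        for j0 in range(0, m, tile):      # tile column (would map to a block index)
            for k0 in range(0, k, tile):  # walk the shared dimension tile by tile
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = C[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += A[i][kk] * B[kk][j]
                        C[i][j] = acc
    return C

A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = tiled_matmul(A, B)
```

The two outer loops are independent of each other, which is the property a GPU implementation would exploit.
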
7076 Performance Analysis and Optimization for Diagonal Sparse Matrix-Vector Multiplication on Machine Learning Unit

Authors: Qiuyu Dai, Haochong Zhang, Xiangrong Liu

Abstract:

Diagonal sparse matrix-vector multiplication is a well-studied topic in the fields of scientific computing and big data processing. However, when diagonal sparse matrices are stored in DIA format, there can be a significant number of padded zero elements and scattered points, which can lead to a degradation in the performance of the current DIA kernel. This can also lead to excessive consumption of computational and memory resources. In order to address these issues, the authors propose the DIA-Adaptive scheme and its kernel, which leverages the parallel instruction sets on MLU. The researchers analyze the effect of allocating a varying number of threads, clusters, and hardware architectures on the performance of SpMV using different formats. The experimental results indicate that the proposed DIA-Adaptive scheme performs well and offers excellent parallelism.

Keywords: adaptive method, DIA, diagonal sparse matrices, MLU, sparse matrix-vector multiplication

Procedia PDF Downloads 78
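
The padding problem described in the DIA entry above can be seen on a small matrix. The sketch below stores a square matrix by diagonals, using a row-indexed layout where `data[d][i]` holds `A[i][i + offsets[d]]`. This layout is an assumption (other libraries align diagonals by column), and the code is a plain DIA kernel, not the DIA-Adaptive scheme from the paper.

```python
def dense_to_dia(A):
    """Convert a square dense matrix to (offsets, data) in a row-indexed DIA layout."""
    n = len(A)
    offsets = sorted({j - i for i in range(n) for j in range(n) if A[i][j] != 0})
    # Positions that fall outside the matrix are padded with zeros.
    data = [[A[i][i + off] if 0 <= i + off < n else 0 for i in range(n)]
            for off in offsets]
    return offsets, data

def dia_spmv(offsets, data, x):
    """Compute y = A x from the DIA representation."""
    n = len(x)
    y = [0] * n
    for off, diag in zip(offsets, data):
        for i in range(max(0, -off), min(n, n - off)):
            y[i] += diag[i] * x[i + off]
    return y

def padding_ratio(data):
    """Fraction of stored entries that are zero (padding or zeros on a stored diagonal)."""
    stored = sum(len(d) for d in data)
    nonzero = sum(1 for d in data for v in d if v != 0)
    return (stored - nonzero) / stored

# One scattered point in the corner forces a whole extra diagonal to be stored.
A = [[1, 0, 0, 9],
     [0, 2, 0, 0],
     [0, 0, 3, 0],
     [0, 0, 0, 4]]
offsets, data = dense_to_dia(A)
y = dia_spmv(offsets, data, [1, 1, 1, 1])
wasted = padding_ratio(data)
```

Here a single off-diagonal element causes three of the eight stored values to be zeros, which is the kind of overhead the abstract refers to.
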
7075 The Fallacy around Inserting Brackets to Evaluate Expressions Involving Multiplication and Division

Authors: Manduth Ramchander

Abstract:

Evaluating expressions involving multiplication and division can give rise to the fallacy that brackets can be arbitrarily inserted into expressions involving multiplication and division. The aim of this article was to draw upon mathematical theory to prove that brackets cannot be arbitrarily inserted into expressions involving multiplication and division and in particular in expressions where division precedes multiplication. In doing so, it demonstrates that the notion that two different answers are possible, when evaluating expressions involving multiplication and division, is indeed a false one. Searches conducted in a number of scholarly databases unearthed the rules to be applied when removing brackets from expressions, which revealed that consideration needs to be given to sign changes when brackets are removed. The rule pertaining to expressions involving multiplication and division was then extended upon, in its reverse format, to prove that brackets cannot be arbitrarily inserted into expressions involving multiplication and division. The application of the rule demonstrates that an expression involving multiplication and division can have only one correct answer. It is recommended that both the rule and its reverse be included in the curriculum, preferably at the juncture when manipulation with brackets is introduced.

Keywords: brackets, multiplications and division, operations, order

Procedia PDF Downloads 125
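
A small numeric check illustrates the claim in the entry above for an expression where division precedes multiplication. The values 8, 2, and 4 are arbitrary choices for illustration, and the bracket-removal rule used in the last line, a ÷ (b ÷ c) = a ÷ b × c, is the standard identity rather than a quotation from the article.

```python
from fractions import Fraction

a, b, c = Fraction(8), Fraction(2), Fraction(4)

# Left-to-right evaluation of a / b * c.
left_to_right = a / b * c

# Arbitrarily inserted brackets change the value.
inserted = a / (b * c)

# Brackets inserted with the operation flipped preserve the value.
equivalent = a / (b / c)
```

Only the bracketing that flips the inner operation agrees with the left-to-right value, so the expression still has a single correct answer.
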
7074 Private Coded Computation of Matrix Multiplication

Authors: Malihe Aliasgari, Yousef Nejatbakhsh

Abstract:

The era of Big Data and the immensity of real-life datasets compel computation tasks to be performed in a distributed fashion, where the data is dispersed among many servers that operate in parallel. However, massive parallelization leads to computational bottlenecks due to faulty servers and stragglers. Stragglers are a few slow or delay-prone processors that can bottleneck the entire computation because one has to wait for all the parallel nodes to finish. The problem of straggling processors has been well studied in the context of distributed computing. Recently, it has been pointed out that, for the important case of linear functions, it is possible to improve over repetition strategies in terms of the tradeoff between performance and latency by carrying out linear precoding of the data prior to processing. The key idea is that, by employing suitable linear codes operating over fractions of the original data, a function may be completed as soon as a sufficient number of processors, depending on the minimum distance of the code, have completed their operations. Matrix-matrix multiplication on practically large data sets faces computational and memory-related difficulties, so such operations are carried out on distributed computing platforms. In this work, we study the problem of distributed matrix-matrix multiplication W = XY under storage constraints, i.e., when each server is allowed to store a fixed fraction of each of the matrices X and Y. This operation is a fundamental building block in many science and engineering fields, such as machine learning, image and signal processing, wireless communication, and optimization. Both non-secure and secure matrix multiplication are studied.
We study the setup in which the identity of the matrix of interest should be kept private from the workers, and we obtain the recovery threshold of the colluding model, that is, the number of workers that need to complete their task before the master server can recover the product W. We consider the problem of secure and private distributed matrix multiplication W = XY in which the matrix X is confidential, while the matrix Y is selected in a private manner from a library of public matrices. We present the best currently known trade-off between communication load and recovery threshold. In other words, we design an achievable PSGPD scheme for any arbitrary privacy level by trivially concatenating a robust PIR scheme for arbitrary colluding workers and private databases and the proposed SGPD code that provides a smaller computational complexity at the workers.

Keywords: coded distributed computation, private information retrieval, secret sharing, stragglers

Procedia PDF Downloads 89
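
The straggler-tolerance idea in the entry above can be sketched with the simplest possible linear code: X is split into two row blocks and a third worker receives their sum, so the product W = XY can be recovered from any two of the three workers (recovery threshold 2). This toy example deliberately omits the privacy and security parts of the paper, and every name in it (`encode`, `decode`, the worker labels) is hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def encode(X):
    """Split X (even number of rows) into X1, X2 and add a parity share X1 + X2."""
    half = len(X) // 2
    X1, X2 = X[:half], X[half:]
    return {"w1": X1, "w2": X2, "w3": add(X1, X2)}

def decode(results):
    """Recover XY from the outputs of any two workers."""
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if "w1" in results and "w2" in results:
        top, bottom = results["w1"], results["w2"]
    elif "w1" in results:
        top = results["w1"]
        bottom = sub(results["w3"], top)     # (X1 + X2)Y - X1 Y = X2 Y
    else:
        bottom = results["w2"]
        top = sub(results["w3"], bottom)     # (X1 + X2)Y - X2 Y = X1 Y
    return top + bottom                      # stack the two row blocks

X = [[1, 2], [3, 4], [5, 6], [7, 8]]
Y = [[1, 0], [2, 1]]
worker_outputs = {name: matmul(share, Y) for name, share in encode(X).items()}
```

Dropping any one entry of `worker_outputs` before calling `decode` models a straggler whose result never arrives.
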
7073 On Direct Matrix Factored Inversion via Broyden's Updates

Authors: Adel Mohsen

Abstract:

A direct method based on the good Broyden's updates for evaluating the inverse of a nonsingular square matrix of full rank and solving the related system of linear algebraic equations is studied. For a matrix A of order n whose LU-decomposition is A = LU, the multiplication count is O(n^3). This includes the evaluation of the LU-decompositions of the inverse, the lower triangular decomposition of A, as well as a “reduced matrix inverse”. If an explicit value of the inverse is not needed, the order reduces to O(n^3/2) to compute inv(U) and the reduced inverse. For a symmetric matrix, only O(n^3/3) operations are required to compute inv(L) and the reduced inverse. An example is presented to demonstrate the capability of using the reduced matrix inverse in treating ill-conditioned systems. Besides the simplicity of Broyden's update, the method provides a means to exploit the possible sparsity in the matrix and to derive a suitable preconditioner.

Keywords: Broyden's updates, matrix inverse, inverse factorization, solution of linear algebraic equations, ill-conditioned matrices, preconditioning

Procedia PDF Downloads 448
7072 Low-Complexity Multiplication Using Complement and Signed-Digit Recoding Methods

Authors: Te-Jen Chang, I-Hui Pan, Ping-Sheng Huang, Shan-Jen Cheng

Abstract:

In this paper, a fast multiplication method utilizing the complement representation method and the canonical recoding technique is proposed. By performing complements and canonical recoding, the number of partial products can be reduced. Based on these techniques, we propose an algorithm that provides an efficient multiplication method. On average, our proposed algorithm reduces the number of k-bit additions from (0.25k + logk/k + 2.5) to (k/6 + logk/k + 2.5), where k is the bit-length of the multiplicand A and multiplier B. We can therefore efficiently speed up the overall performance of the multiplication. Moreover, if the proposed method is used to compute common-multiplicand multiplication, the computational complexity can be reduced from (0.5k + 2logk/k + 5) to (k/3 + 2logk/k + 5) k-bit additions.

Keywords: algorithm design, complexity analysis, canonical recoding, public key cryptography, common-multiplicand multiplication

Procedia PDF Downloads 401
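
The entry above relies on canonical recoding to reduce the number of partial products. The sketch below shows only the standard canonical signed-digit (non-adjacent form) recoding of a non-negative multiplier and a shift-and-add multiplication driven by it. It does not include the complement step of the paper, and the function names are hypothetical.

```python
def canonical_recode(n):
    """Return signed digits in {-1, 0, 1}, least significant first, for n >= 0."""
    digits = []
    while n > 0:
        if n & 1:
            d = 2 - (n % 4)   # +1 when n = 1 (mod 4), -1 when n = 3 (mod 4)
            n -= d            # makes n divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def multiply_with_recoding(a, b):
    """Compute a * b using one addition or subtraction per non-zero digit of b."""
    acc = 0
    for i, d in enumerate(canonical_recode(b)):
        if d == 1:
            acc += a << i
        elif d == -1:
            acc -= a << i
    return acc

digits_255 = canonical_recode(255)                 # 255 = 256 - 1
nonzero_255 = sum(1 for d in digits_255 if d != 0)
product = multiply_with_recoding(123, 255)
```

The binary form of 255 has eight non-zero bits, while the recoded form has two non-zero digits, so the number of partial products drops accordingly.
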
7071 Design and Construction of an Intelligent Multiplication Table for Enhanced Education and Increased Student Engagement

Authors: Zahra Alikhani Koopaei

Abstract:

In the fifth lesson of the third-grade mathematics book, students are introduced to the concept of multiplication. However, some students showed a lack of interest in learning this topic. To address this, a simple electronic multiplication table was designed with the aim of making the concept of multiplication entertaining and engaging for students. It provides them with moments of excitement during the learning process. To achieve this goal, a device was created that produced a bell sound when two wire ends were connected. Each wire end was connected to a specific number in the multiplication table, and the other end was linked to the corresponding answer. Consequently, if the answer is correct, the bell will ring. This study employs interactive and engaging methods to teach mathematics, particularly to students who have previously shown little interest in the subject. By integrating game-based learning and critical thinking, we observed an increase in understanding and interest in learning multiplication compared to before using this method. This further motivated the students. As a result, the intelligent multiplication table was successfully designed. Students, under the instructor's supervision, could easily construct the device during the lesson. Through the implementation of these operations, the concept of multiplication was firmly established in the students' minds. Engaging multiple intelligences in each student enhances a more stable and improved understanding of the concept of multiplication.

Keywords: intelligent multiplication table, design, construction, education, increased interest, students

Procedia PDF Downloads 32
7070 Modified Montgomery for RSA Cryptosystem

Authors: Rupali Verma, Maitreyee Dutta, Renu Vig

Abstract:

Encryption and decryption in RSA are done by modular exponentiation, which is achieved by repeated modular multiplication. Hence, the efficiency of modular multiplication directly determines the efficiency of the RSA cryptosystem. This paper designs a modified Montgomery modular multiplication in which the addition of operands is computed by a 4:2 compressor. The basic logic operations in the addition are partitioned over two iterations such that parallel computations are performed. This reduces the critical path delay of the proposed Montgomery design. The proposed design and RSA are implemented on Virtex 2 and Virtex 5 FPGAs. The two factors, partitioning and parallelism, have improved the frequency and throughput of the proposed design.

Keywords: RSA, Montgomery modular multiplication, 4:2 compressor, FPGA

Procedia PDF Downloads 381
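
For readers unfamiliar with the operation the entry above accelerates, the following is a minimal software sketch of textbook Montgomery multiplication with R = 2^k. It is not the modified 4:2-compressor hardware design from the paper, the helper names are hypothetical, and `pow(n, -1, R)` assumes Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """For odd n < 2^k, return n' = -n^(-1) mod 2^k."""
    if n % 2 == 0:
        raise ValueError("Montgomery reduction with R = 2^k needs an odd modulus")
    R = 1 << k
    return (-pow(n, -1, R)) % R

def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Return a_bar * b_bar * R^(-1) mod n using only shifts, masks, and multiplies."""
    mask = (1 << k) - 1
    t = a_bar * b_bar
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k          # exact division: t + m*n is a multiple of R
    return u - n if u >= n else u

def mod_mul(a, b, n):
    """Compute a * b mod n for 0 <= a, b < n by going through the Montgomery domain."""
    k = n.bit_length()
    n_prime = montgomery_setup(n, k)
    R = 1 << k
    a_bar, b_bar = (a * R) % n, (b * R) % n          # convert to Montgomery form
    c_bar = mont_mul(a_bar, b_bar, n, k, n_prime)
    return mont_mul(c_bar, 1, n, k, n_prime)         # convert back

result = mod_mul(1234, 5678, 9973)
```

In a modular exponentiation the conversions in and out of Montgomery form are done once, and only `mont_mul` is repeated, which is why its critical path matters for RSA.
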
7069 Mixed Number Algebra and Its Application

Authors: Md. Shah Alam

Abstract:

Mushfiq Ahmad has defined a Mixed Number, which is the sum of a scalar and a Cartesian vector. He has also defined the elementary group operations of Mixed Numbers, i.e., the norm of Mixed Numbers, the product of two Mixed Numbers, the identity element, and the inverse. It has been observed that the Mixed Number is consistent with Pauli matrix algebra and is a handy tool for working with Dirac electron theory. Its use as a mathematical method in physics has been studied. (1) We have applied Mixed Numbers in quantum mechanics: Mixed Number versions of the displacement operator, the vector differential operator, and the angular momentum operator have been developed. The Mixed Number method has also been applied to the Klein-Gordon equation. (2) We have applied Mixed Numbers in electrodynamics: Mixed Number versions of Maxwell’s equations, the electric and magnetic field quantities, and the Lorentz force have been found. (3) An associative transformation of Mixed Numbers fulfilling the Lorentz invariance requirement is developed. (4) We have applied Mixed Number algebra as an extension of complex numbers. Mixed Numbers and the quaternions have an isomorphic correspondence, but they differ in algebraic details. The multiplication of unit Mixed Numbers and the multiplication of unit quaternions are different. Since the Mixed Number has properties similar to those of Pauli matrix algebra, Mixed Number algebra is a more convenient tool for dealing with the Dirac equation.

Keywords: mixed number, special relativity, quantum mechanics, electrodynamics, Pauli matrix

Procedia PDF Downloads 327
7068 Efficient Semi-Systolic Finite Field Multiplier Using Redundant Basis

Authors: Hyun-Ho Lee, Kee-Won Kim

Abstract:

The arithmetic operations over GF(2^m) have been extensively used in error-correcting codes and public-key cryptography schemes. Finite field arithmetic includes addition, multiplication, division, and inversion operations. Addition is very simple and can be implemented with an extremely simple circuit. The other operations are much more complex. Multiplication is the most important operation for cryptosystems, such as the elliptic curve cryptosystem, since exponentiation, division, and multiplicative inversion can be performed by computing multiplications iteratively. In this paper, we present a parallel computation algorithm that performs Montgomery multiplication over a finite field using a redundant basis. Also, based on the multiplication algorithm, we present an efficient semi-systolic multiplier over a finite field. The multiplier has lower space and time complexities compared to related multipliers. Compared to the corresponding existing structures, the multiplier saves at least 5% area, 50% time, and 53% area-time (AT) complexity. Accordingly, it is well suited for VLSI implementation and can be easily applied as a basic component for computing complex operations over a finite field, such as inversion and division.

Keywords: finite field, Montgomery multiplication, systolic array, cryptography

Procedia PDF Downloads 252
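
The statement in the entry above that inversion can be built from repeated multiplication can be sketched in software. The code below uses ordinary shift-and-add multiplication in a polynomial basis, not the Montgomery redundant-basis algorithm of the paper, and computes the inverse as a^(2^m - 2). The field GF(2^8) with the polynomial 0x11B is chosen only as a familiar example, and all names are hypothetical.

```python
def gf2m_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m); poly is the irreducible polynomial including x^m."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce when the degree reaches m
    return result

def gf2m_inv(a, m, poly):
    """Invert non-zero a via a^(2^m - 2), using only field multiplications."""
    result, base, e = 1, a, (1 << m) - 2
    while e:
        if e & 1:
            result = gf2m_mul(result, base, m, poly)
        base = gf2m_mul(base, base, m, poly)
        e >>= 1
    return result

M, POLY = 8, 0x11B               # x^8 + x^4 + x^3 + x + 1
prod = gf2m_mul(0x57, 0x83, M, POLY)
inv = gf2m_inv(0x57, M, POLY)
check = gf2m_mul(0x57, inv, M, POLY)
```

Division a / b then follows as `gf2m_mul(a, gf2m_inv(b, M, POLY), M, POLY)`, so the multiplier is the only hardware-critical primitive.
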
7067 Integrating Indigenous Students’ Funds of Knowledge to Introduce Multiplication with a Picture Storybook

Authors: Murni Sianturi, Andreas Au Hurit

Abstract:

The low level of Indigenous Papuan students’ literacy and numeracy in Merauke Regency-Indonesia needs to be considered. The development of a learnable storybook with pictures related to their lives might raise their curiosity to read. This study aimed to design a storybook as a complementary resource for the third graders using Indigenous Malind cultural approaches by employing research and development methods. The product developed was a thematic-integrative picture storybook using funds of knowledge from Indigenous students. All the book contents depicted Indigenous students’ lives and were in line with the national curriculum syllabus, specifically representing one sub-theme−multiplication topic. Multiplication material of grade 3 was modified in the form of a story, and at the end of the reading, students were given several multiplication exercises. Based on the results of the evaluation from the expert team, it was found that the average score was in the excellent category. The students’ and teacher’s responses to the storybook were very positive. Students were thrilled when reading this book and also effortlessly understood the concept of multiplication. Therefore, this book might be used as a companion book to the main book and serve as introductory reading material for students prior to discussing multiplication material.

Keywords: a picture storybook, funds of knowledge, Indigenous elementary students, literacy, numeracy

Procedia PDF Downloads 159
7066 Magnification Factor Based Seismic Response of Moment Resisting Frames with Open Ground Storey

Authors: Subzar Ahmad Bhat, Saraswati Setia, V. K. Sehgal

Abstract:

During past earthquakes, open ground storey (OGS) buildings have performed poorly due to the soft storey defect. Indian Standard IS 1893:2002 allows analysis of open ground storey buildings without considering infill stiffness but with a multiplication factor (MF) of 2.5 in compensation for the stiffness discontinuity. Therefore, the aim of this paper is to check the applicability of the multiplication factor of 2.5 and to study the behaviour of the structure after the application of the multiplication factor. For this purpose, a study is performed on models considering infill stiffness using SAP 2000 (Version 14) by linear static analysis and response spectrum analysis. A total of seven models are analysed and designed for multiplication factors ranging from 1.25 to 2.5. A multiplication factor of 2.5 has been found to be on the higher side, resulting in increased dimensions and percentage of reinforcement without significant enhancement beyond a certain multiplication factor. When the building with OGS is designed for values of MF higher than 1.25 considering infill stiffness, the soft storey effect shifts from the ground storey to the first storey. The best way to analyse an OGS structure is as a frame with the stiffness and strength of the infill taken into account. The provision of infill walls in the upper storeys enhances the performance of the structure in terms of displacement and storey drift control.

Keywords: open ground storey, multiplication factor, IS 1893:2002 provisions, static analysis, response spectrum analysis, infill stiffness, equivalent strut

Procedia PDF Downloads 360
7065 Parallel Computation of the Covariance-Matrix

Authors: Claude Tadonki

Abstract:

We address the issues related to the computation of the covariance matrix. This matrix is likely to be ill-conditioned in its canonical expression, which raises serious numerical issues. The underlying linear system, which therefore should be solved by means of iterative approaches, becomes computationally challenging. A huge number of iterations is expected in order to reach an acceptable level of convergence, necessary to meet the required accuracy of the computation. In addition, this linear system needs to be solved at each iteration following the general form of the covariance matrix. Putting it all together, we need to compute the associated matrix-vector product as fast as possible. This is our purpose in this work, where we consider and discuss skillful formulations of the problem, then propose a parallel implementation of the matrix-vector product involved. Numerical and performance-oriented discussions are provided based on experimental evaluations.

Keywords: covariance-matrix, multicore, numerical computing, parallel computing

Procedia PDF Downloads 283
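
The entry above centres on a fast covariance matrix-vector product. One well-known formulation, shown here as an assumption since the abstract does not spell out its own, avoids forming the d x d covariance matrix C and instead computes C v as Xc^T (Xc v) / (n - 1), where Xc is the mean-centred data matrix. The function names are hypothetical.

```python
def covariance_matvec(X, v):
    """Return C v for the sample covariance C of the rows of X, without building C."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    # t = Xc v: one value per sample; each entry is independent, hence parallelizable.
    t = [sum((row[j] - mean[j]) * v[j] for j in range(d)) for row in X]
    # y = Xc^T t / (n - 1): one value per feature.
    return [sum((X[i][j] - mean[j]) * t[i] for i in range(n)) / (n - 1)
            for j in range(d)]

def explicit_covariance(X):
    """Reference implementation that forms the full covariance matrix."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in X) / (n - 1)
             for b in range(d)] for a in range(d)]

X = [[1.0, 2.0], [3.0, 5.0], [5.0, 11.0]]
v = [1.0, -1.0]
y = covariance_matvec(X, v)
C = explicit_covariance(X)
y_ref = [sum(C[a][b] * v[b] for b in range(2)) for a in range(2)]
```

The implicit form costs on the order of n·d operations per product instead of the n·d² needed to build C first, which matters when the product is repeated in every iteration of a solver.
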
7064 A Design of Elliptic Curve Cryptography Processor based on SM2 over GF(p)

Authors: Shiji Hu, Lei Li, Wanting Zhou, DaoHong Yang

Abstract:

Data encryption is the foundation of today’s communication. On this basis, how to improve the speed of data encryption and decryption is a problem that scholars continue to work on. In this paper, we propose an elliptic curve crypto processor architecture based on the SM2 prime field. In terms of hardware implementation, we optimized the algorithms in different stages of the structure. For finite field modular operations, we propose an optimized improvement of the Karatsuba-Ofman multiplication algorithm and shorten the critical path through a pipeline structure in the algorithm implementation. Based on the SM2 recommended prime field, a fast modular reduction algorithm is used to reduce the 512-bit wide data obtained from the multiplication unit. The radix-4 extended Euclidean algorithm is used to realize the conversion between the affine coordinate system and the Jacobian projective coordinate system. In the parallel scheduling of point operations on elliptic curves, we propose a three-level parallel structure of point addition and point doubling based on the Jacobian projective coordinate system. Combined with the scalar multiplication algorithm, we add mutual pre-operation to the point addition and point doubling operations to improve the efficiency of scalar point multiplication. The proposed ECC hardware architecture was verified and implemented on Xilinx Virtex-7 and ZYNQ-7 platforms, and each 256-bit scalar multiplication operation took 0.275 ms. The performance for handling scalar multiplication is 32 times that of a CPU (dual-core ARM Cortex-A9).

Keywords: Elliptic curve cryptosystems, SM2, modular multiplication, point multiplication

Procedia PDF Downloads 54
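
The entry above builds its multiplier on the Karatsuba-Ofman algorithm. The sketch below is the textbook recursive form on Python integers, which replaces four half-size multiplications with three. It does not reproduce the paper's optimized or pipelined variant, and the `threshold_bits` cut-off is an arbitrary illustrative choice.

```python
def karatsuba(x, y, threshold_bits=32):
    """Multiply non-negative integers x and y with Karatsuba-Ofman recursion."""
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return x * y                       # small operand: use the base multiplier
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask           # x = x1 * 2^half + x0
    y1, y0 = y >> half, y & mask           # y = y1 * 2^half + y0
    z2 = karatsuba(x1, y1, threshold_bits)
    z0 = karatsuba(x0, y0, threshold_bits)
    # One extra multiplication yields the middle term x1*y0 + x0*y1.
    z1 = karatsuba(x0 + x1, y0 + y1, threshold_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0

# Two 256-bit operands give a product of up to 512 bits, matching the width
# that the abstract says the modular reduction stage receives.
x = (1 << 256) - 189
y = (1 << 255) + 12345
p = karatsuba(x, y)
```

In hardware the three sub-products are independent, which is what makes the scheme attractive for pipelining.
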
7063 Easily Memorable Strong Password Generation and Retrieval

Authors: Shatadru Das, Natarajan Vijayarangan

Abstract:

In this paper, a system and method for generating and recovering an authorization code has been designed and analyzed. The system creates an authorization code by accepting a base-sentence from a user. Based on the characters present in this base-sentence, the system computes a base-sentence matrix. The system also generates a plurality of patterns. The user can either select the pattern from the multiple patterns suggested by the system or can create his/her own pattern. The system then performs multiplications between the base-sentence matrix and the selected pattern matrix at different stages in the path forward, for obtaining a strong authorization code. In case the user forgets the base sentence, the system has a provision to manage and retrieve 'forgotten authorization code'. This is done by fragmenting the base sentence into different matrices and storing the fragmented matrices into a repository after computing matrix multiplication with a security question-answer approach and with a secret key provided by the user.

Keywords: easy authentication, key retrieval, memorable passwords, strong password generation

Procedia PDF Downloads 366
7062 On a Generalization of the Spectral Dichotomy Method of a Matrix With Respect to Parabolas

Authors: Mouhamadou Dosso

Abstract:

This paper presents methods of spectral dichotomy of a matrix which compute spectral projectors on the subspace associated with the eigenvalues external to the parabolas described by a general equation. These methods are modifications of the one proposed in [A. N. Malyshev and M. Sadkane, SIAM J. MATRIX ANAL. APPL. 18 (2), 265-278, 1997] which uses the spectral dichotomy method of a matrix with respect to the imaginary axis. Theoretical and algorithmic aspects of the methods are developed. Numerical results obtained by applying methods presented on matrices are reported.

Keywords: spectral dichotomy method, spectral projector, eigensubspaces, eigenvalue

Procedia PDF Downloads 61
7061 Symmetry Properties of Linear Algebraic Systems with Non-Canonical Scalar Multiplication

Authors: Krish Jhurani

Abstract:

The research paper presents an in-depth analysis of symmetry properties in linear algebraic systems under the operation of non-canonical scalar multiplication structures, specifically semirings, and near-rings. The objective is to unveil the profound alterations that occur in traditional linear algebraic structures when we replace conventional field multiplication with these non-canonical operations. In the methodology, we first establish the theoretical foundations of non-canonical scalar multiplication, followed by a meticulous investigation into the resulting symmetry properties, focusing on eigenvectors, eigenspaces, and invariant subspaces. The methodology involves a combination of rigorous mathematical proofs and derivations, supplemented by illustrative examples that exhibit these discovered symmetry properties in tangible mathematical scenarios. The core findings uncover unique symmetry attributes. For linear algebraic systems with semiring scalar multiplication, we reveal eigenvectors and eigenvalues. Systems operating under near-ring scalar multiplication disclose unique invariant subspaces. These discoveries drastically broaden the traditional landscape of symmetry properties in linear algebraic systems. With the application of these findings, potential practical implications span across various fields such as physics, coding theory, and cryptography. They could enhance error detection and correction codes, devise more secure cryptographic algorithms, and even influence theoretical physics. This expansion of applicability accentuates the significance of the presented research. The research paper thus contributes to the mathematical community by bringing forth perspectives on linear algebraic systems and their symmetry properties through the lens of non-canonical scalar multiplication, coupled with an exploration of practical applications.

Keywords: eigenspaces, eigenvectors, invariant subspaces, near-rings, non-canonical scalar multiplication, semirings, symmetry properties

Procedia PDF Downloads 77
7060 The Second Smallest Eigenvalue of Complete Tripartite Hypergraph

Authors: Alfi Y. Zakiyyah, Hanni Garminia, M. Salman, A. N. Irawati

Abstract:

The terminology of hypergraphs is related to the terminology of graphs. In graph theory, an edge connects two vertices, whereas in a hypergraph an edge can connect more than two vertices. A graph can be represented by matrices such as the adjacency matrix, the Laplacian matrix, and the incidence matrix. The adjacency matrix is a symmetric matrix, so all its eigenvalues are real. This matrix is a nonnegative matrix. All diagonal entries of the adjacency matrix are zero, so its trace is zero. Another matrix representation of a graph is the Laplacian matrix. The Laplacian matrix is symmetric and positive semidefinite, so all its eigenvalues are real and non-negative. Some results from the spectral study of graphs are generalized to hypergraphs. A hypergraph can also be represented by a matrix such as the adjacency, incidence, or Laplacian matrix. Throughout this work, we use the Laplacian matrix to represent a complete tripartite hypergraph. The aim of this research is to determine the second smallest eigenvalue of this matrix and to find a relation between this eigenvalue and the connectivity of the hypergraph.

Keywords: connectivity, graph, hypergraph, Laplacian matrix

Procedia PDF Downloads 448
7059 Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Authors: Jaeyoung Lee

Abstract:

Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perceived performance in driving environments that vary from time to season. The image segmentation method using deep learning, which has recently evolved rapidly, provides high recognition performance in various road environments stably. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, which degrade performance in embedded processor environments equipped simple hardware accelerators. In this paper, a semantic segmentation network, matrix multiplication accelerator network (MMANet), optimized for matrix multiplication accelerator (MMA) on Texas instrument digital signal processors (TI DSP) is proposed to improve the recognition performance of autonomous driving system. The proposed method is designed to maximize the number of layers that can be performed in a limited time to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of MMA. By increasing the number of parallel branches, the lack of information caused by fixing the number of channels is resolved. Second, an efficient convolution is selected depending on the size of the activation. Since MMA is a fixed, it may be more efficient for normal convolution than depthwise separable convolution depending on memory access overhead. Thus, a convolution type is decided according to output stride to increase network depth. In addition, memory access time is minimized by processing operations only in L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP). 
The suggested method gets stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high-quality contexts using the restored shape, without global average pooling paths, since that layer uses the MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation set. The proposed network can process an image with 640 x 480 resolution in 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle at 20 frames per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU), which is the highest recognition rate among embedded networks on the Cityscapes validation set.
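
The frame-rate figure can be checked with simple arithmetic. The sketch below assumes the six camera images are processed one after another on a single accelerator, which the abstract does not state; the function and constant names are illustrative only.

```python
# Timing check for the figures quoted in the abstract.
# Assumption (not stated in the abstract): the six camera images are
# processed sequentially on one accelerator.
LATENCY_MS = 6.67   # per 640 x 480 image, from the abstract
NUM_CAMERAS = 6     # from the abstract
TARGET_FPS = 20     # from the abstract


def frame_budget_ms(fps: float) -> float:
    """Time available in one frame period, in milliseconds."""
    return 1000.0 / fps


def total_latency_ms(latency_ms: float, cameras: int) -> float:
    """Time to process one image from every camera, back to back."""
    return latency_ms * cameras


if __name__ == "__main__":
    budget = frame_budget_ms(TARGET_FPS)
    used = total_latency_ms(LATENCY_MS, NUM_CAMERAS)
    print(f"budget per frame period: {budget:.2f} ms")
    print(f"six images back to back: {used:.2f} ms")
    print("fits within budget:", used <= budget)
```

Under this assumption, the six images take less time than one frame period at the stated rate, which is consistent with the claim in the abstract.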

Keywords: edge network, embedded network, MMA, matrix multiplication accelerator, semantic segmentation network

Procedia PDF Downloads 95
7058 Conditions on Expressing a Matrix as a Sum of α-Involutions

Authors: Ric Joseph R. Murillo, Edna N. Gueco, Dennis I. Merino

Abstract:

Let F be C or R, where C and R are the sets of complex numbers and real numbers, respectively, and let n be a natural number. An n-by-n matrix A over the field F is called an α-involutory matrix or an α-involution if there exists an α in the field such that the square of the matrix is equal to αI, where I is the n-by-n identity matrix. If α is a complex number or a nonnegative real number, then an n-by-n matrix A over the field F can be written as a sum of n-by-n α-involutory matrices over the field F if and only if the trace of that matrix is an integral multiple of the square root of α. Meanwhile, if α is a negative real number, then a 2n-by-2n matrix A over R can be written as a sum of 2n-by-2n α-involutory matrices over R if and only if the trace of the matrix is zero. Some other properties of α-involutory matrices are also determined.
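
As a small illustration of the definition and of the trace condition, the sketch below checks three hypothetical 2-by-2 matrices for α = 4. The matrices and helper names are chosen for this example and are not taken from the paper; the check covers only the "sum implies trace condition" direction for one instance.

```python
# Illustrative check of the alpha-involution definition for alpha = 4.
# All matrices below are hypothetical examples, not taken from the paper.

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))


def is_alpha_involution(a, alpha):
    """True when a squared equals alpha times the identity."""
    n = len(a)
    target = [[alpha if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(a, a) == target


ALPHA = 4           # so the square root of alpha is 2
SQRT_ALPHA = 2

A1 = [[2, 0], [0, 2]]
A2 = [[2, 0], [0, -2]]
A3 = [[0, 4], [1, 0]]

# Sum of three alpha-involutions; its trace should be a multiple of 2.
S = matadd(matadd(A1, A2), A3)

if __name__ == "__main__":
    print([is_alpha_involution(m, ALPHA) for m in (A1, A2, A3)])
    print("trace of sum:", trace(S), "multiple of sqrt(alpha):",
          trace(S) % SQRT_ALPHA == 0)
```

The trace condition holds here because each α-involution has eigenvalues equal to plus or minus the square root of α, so each trace is an integer multiple of that root.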

Keywords: α-involutory Matrices, sum of α-involutory Matrices, Trace, Matrix Theory

Procedia PDF Downloads 157
7057 High Speed Image Rotation Algorithm

Authors: Hee-Choul Kwon, Hyungjin Cho, Heeyong Kwon

Abstract:

Image rotation is one of the main pre-processing steps in image processing and image pattern recognition. It is implemented with rotation matrix multiplication. However, it requires many floating-point arithmetic operations and trigonometric function calculations, so it takes a long time to execute. We propose a new high-speed image rotation algorithm that avoids these two major time-consuming operations. We compare the proposed algorithm with the conventional rotation algorithm on images of various sizes. Experimental results show that the proposed algorithm is superior to the conventional one.
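
For reference, the conventional approach mentioned above can be sketched as follows. This is a minimal nearest-neighbour version written for illustration, not the authors' algorithm; the function names, the inverse-mapping strategy, and the `fill` parameter are assumptions of this sketch.

```python
import math


def rotate_point(x, y, theta):
    """Apply the 2-D rotation matrix [[cos, -sin], [sin, cos]] to (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def rotate_image(img, theta, fill=0):
    """Rotate a list-of-rows image about its centre (nearest neighbour).

    Each destination pixel is mapped back into the source image by the
    inverse rotation, which costs trigonometric evaluations and several
    floating-point multiplications per pixel in this naive form.
    """
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    out = [[fill] * w for _ in range(h)]
    for yd in range(h):
        for xd in range(w):
            # Inverse mapping: rotate the destination offset by -theta.
            xs, ys = rotate_point(xd - cx, yd - cy, -theta)
            xi, yi = round(xs + cx), round(ys + cy)
            if 0 <= xi < w and 0 <= yi < h:
                out[yd][xd] = img[yi][xi]
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in rotate_image(image, math.pi / 2):
        print(row)
```

With x as the column index and y as the row index (pointing down), a positive angle in this sketch turns the image clockwise on screen.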

Keywords: high speed rotation operation, image processing, image rotation, pattern recognition, transformation matrix

Procedia PDF Downloads 472
7056 Very Large Scale Integration Architecture of Finite Impulse Response Filter Implementation Using Retiming Technique

Authors: S. Jalaja, A. M. Vijaya Prakash

Abstract:

A recursive combination of an algorithm based on Karatsuba multiplication is exploited to design a generalized transpose and parallel Finite Impulse Response (FIR) filter. Mid-range Karatsuba multiplication and a carry-save adder based on Karatsuba multiplication reduce the time complexity of higher-order multiplication implemented up to n bits. As a result, we design a modified N-tap transpose and parallel symmetric FIR filter structure using the Karatsuba algorithm. The mathematical formulation of the FFA filter is derived. The proposed architecture involves a significantly smaller area delay product (ADP) than the existing block implementation. By adopting the retiming technique, hardware cost is reduced further. The filter architecture is designed using a 90 nm technology library and is implemented using the Cadence EDA tool. The synthesized results show better performance for different word lengths and block sizes. The design achieves switching activity reduction and low power consumption, with and without retiming, for different combinations of the circuit. The proposed structure achieves a power reduction of more than half, with and without retiming, compared to the earlier design structure. As a proof of concept for block size 16 and filter length 64, the CKA method achieves 51% as well as 70% less power by applying the retiming technique, and the CSA method achieves 57% as well as 77% less power by applying the retiming technique, compared to the previously proposed design.
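
The Karatsuba recursion that the design builds on can be sketched in software as below. This is a plain integer version for illustration only; it is not the authors' hardware architecture, and the base-case threshold of 16 is an arbitrary choice for this sketch.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers using Karatsuba's recursion.

    Each level replaces four half-size multiplications with three,
    at the cost of a few extra additions and subtractions.
    """
    # Small operands: fall back to the built-in multiplication.
    if x < 16 or y < 16:
        return x * y

    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    z0 = karatsuba(x_lo, y_lo)                      # low  * low
    z2 = karatsuba(x_hi, y_hi)                      # high * high
    # Middle term from one extra product instead of two.
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi) - z0 - z2

    return (z2 << (2 * half)) + (z1 << half) + z0


if __name__ == "__main__":
    a, b = 123456789, 987654321
    print(karatsuba(a, b) == a * b)
```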

Keywords: carry save adder Karatsuba multiplication, mid range Karatsuba multiplication, modified FFA and transposed filter, retiming

Procedia PDF Downloads 203
7055 Manufacturing and Characterization of Ni-Matrix Composite Reinforced with Ti3SiC2 and Ti2AlC; and Al-Matrix with Ti2SiC

Authors: M. Hadji, N. Chiker, Y. Hadji, A. Haddad

Abstract:

In this paper, we report for the first time on the synthesis and characterization of novel MAX phase (Ti3SiC2, Ti2AlC) reinforced Ni-matrix and Ti2AlC reinforced Al-matrix composites. The stability of the MAX phases in the Al-matrix and Ni-matrix at a temperature of 985°C has been investigated. All the composites were cold pressed and sintered at 985°C for 20 min in an H2 environment, except Ni/Ti3SiC2, which was sintered at 1100°C for 1 h. Microstructure analysis by scanning electron microscopy and phase analysis by X-ray diffraction confirmed that there was minimal interfacial reaction between the MAX particles and Ni, whereas the Al/MAX samples showed that the MAX phases were totally decomposed at 985°C. The addition of MAX phases enhanced the Al-matrix and Ni-matrix.

Keywords: MAX phase, microstructures, composites, hardness, SEM

Procedia PDF Downloads 311
7054 Inverse Matrix in the Theory of Dynamical Systems

Authors: Renata Masarova, Bohuslava Juhasova, Martin Juhas, Zuzana Sutova

Abstract:

In dynamic system theory, a mathematical model is often used to describe the properties of a system. In order to find the transfer matrix of a dynamic system, we need to calculate an inverse matrix. The paper combines the classical theory with the procedures used in the theory of automated control for calculating the inverse matrix. The final part of the paper models the given problem in MATLAB.
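
To show where the inverse matrix enters, the sketch below evaluates a transfer matrix at a single value of s, assuming the standard state-space form G(s) = C (sI - A)^(-1) B + D. The abstract does not state which model form the paper uses, so this form, the 2-by-2 restriction, and the example system are assumptions of this sketch.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices (lists of rows) with compatible shapes."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Inverse of a 2-by-2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def transfer_matrix_at(A, B, C, D, s):
    """Evaluate G(s) = C (sI - A)^(-1) B + D for a 2-state system."""
    s = Fraction(s)
    # Build sI - A with exact rational entries.
    si_minus_a = [[(s if i == j else Fraction(0)) - Fraction(A[i][j])
                   for j in range(2)] for i in range(2)]
    core = matmul(matmul(C, inverse_2x2(si_minus_a)), B)
    return [[core[i][j] + Fraction(D[i][j]) for j in range(len(D[0]))]
            for i in range(len(D))]


if __name__ == "__main__":
    # Hypothetical example system with G(s) = 1 / (s^2 + 3s + 2).
    A = [[0, 1], [-2, -3]]
    B = [[0], [1]]
    C = [[1, 0]]
    D = [[0]]
    print(transfer_matrix_at(A, B, C, D, 1))
```

Exact fractions are used so the result can be compared against the closed-form expression without rounding error.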

Keywords: dynamic system, transfer matrix, inverse matrix, modeling

Procedia PDF Downloads 481
7053 Implementation of Integer Sub-Decomposition Method on Elliptic Curves with J-Invariant 1728

Authors: Siti Noor Farwina Anwar, Hailiza Kamarulhaili

Abstract:

In this paper, we present the idea of implementing the Integer Sub-Decomposition (ISD) method on elliptic curves with j-invariant 1728. The ISD method was proposed in 2013 to compute scalar multiplication on elliptic curves, which remains the most expensive operation in Elliptic Curve Cryptography (ECC). However, the original ISD method works only over the integer number field and solves integer scalar multiplication. By extending the method to the complex quadratic field, we are able to solve complex multiplication and implement the ISD method on elliptic curves with j-invariant 1728. The curve with j-invariant 1728 has a unique discriminant of the imaginary quadratic field. This unique discriminant yields a unique efficiently computable endomorphism, which can later speed up the computations on this curve. However, the ISD method needs three endomorphisms. Hence, we choose all three endomorphisms from the same imaginary quadratic field as the curve itself, where the first endomorphism is the unique endomorphism obtained from the discriminant of the imaginary quadratic field.
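
As background for the operations named above, the sketch below shows plain double-and-add scalar multiplication and the standard efficiently computable endomorphism (x, y) -> (-x, i*y) on the curve y^2 = x^3 + x, which has j-invariant 1728. It is not the ISD method and does not implement the three-endomorphism decomposition; the toy prime 13, the base point (2, 6), and all function names are assumptions chosen for illustration.

```python
# Toy curve y^2 = x^3 + x over F_13 (j-invariant 1728).
# Parameters are illustrative only and far too small for real use.
P_MOD = 13      # prime with P_MOD % 4 == 1, so -1 has a square root
I_SQRT = 5      # 5 * 5 % 13 == 12, i.e. a square root of -1 mod 13


def on_curve(pt):
    """True for the point at infinity (None) or a point on the curve."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + x)) % P_MOD == 0


def add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                      # p2 is the negative of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + 1) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, pt):
    """Compute k * pt with the binary double-and-add method (k >= 0)."""
    result, addend = None, pt
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def phi(pt):
    """Endomorphism (x, y) -> (-x, i*y), costing one field multiplication."""
    if pt is None:
        return None
    x, y = pt
    return ((-x) % P_MOD, (I_SQRT * y) % P_MOD)


if __name__ == "__main__":
    base = (2, 6)
    print(on_curve(base), on_curve(phi(base)))
    print(phi(scalar_mult(5, base)) == scalar_mult(5, phi(base)))
```

Because phi is a group homomorphism, applying it before or after scalar multiplication gives the same point, which is the property that endomorphism-based speedups rely on.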

Keywords: efficiently computable endomorphism, elliptic scalar multiplication, j-invariant 1728, quadratic field

Procedia PDF Downloads 166
7052 Image Rotation Using an Augmented 2-Step Shear Transform

Authors: Hee-Choul Kwon, Heeyong Kwon

Abstract:

Image rotation is one of the main pre-processing steps in image processing and image pattern recognition. It is implemented with a rotation matrix multiplication, which requires many floating-point arithmetic operations and trigonometric calculations, so it takes a long time to execute. Therefore, there has been a need for a high-speed image rotation algorithm that avoids these two major time-consuming operations. However, the image rotated by such an algorithm has a drawback: distortion. We solved the problem using an augmented two-step shear transform. We compare the presented algorithm with conventional rotation on images of various sizes. Experimental results show that the presented algorithm is superior to conventional rotation.
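
The abstract does not spell out the augmented two-step shear transform, so the sketch below shows a different, well-known technique instead: the classical three-shear decomposition of a rotation matrix. It is included only to illustrate how shears can reproduce a rotation and should not be read as the authors' method; the function names are illustrative.

```python
import math


def shear_x(a):
    """Horizontal shear matrix: x' = x + a * y, y' = y."""
    return [[1.0, a], [0.0, 1.0]]


def shear_y(b):
    """Vertical shear matrix: x' = x, y' = b * x + y."""
    return [[1.0, 0.0], [b, 1.0]]


def matmul2(m, n):
    """Product of two 2-by-2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rotation_matrix(theta):
    """Standard rotation matrix [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def rotation_from_shears(theta):
    """Rotation built as shear_x(a) * shear_y(b) * shear_x(a).

    Uses a = -tan(theta / 2) and b = sin(theta); theta must not be an
    odd multiple of pi, where tan(theta / 2) is undefined.
    """
    a = -math.tan(theta / 2.0)
    b = math.sin(theta)
    return matmul2(shear_x(a), matmul2(shear_y(b), shear_x(a)))


if __name__ == "__main__":
    theta = math.radians(30)
    print(rotation_from_shears(theta))
    print(rotation_matrix(theta))
```

Each shear moves pixels along a single axis, which is why shear-based rotation is often considered when per-pixel matrix multiplication is too costly.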

Keywords: high-speed rotation operation, image rotation, transform matrix, image processing, pattern recognition

Procedia PDF Downloads 238
7051 In vitro Clonal Multiplication and Acclimatization of Large Cardamom (Amomum subulatum Roxb.)

Authors: Krishna Poudel, Tahar Katuwal, Sujan Karki

Abstract:

A rapid propagation and acclimatization method for large cardamom was optimized in this study. Sprouted rhizome buds were collected. The excised rhizome bud explants were cultured on semi-solid culture media. The explants were cultured on Murashige and Skoog’s (MS) medium supplemented with different concentrations and combinations of BAP (6-Benzyl-amino-purine) and IBA (Indole-3-butyric acid) for shoot and root induction. Explants cultured on MS basal medium supplemented with 1.0 mg/l BAP + 0.5 gm/l IBA showed the highest rate of shoot multiplication. In vitro shoots were rooted on half-strength MS basal medium supplemented with 0.5 mg/l IBA. Rooted shoots were transplanted to the screen house for the hardening process. These hardened plants were subsequently shifted into the netted nursery for further multiplication.

Keywords: concentration, explants, hardening, rhizome

Procedia PDF Downloads 210
7050 Plasticity in Matrix Dominated Metal-Matrix Composite with One Active Slip Based Dislocation

Authors: Temesgen Takele Kasa

Abstract:

The main aim of this paper is to suggest a continuum dislocation approach based on one active slip system for plasticity analysis of matrix-dominated MMC. The approach centers on free energy principles, applied through the continuum behavior of dislocations combined with small-strain continuum kinematics. The analytical derivation of this method includes the formulation of one active slip system, the thermodynamic approach to dislocations, the determination of free energy, and the evolution of dislocations. In addition, zero and non-zero energy dissipation analysis of dislocation evolution is formulated using a variational energy minimization method. In general, this work shows that the approach is capable of analyzing the plasticity of matrix-dominated MMC with inclusions. The proposed method is also found to be capable of handling the plasticity of MMC.

Keywords: active slip, continuum dislocation, distortion, dominated, energy dissipation, matrix dominated, plasticity

Procedia PDF Downloads 356