Search results for: matrix multiplication
2334 Parallel Computing: Offloading Matrix Multiplication to GPU
Authors: Bharath R., Tharun Sai N., Bhuvan G.
Abstract:
This project focuses on developing a Parallel Computing method aimed at optimizing matrix multiplication through GPU acceleration. Addressing algorithmic challenges, GPU programming intricacies, and integration issues, the project aims to enhance efficiency and scalability. The methodology involves algorithm design, GPU programming, and optimization techniques. Future plans include advanced optimizations, extended functionality, and integration with high-level frameworks. User engagement is emphasized through user-friendly interfaces, open-source collaboration, and continuous refinement based on feedback. The project's impact extends to significantly improving matrix multiplication performance in scientific computing and machine learning applications.
Keywords: matrix multiplication, parallel processing, CUDA, performance boost, neural networks
Procedia PDF Downloads 58
2333 Performance Analysis and Optimization for Diagonal Sparse Matrix-Vector Multiplication on Machine Learning Unit
Authors: Qiuyu Dai, Haochong Zhang, Xiangrong Liu
Abstract:
Diagonal sparse matrix-vector multiplication is a well-studied topic in the fields of scientific computing and big data processing. However, when diagonal sparse matrices are stored in DIA format, there can be a significant number of padded zero elements and scattered points, which can lead to a degradation in the performance of the current DIA kernel. This can also lead to excessive consumption of computational and memory resources. In order to address these issues, the authors propose the DIA-Adaptive scheme and its kernel, which leverages the parallel instruction sets on MLU. The researchers analyze the effect of allocating a varying number of threads, clusters, and hardware architectures on the performance of SpMV using different formats. The experimental results indicate that the proposed DIA-Adaptive scheme performs well and offers excellent parallelism.
Keywords: adaptive method, DIA, diagonal sparse matrices, MLU, sparse matrix-vector multiplication
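The padding problem described above can be made concrete with a small sketch. The code below is a plain DIA-format SpMV in pure Python, not the DIA-Adaptive kernel from the paper; the function names `dense_to_dia`, `dia_spmv`, and `zero_ratio` are hypothetical and chosen only for illustration. It shows how a single scattered nonzero forces a whole padded diagonal to be stored.

```python
def dense_to_dia(a):
    """Convert a square dense matrix (list of lists) to DIA storage.

    Returns (offsets, data), where data[d][i] holds a[i][i + offsets[d]],
    or 0.0 as padding when that column index falls outside the matrix.
    """
    n = len(a)
    # Every diagonal that contains at least one nonzero must be stored in full.
    offsets = sorted({j - i for i in range(n) for j in range(n) if a[i][j] != 0})
    data = []
    for off in offsets:
        diag = []
        for i in range(n):
            j = i + off
            diag.append(float(a[i][j]) if 0 <= j < n else 0.0)
        data.append(diag)
    return offsets, data


def dia_spmv(offsets, data, x):
    """Compute y = A x for a matrix A held in DIA storage."""
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, data):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y


def zero_ratio(data):
    """Fraction of stored DIA entries that are zero (padding or in-diagonal zeros)."""
    stored = sum(len(diag) for diag in data)
    zeros = sum(1 for diag in data for v in diag if v == 0.0)
    return zeros / stored if stored else 0.0


# One scattered entry (the 9 in the top-right corner) adds a full extra diagonal.
A = [[1, 0, 0, 9],
     [0, 2, 0, 0],
     [0, 0, 3, 0],
     [0, 0, 0, 4]]
offsets, data = dense_to_dia(A)
y = dia_spmv(offsets, data, [1.0, 1.0, 1.0, 1.0])
```

In this example five nonzeros occupy eight stored slots, which is the kind of overhead an adaptive scheme would try to avoid.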
Procedia PDF Downloads 134
2332 The Fallacy around Inserting Brackets to Evaluate Expressions Involving Multiplication and Division
Authors: Manduth Ramchander
Abstract:
Evaluating expressions involving multiplication and division can give rise to the fallacy that brackets can be arbitrarily inserted into expressions involving multiplication and division. The aim of this article was to draw upon mathematical theory to prove that brackets cannot be arbitrarily inserted into expressions involving multiplication and division and in particular in expressions where division precedes multiplication. In doing so, it demonstrates that the notion that two different answers are possible, when evaluating expressions involving multiplication and division, is indeed a false one. Searches conducted in a number of scholarly databases unearthed the rules to be applied when removing brackets from expressions, which revealed that consideration needs to be given to sign changes when brackets are removed. The rule pertaining to expressions involving multiplication and division was then extended, in its reverse format, to prove that brackets cannot be arbitrarily inserted into expressions involving multiplication and division. The application of the rule demonstrates that an expression involving multiplication and division can have only one correct answer. It is recommended that both the rule and its reverse be included in the curriculum, preferably at the juncture when manipulation with brackets is introduced.
Keywords: brackets, multiplications and division, operations, order
Procedia PDF Downloads 160
2331 Functional Instruction Set Simulator (ISS) of a Neural Network (NN) IP with Native BF-16 Generator
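A short numeric check may help illustrate the claim. The abstract does not spell out the rule, so the sketch below assumes it is the multiplicative analogue of the sign change for subtraction: when brackets are inserted after a division sign, the operation inside them must flip. The values 12, 3, and 2 are arbitrary.

```python
from fractions import Fraction

a, b, c = Fraction(12), Fraction(3), Fraction(2)

# Left-to-right evaluation of 12 ÷ 3 × 2, i.e. (12 ÷ 3) × 2.
left_to_right = a / b * c

# Brackets inserted without adjusting the operation: 12 ÷ (3 × 2).
naive_brackets = a / (b * c)

# Brackets inserted with the inner operation flipped: 12 ÷ (3 ÷ 2).
adjusted_brackets = a / (b / c)
```

Only the adjusted form agrees with the left-to-right value, which is consistent with the abstract's point that the expression has a single correct answer.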
Authors: Debajyoti Mukherjee, Arathy B. S., Arpita Sahu, Saranga P. Pogula
Abstract:
A functional model that mimics the functional correctness of a Neural Network Compute Accelerator IP is crucial for design validation. Neural network workloads are based on the Brain Floating Point (BF-16) data type. The major challenge we faced was the incompatibility of GCC compilers with the BF-16 data type, which we addressed with a native BF-16 generator integrated into our functional model. Moreover, working with large GEMM (General Matrix Multiplication) or SpMM (Sparse Matrix Multiplication) workloads (dense or sparse) and debugging failures related to data integrity is highly painstaking. In this paper, we address the quality challenge of such a complex Neural Network Accelerator design by proposing a functional model-based scoreboard, or software model, using SystemC. The proposed functional model executes the assembly code based on the ISA of the processor IP, decodes all instructions, and executes them as the DUT is expected to. The model gives considerable visibility and debug capability into the DUT by exposing the micro-steps of execution.
Keywords: ISA (instruction set architecture), NN (neural network), TLM (transaction-level modeling), GEMM (general matrix multiplication)
Procedia PDF Downloads 86
2330 Private Coded Computation of Matrix Multiplication
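For readers unfamiliar with BF-16, the following sketch shows one common way to produce bfloat16 values in software when the toolchain has no native type: keep the top 16 bits of the IEEE 754 single-precision pattern, with round-to-nearest-even. It is not the authors' generator; the function names are hypothetical, and finite, non-NaN inputs are assumed.

```python
import struct


def float_to_bf16_bits(x):
    """Return the 16-bit bfloat16 pattern for x (assumes x is finite, not NaN)."""
    # Reinterpret x as a 32-bit IEEE 754 single-precision pattern.
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 bits that are about to be dropped.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits16):
    """Expand a bfloat16 pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


one = float_to_bf16_bits(1.0)
pi_bits = float_to_bf16_bits(3.14159265)
pi_back = bf16_bits_to_float(pi_bits)
```

Because bfloat16 keeps the 8-bit exponent of single precision and only 7 mantissa bits, the round trip preserves range but not precision: π comes back as 3.140625.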
Authors: Malihe Aliasgari, Yousef Nejatbakhsh
Abstract:
The era of Big Data and the immensity of real-life datasets compel computation tasks to be performed in a distributed fashion, where the data is dispersed among many servers that operate in parallel. However, massive parallelization leads to computational bottlenecks due to faulty servers and stragglers. Stragglers are a few slow or delay-prone processors that can bottleneck the entire computation because one has to wait for all the parallel nodes to finish. The problem of straggling processors has been well studied in the context of distributed computing. Recently, it has been pointed out that, for the important case of linear functions, it is possible to improve over repetition strategies in terms of the tradeoff between performance and latency by carrying out linear precoding of the data prior to processing. The key idea is that, by employing suitable linear codes operating over fractions of the original data, a function may be completed as soon as a sufficient number of processors, depending on the minimum distance of the code, have completed their operations. Matrix-matrix multiplication on practically large data sets faces computational and memory-related difficulties, so such operations are carried out on distributed computing platforms. In this work, we study the problem of distributed matrix-matrix multiplication W = XY under storage constraints, i.e., when each server is allowed to store a fixed fraction of each of the matrices X and Y. This operation is a fundamental building block of many science and engineering fields such as machine learning, image and signal processing, wireless communication, and optimization. Both non-secure and secure matrix multiplication are studied.
We study the setup in which the identity of the matrix of interest should be kept private from the workers, and then obtain the recovery threshold of the colluding model, that is, the number of workers that need to complete their task before the master server can recover the product W. We consider the problem of secure and private distributed matrix multiplication W = XY in which the matrix X is confidential, while the matrix Y is selected in a private manner from a library of public matrices. We present the best currently known trade-off between communication load and recovery threshold. In other words, we design an achievable PSGPD scheme for any arbitrary privacy level by trivially concatenating a robust PIR scheme for arbitrary colluding workers and private databases and the proposed SGPD code that provides a smaller computational complexity at the workers.
Keywords: coded distributed computation, private information retrieval, secret sharing, stragglers
Procedia PDF Downloads 122
2329 On Direct Matrix Factored Inversion via Broyden's Updates
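The notion of a recovery threshold can be illustrated with a toy example. The sketch below is not the PSGPD or SGPD scheme from the abstract and provides no privacy or security; it is a minimal straggler-tolerant code, assuming three workers, an even number of rows in X, and hypothetical helper names. X is split into two row blocks plus one parity block, so the master can recover W = XY from any two of the three results (a recovery threshold of 2).

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    return [[u - v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode(x):
    """Split X into two row blocks X1, X2 and add the parity block X1 + X2."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    return [x1, x2, mat_add(x1, x2)]


def decode(results):
    """Recover W = XY from any two worker results.

    `results` maps a worker index (0, 1, or 2) to that worker's block product.
    """
    if 0 in results and 1 in results:
        w1, w2 = results[0], results[1]
    elif 0 in results and 2 in results:
        w1 = results[0]
        w2 = mat_sub(results[2], w1)   # (X1 + X2)Y - X1 Y = X2 Y
    elif 1 in results and 2 in results:
        w2 = results[1]
        w1 = mat_sub(results[2], w2)   # (X1 + X2)Y - X2 Y = X1 Y
    else:
        raise ValueError("need results from at least two workers")
    return w1 + w2                     # stack the two row blocks


X = [[1, 2], [3, 4], [5, 6], [7, 8]]
Y = [[1, 2], [3, 4]]
blocks = encode(X)
worker_out = {i: matmul(blocks[i], Y) for i in range(3)}

# Simulate worker 0 straggling: only workers 1 and 2 have finished.
W = decode({1: worker_out[1], 2: worker_out[2]})
```

Adding privacy for Y or secrecy for X, as studied in the abstract, would require additional random masking and a retrieval scheme that this sketch deliberately omits.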
Authors: Adel Mohsen
Abstract:
A direct method based on the good Broyden's updates for evaluating the inverse of a nonsingular square matrix of full rank and solving the related system of linear algebraic equations is studied. For a matrix A of order n whose LU-decomposition is A = LU, the multiplication count is O(n³). This includes the evaluation of the LU-decompositions of the inverse, the lower triangular decomposition of A as well as a “reduced matrix inverse”. If an explicit value of the inverse is not needed, the order reduces to O(n³/2) to compute inv(U) and the reduced inverse. For a symmetric matrix only O(n³/3) operations are required to compute inv(L) and the reduced inverse. An example is presented to demonstrate the capability of using the reduced matrix inverse in treating ill-conditioned systems. Besides the simplicity of Broyden's update, the method provides a means to exploit the possible sparsity in the matrix and to derive a suitable preconditioner.
Keywords: Broyden's updates, matrix inverse, inverse factorization, solution of linear algebraic equations, ill-conditioned matrices, preconditioning
Procedia PDF Downloads 479
2328 Low-Complexity Multiplication Using Complement and Signed-Digit Recoding Methods
Authors: Te-Jen Chang, I-Hui Pan, Ping-Sheng Huang, Shan-Jen Cheng
Abstract:
In this paper, a fast multiplication method utilizing the complement representation method and the canonical recoding technique is proposed. By applying complements and canonical recoding, the number of partial products can be reduced. Based on these techniques, we propose an algorithm that provides an efficient multiplication method. On average, our proposed algorithm reduces the number of k-bit additions from (0.25k + logk/k + 2.5) to (k/6 + logk/k + 2.5), where k is the bit-length of the multiplicand A and multiplier B. We can therefore efficiently speed up the overall performance of the multiplication. Moreover, if we use the proposed method to compute common-multiplicand multiplication, the computational complexity can be reduced from (0.5k + 2logk/k + 5) to (k/3 + 2logk/k + 5) k-bit additions.
Keywords: algorithm design, complexity analysis, canonical recoding, public key cryptography, common-multiplicand multiplication
Procedia PDF Downloads 435
2327 A Low-Latency Quadratic Extended Domain Modular Multiplier for Bilinear Pairing Based on Non-Least Positive Multiplication
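To show why recoding reduces the number of partial products, here is a sketch of canonical recoding on its own (the non-adjacent form with digits in {-1, 0, 1}). It does not include the complement step or the common-multiplicand optimization from the paper; the function names are hypothetical and a non-negative multiplier is assumed.

```python
def canonical_recode(n):
    """Return the canonical signed-digit form of n >= 0, least significant digit first.

    Digits are in {-1, 0, 1} and no two adjacent digits are both nonzero.
    """
    digits = []
    while n > 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def multiply_recoded(a, b):
    """Multiply a by b >= 0 using one shifted add or subtract per nonzero digit of b.

    Returns (product, number_of_partial_products).
    """
    product = 0
    partial_products = 0
    for i, d in enumerate(canonical_recode(b)):
        if d:
            product += d * (a << i)
            partial_products += 1
    return product, partial_products


digits_of_7 = canonical_recode(7)          # 7 = 8 - 1
result, used = multiply_recoded(13, 7)
```

Plain binary 7 (111) needs three partial products, while the recoded form 8 - 1 needs two; runs of ones in the multiplier are where the saving comes from.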
Authors: Yulong Jia, Xiang Zhang, Ziyuan Wu, Shiji Hu
Abstract:
The calculation of bilinear pairing is the core of the SM9 algorithm, which relies on the underlying prime domain algorithm and the quadratic extension domain algorithm. Among the field algorithms, the modular multiplication operation is the most time-consuming part. Therefore, the underlying modular multiplication algorithm is optimized to maximize the operation speed of bilinear pairings. This paper uses a modular multiplication method based on non-least positive (NLP) combined with Karatsuba and schoolbook multiplication to improve the Montgomery algorithm. At the same time, according to the characteristics of the multiplication operation in the quadratic extension domain, a quadratic extension domain Fp₂-NLP modular multiplication algorithm for bilinear pairings is proposed, which effectively reduces the operation time of modular multiplication in the quadratic extension domain. The multiplication unit in the quadratic extension domain is implemented using the SMIC 55 nm process, and two different implementation architectures are designed to cope with different application scenarios. Compared with the existing related literature, the output latency of this design can reach a minimum of 15 cycles. The shortest time for calculating the (AB+CD)r⁻¹ mod form is 37.5 ns, and the comprehensive area-time product (AT) is 11400. The final R-ate pairing algorithm hardware accelerator consumes 2670k equivalent logic gates and 1.8 ms computing time in the 55 nm process.
Keywords: sm9, hardware, NLP, Montgomery
Procedia PDF Downloads 4
2326 Design and Construction of an Intelligent Multiplication Table for Enhanced Education and Increased Student Engagement
Authors: Zahra Alikhani Koopaei
Abstract:
In the fifth lesson of the third-grade mathematics book, students are introduced to the concept of multiplication. However, some students showed a lack of interest in learning this topic. To address this, a simple electronic multiplication table was designed with the aim of making the concept of multiplication entertaining and engaging for students. It provides them with moments of excitement during the learning process. To achieve this goal, a device was created that produced a bell sound when two wire ends were connected. Each wire end was connected to a specific number in the multiplication table, and the other end was linked to the corresponding answer. Consequently, if the answer is correct, the bell will ring. This study employs interactive and engaging methods to teach mathematics, particularly to students who have previously shown little interest in the subject. By integrating game-based learning and critical thinking, we observed an increase in understanding and interest in learning multiplication compared to before using this method. This further motivated the students. As a result, the intelligent multiplication table was successfully designed. Students, under the instructor's supervision, could easily construct the device during the lesson. Through the implementation of these operations, the concept of multiplication was firmly established in the students' minds. Engaging multiple intelligences in each student enhances a more stable and improved understanding of the concept of multiplication.
Keywords: intelligent multiplication table, design, construction, education, increased interest, students
Procedia PDF Downloads 68
2325 Modified Montgomery for RSA Cryptosystem
Authors: Rupali Verma, Maitreyee Dutta, Renu Vig
Abstract:
Encryption and decryption in RSA are done by modular exponentiation, which is achieved by repeated modular multiplication. Hence, the efficiency of modular multiplication directly determines the efficiency of the RSA cryptosystem. This paper designs a modified Montgomery modular multiplication in which the addition of operands is computed by a 4:2 compressor. The basic logic operations in the addition are partitioned over two iterations so that parallel computations are performed. This reduces the critical path delay of the proposed Montgomery design. The proposed design and RSA are implemented on Virtex 2 and Virtex 5 FPGAs. The two factors, partitioning and parallelism, have improved the frequency and throughput of the proposed design.
Keywords: RSA, Montgomery modular multiplication, 4:2 compressor, FPGA
Procedia PDF Downloads 413
2324 Mixed Number Algebra and Its Application
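The relationship between modular exponentiation and repeated Montgomery multiplication can be sketched in software. The code below is the textbook Montgomery reduction with R = 2^k and a square-and-multiply loop; it does not model the 4:2 compressor or the two-iteration partitioning of the proposed hardware. Function names are hypothetical, the modulus n is assumed odd with n < 2^k, and Python 3.8+ is assumed for `pow(n, -1, r)`.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n < 2**k, with R = 2**k."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime ≡ -1 (mod R)
    r2 = (r * r) % n                 # used to convert values into Montgomery form
    return r, n_prime, r2


def montgomery_reduce(t, n, k, n_prime):
    """Return t * R^(-1) mod n for 0 <= t < n * R, using only shifts and masks."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k             # exact division: t + m*n is a multiple of R
    return u - n if u >= n else u


def montgomery_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two values held in Montgomery form; the result stays in that form."""
    return montgomery_reduce(a_bar * b_bar, n, k, n_prime)


def mod_exp(base, exp, n, k):
    """Compute base**exp mod n by repeated Montgomery multiplication."""
    r, n_prime, r2 = montgomery_setup(n, k)
    x_bar = montgomery_mul(base % n, r2, n, k, n_prime)   # base * R mod n
    acc_bar = r % n                                       # Montgomery form of 1
    while exp > 0:
        if exp & 1:
            acc_bar = montgomery_mul(acc_bar, x_bar, n, k, n_prime)
        x_bar = montgomery_mul(x_bar, x_bar, n, k, n_prime)
        exp >>= 1
    return montgomery_reduce(acc_bar, n, k, n_prime)      # leave Montgomery form


value = mod_exp(5, 117, 97, 8)
```

Every loop iteration costs one or two Montgomery multiplications, which is why the latency of that single operation dominates RSA performance.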
Authors: Md. Shah Alam
Abstract:
Mushfiq Ahmad has defined a Mixed Number, which is the sum of a scalar and a Cartesian vector. He has also defined the elementary group operations of Mixed Numbers, i.e., the norm of Mixed Numbers, the product of two Mixed Numbers, the identity element and the inverse. It has been observed that Mixed Number algebra is consistent with Pauli matrix algebra and is a handy tool for working with Dirac electron theory. Its use as a mathematical method in Physics has been studied. (1) We have applied Mixed Numbers in Quantum Mechanics: Mixed Number versions of the Displacement operator, Vector differential operator, and Angular momentum operator have been developed. The Mixed Number method has also been applied to the Klein-Gordon equation. (2) We have applied Mixed Numbers in Electrodynamics: Mixed Number versions of Maxwell’s equation, the Electric and Magnetic field quantities and the Lorentz Force have been found. (3) An associative transformation of Mixed Numbers fulfilling the Lorentz invariance requirement is developed. (4) We have applied Mixed Number algebra as an extension of Complex numbers. Mixed Numbers and the Quaternions have isomorphic correspondence, but they are different in algebraic details. The multiplication of unit Mixed Numbers and the multiplication of unit Quaternions are different. Since Mixed Numbers have properties similar to those of Pauli matrix algebra, Mixed Number algebra is a more convenient tool to deal with the Dirac equation.
Keywords: mixed number, special relativity, quantum mechanics, electrodynamics, pauli matrix
Procedia PDF Downloads 363
2323 Functional Instruction Set Simulator of a Neural Network IP with Native Brain Float-16 Generator
Authors: Debajyoti Mukherjee, Arathy B. S., Arpita Sahu, Saranga P. Pogula
Abstract:
A functional model that mimics the functional correctness of a neural network compute accelerator IP is crucial for design validation. Neural network workloads are based on the Brain Floating Point (BF-16) data type. The major challenge we faced was the incompatibility of GCC compilers with the BF-16 data type, which we addressed with a native BF-16 generator integrated into our functional model. Moreover, working with large GEMM (General Matrix Multiplication) or SpMM (Sparse Matrix Multiplication) workloads (dense or sparse) and debugging failures related to data integrity is highly painstaking. In this paper, we address the quality challenge of such a complex neural network accelerator design by proposing a functional model-based scoreboard, or software model, using SystemC. The proposed functional model executes the assembly code based on the ISA of the processor IP, decodes all instructions, and executes them as the DUT is expected to. The model gives considerable visibility and debug capability into the DUT by exposing the micro-steps of execution.
Keywords: ISA, neural network, Brain Float-16, DUT
Procedia PDF Downloads 94
2322 Efficient Semi-Systolic Finite Field Multiplier Using Redundant Basis
Authors: Hyun-Ho Lee, Kee-Won Kim
Abstract:
The arithmetic operations over GF(2^m) have been extensively used in error correcting codes and public-key cryptography schemes. Finite field arithmetic includes addition, multiplication, division and inversion operations. Addition is very simple and can be implemented with an extremely simple circuit. The other operations are much more complex. Multiplication is the most important for cryptosystems, such as the elliptic curve cryptosystem, since exponentiation, division, and multiplicative inversion can all be performed by computing multiplication iteratively. In this paper, we present a parallel computation algorithm that performs Montgomery multiplication over a finite field using a redundant basis. Also, based on the multiplication algorithm, we present an efficient semi-systolic multiplier over a finite field. The multiplier has lower space and time complexities than related multipliers. Compared to the corresponding existing structures, the multiplier saves at least 5% area, 50% time, and 53% area-time (AT) complexity. Accordingly, it is well suited for VLSI implementation and can be easily applied as a basic component for computing complex operations over a finite field, such as inversion and division.
Keywords: finite field, Montgomery multiplication, systolic array, cryptography
Procedia PDF Downloads 294
2321 Integrating Indigenous Students’ Funds of Knowledge to Introduce Multiplication with a Picture Storybook
Authors: Murni Sianturi, Andreas Au Hurit
Abstract:
The low level of Indigenous Papuan students’ literacy and numeracy in Merauke Regency, Indonesia, needs to be considered. The development of a learnable storybook with pictures related to their lives might raise their curiosity to read. This study aimed to design a storybook as a complementary resource for third graders using Indigenous Malind cultural approaches by employing research and development methods. The product developed was a thematic-integrative picture storybook using funds of knowledge from Indigenous students. All the book contents depicted Indigenous students’ lives and were in line with the national curriculum syllabus, specifically representing one sub-theme: the multiplication topic. Multiplication material of grade 3 was modified in the form of a story, and at the end of the reading, students were given several multiplication exercises. Based on the results of the evaluation from the expert team, it was found that the average score was in the excellent category. The students’ and teacher’s responses to the storybook were very positive. Students were thrilled when reading this book and also effortlessly understood the concept of multiplication. Therefore, this book might be used as a companion book to the main book and serve as introductory reading material for students prior to discussing multiplication material.
Keywords: a picture storybook, funds of knowledge, Indigenous elementary students, literacy, numeracy
Procedia PDF Downloads 189
2320 Magnification Factor Based Seismic Response of Moment Resisting Frames with Open Ground Storey
Authors: Subzar Ahmad Bhat, Saraswati Setia, V. K. Sehgal
Abstract:
During past earthquakes, open ground storey buildings have performed poorly due to the soft storey defect. Indian Standard IS 1893:2002 allows analysis of open ground storey buildings without considering infill stiffness but with a multiplication factor of 2.5 in compensation for the stiffness discontinuity. Therefore, the aim of this paper is to check the applicability of the multiplication factor of 2.5 and to study the behaviour of the structure after the application of the multiplication factor. For this purpose, a study is performed on models considering infill stiffness using SAP 2000 (Version 14) by linear static analysis and response spectrum analysis. A total of seven models are analysed and designed for multiplication factors ranging from 1.25 to 2.5. A multiplication factor of 2.5 is found to be on the higher side, resulting in increased dimensions and percentage of reinforcement without significant enhancement beyond a certain multiplication factor. When a building with OGS is designed for values of MF higher than 1.25 considering infill stiffness, the soft storey effect shifts from the ground storey to the first storey. The best way to analyse an OGS structure is as a frame with the stiffness and strength of the infill taken into account. The provision of infill walls in the upper storeys enhances the performance of the structure in terms of displacement and storey drift control.
Keywords: open ground storey, multiplication factor, IS 1893:2002 provisions, static analysis, response spectrum analysis, infill stiffness, equivalent strut
Procedia PDF Downloads 394
2319 A Design of Elliptic Curve Cryptography Processor based on SM2 over GF(p)
Authors: Shiji Hu, Lei Li, Wanting Zhou, DaoHong Yang
Abstract:
Data encryption is the foundation of today’s communication. On this basis, how to improve the speed of data encryption and decryption is a problem that scholars continue to work on. In this paper, we propose an elliptic curve crypto processor architecture based on the SM2 prime field. In terms of hardware implementation, we optimized the algorithms in different stages of the structure. For the finite field modular operations, we propose an optimized Karatsuba-Ofman multiplication algorithm and shorten the critical path through a pipeline structure in the algorithm implementation. Based on the SM2 recommended prime field, a fast modular reduction algorithm is used to reduce the 512-bit wide data obtained from the multiplication unit. The radix-4 extended Euclidean algorithm is used to realize the conversion between the affine coordinate system and the Jacobian projective coordinate system. In the parallel scheduling of point operations on elliptic curves, we propose a three-level parallel structure of point addition and point doubling based on the Jacobian projective coordinate system. Combined with the scalar multiplication algorithm, we add mutual pre-operation to the point addition and point doubling operations to improve the efficiency of scalar point multiplication. The proposed ECC hardware architecture was verified and implemented on Xilinx Virtex-7 and ZYNQ-7 platforms, and each 256-bit scalar multiplication operation took 0.275 ms. The performance for handling scalar multiplication is 32 times that of the CPU (dual-core ARM Cortex-A9).
Keywords: Elliptic curve cryptosystems, SM2, modular multiplication, point multiplication.
Procedia PDF Downloads 98
2318 Easily Memorable Strong Password Generation and Retrieval
Authors: Shatadru Das, Natarajan Vijayarangan
Abstract:
In this paper, a system and method for generating and recovering an authorization code has been designed and analyzed. The system creates an authorization code by accepting a base-sentence from a user. Based on the characters present in this base-sentence, the system computes a base-sentence matrix. The system also generates a plurality of patterns. The user can either select the pattern from the multiple patterns suggested by the system or can create his/her own pattern. The system then performs multiplications between the base-sentence matrix and the selected pattern matrix at different stages in the path forward, for obtaining a strong authorization code. In case the user forgets the base sentence, the system has a provision to manage and retrieve 'forgotten authorization code'. This is done by fragmenting the base sentence into different matrices and storing the fragmented matrices into a repository after computing matrix multiplication with a security question-answer approach and with a secret key provided by the user.
Keywords: easy authentication, key retrieval, memorable passwords, strong password generation
Procedia PDF Downloads 400
2317 Symmetry Properties of Linear Algebraic Systems with Non-Canonical Scalar Multiplication
Authors: Krish Jhurani
Abstract:
The research paper presents an in-depth analysis of symmetry properties in linear algebraic systems under the operation of non-canonical scalar multiplication structures, specifically semirings, and near-rings. The objective is to unveil the profound alterations that occur in traditional linear algebraic structures when we replace conventional field multiplication with these non-canonical operations. In the methodology, we first establish the theoretical foundations of non-canonical scalar multiplication, followed by a meticulous investigation into the resulting symmetry properties, focusing on eigenvectors, eigenspaces, and invariant subspaces. The methodology involves a combination of rigorous mathematical proofs and derivations, supplemented by illustrative examples that exhibit these discovered symmetry properties in tangible mathematical scenarios. The core findings uncover unique symmetry attributes. For linear algebraic systems with semiring scalar multiplication, we reveal eigenvectors and eigenvalues. Systems operating under near-ring scalar multiplication disclose unique invariant subspaces. These discoveries drastically broaden the traditional landscape of symmetry properties in linear algebraic systems. With the application of these findings, potential practical implications span across various fields such as physics, coding theory, and cryptography. They could enhance error detection and correction codes, devise more secure cryptographic algorithms, and even influence theoretical physics. This expansion of applicability accentuates the significance of the presented research. 
The research paper thus contributes to the mathematical community by bringing forth perspectives on linear algebraic systems and their symmetry properties through the lens of non-canonical scalar multiplication, coupled with an exploration of practical applications.
Keywords: eigenspaces, eigenvectors, invariant subspaces, near-rings, non-canonical scalar multiplication, semirings, symmetry properties
Procedia PDF Downloads 123
2316 The Second Smallest Eigenvalue of Complete Tripartite Hypergraph
Authors: Alfi Y. Zakiyyah, Hanni Garminia, M. Salman, A. N. Irawati
Abstract:
The terminology of hypergraphs is related to the terminology of graphs. In graph theory, an edge connects two vertices, whereas in a hypergraph an edge can connect more than two vertices. A graph can be represented by matrices such as the adjacency matrix, the Laplacian matrix, and the incidence matrix. The adjacency matrix is a symmetric matrix, so all its eigenvalues are real. It is also a nonnegative matrix. All diagonal entries of the adjacency matrix are zero, so its trace is zero. Another matrix representation of a graph is the Laplacian matrix. The Laplacian matrix is symmetric and positive semidefinite, so all its eigenvalues are real and nonnegative. Some results from the spectral study of graphs have been generalized to hypergraphs. A hypergraph can likewise be represented by a matrix such as the adjacency, incidence, or Laplacian matrix. Throughout this work, we use the Laplacian matrix to represent a complete tripartite hypergraph. The aim of this research is to determine the second smallest eigenvalue of this matrix and to find a relation between this eigenvalue and the connectivity of the hypergraph.
Keywords: connectivity, graph, hypergraph, Laplacian matrix
Procedia PDF Downloads 488
2315 Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator
Authors: Jaeyoung Lee
Abstract:
Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perceived performance in driving environments that vary from time to season. The image segmentation method using deep learning, which has recently evolved rapidly, provides high recognition performance in various road environments stably. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, which degrade performance in embedded processor environments equipped simple hardware accelerators. In this paper, a semantic segmentation network, matrix multiplication accelerator network (MMANet), optimized for matrix multiplication accelerator (MMA) on Texas instrument digital signal processors (TI DSP) is proposed to improve the recognition performance of autonomous driving system. The proposed method is designed to maximize the number of layers that can be performed in a limited time to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of MMA. By increasing the number of parallel branches, the lack of information caused by fixing the number of channels is resolved. Second, an efficient convolution is selected depending on the size of the activation. Since MMA is a fixed, it may be more efficient for normal convolution than depthwise separable convolution depending on memory access overhead. Thus, a convolution type is decided according to output stride to increase network depth. In addition, memory access time is minimized by processing operations only in L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP). 
The suggested method gets stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high quality contexts using the restored shape without global average pooling paths since the layer uses MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation sets. The proposed network can process an image with 640 x 480 resolution for 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle as 20 frame per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU) which is the highest recognition rate among embedded networks on the Cityscapes validation set.Keywords: edge network, embedded network, MMA, matrix multiplication accelerator, semantic segmentation network
Procedia PDF Downloads 129
2314 Conditions on Expressing a Matrix as a Sum of α-Involutions
Authors: Ric Joseph R. Murillo, Edna N. Gueco, Dennis I. Merino
Abstract:
Let F be C or R, where C and R are the set of complex numbers and real numbers, respectively, and n be a natural number. An n-by-n matrix A over the field F is called an α-involutory matrix or an α-involution if there exists an α in the field such that the square of the matrix is equal to αI, where I is the n-by-n identity matrix. If α is a complex number or a nonnegative real number, then an n-by-n matrix A over the field F can be written as a sum of n-by-n α-involutory matrices over the field F if and only if the trace of that matrix is an integral multiple of the square root of α. Meanwhile, if α is a negative real number, then a 2n-by-2n matrix A over R can be written as a sum of 2n-by-2n α-involutory matrices over R if and only if the trace of the matrix is zero. Some other properties of α-involutory matrices are also determined.
Keywords: α-involutory Matrices, sum of α-involutory Matrices, Trace, Matrix Theory
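A small numeric illustration of the definition may help. The sketch below checks that two hand-picked 2-by-2 matrices (hypothetical examples, not taken from the paper) satisfy A² = αI for α = 4, and that the trace of their sum is an integral multiple of √α = 2. It illustrates one instance of the stated condition and is not a proof.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_alpha_involution(a, alpha):
    """Return True when a * a equals alpha times the identity matrix."""
    n = len(a)
    sq = matmul(a, a)
    return all(sq[i][j] == (alpha if i == j else 0)
               for i in range(n) for j in range(n))


def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))


ALPHA = 4                 # example value; its square root is 2
P = [[2, 0], [0, 2]]      # P * P = 4I, trace 4
Q = [[0, 4], [1, 0]]      # Q * Q = 4I, trace 0
S = [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]  # sum of the two

print(is_alpha_involution(P, ALPHA), is_alpha_involution(Q, ALPHA), trace(S))
```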
Procedia PDF Downloads 198
2313 Optimizing Agricultural Packaging in Fiji: Strategic Barrier Analysis Using Interpretive Structural Modeling and Cross-Impact Matrix Multiplication Applied to Classification
Authors: R. Ananthanarayanan, S. B. Nakula, D. R. Seenivasagam, J. Naua, B. Sharma
Abstract:
Product packaging is a critical component of production, trade, and marketing, playing numerous vital roles that often go unnoticed by consumers. Packaging is essential for maintaining the shelf life, quality assurance, and safety of both manufactured and agricultural products. For example, harvested produce or processed foods can quickly lose quality and freshness, making secure packaging crucial for preservation and safety throughout the food supply chain. In Fiji, agricultural packaging has primarily been managed by local companies for international trade, with gradual advancements in these practices. To further enhance the industry’s performance, this study examines the challenges and constraints hindering the optimization of agricultural packaging practices in Fiji. The study utilizes Multi-Criteria Decision Making (MCDM) tools, specifically Interpretive Structural Modeling (ISM) and Cross-Impact Matrix Multiplication Applied to Classification (MICMAC). ISM analyzes the hierarchical structure of barriers, categorizing them from the least to the most influential, while MICMAC classifies barriers based on their driving and dependence power. This approach helps identify the interrelationships between barriers, providing valuable insights for policymakers and decision-makers to propose innovative solutions for sustainable development in the agricultural packaging sector, ultimately shaping the future of packaging practices in Fiji.
Keywords: agricultural packaging, barriers, ISM, MICMAC
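The driving and dependence power used by MICMAC are commonly computed as the row sums and column sums of a reachability matrix. The sketch below assumes that standard formulation and uses a hypothetical three-barrier influence matrix; the barriers and their links are illustrative and do not come from the study.

```python
def transitive_closure(adj):
    """Reachability matrix: direct links, self-links, and all indirect links."""
    n = len(adj)
    reach = [[1 if (adj[i][j] or i == j) else 0 for j in range(n)]
             for i in range(n)]
    # Warshall's algorithm: allow paths that pass through barrier k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1
    return reach


def driving_and_dependence(reach):
    """Driving power is the row sum; dependence power is the column sum."""
    n = len(reach)
    driving = [sum(row) for row in reach]
    dependence = [sum(reach[i][j] for i in range(n)) for j in range(n)]
    return driving, dependence


# Hypothetical example: barrier 0 influences barrier 1, which influences barrier 2.
ADJ = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
REACH = transitive_closure(ADJ)
DRIVING, DEPENDENCE = driving_and_dependence(REACH)
print(DRIVING, DEPENDENCE)
```

In this toy example, barrier 0 has the highest driving power and barrier 2 the highest dependence power, which is the kind of contrast the classification is built on.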
Procedia PDF Downloads 28
2312 High Speed Image Rotation Algorithm
Authors: Hee-Choul Kwon, Hyungjin Cho, Heeyong Kwon
Abstract:
Image rotation is one of the main pre-processing steps in image processing or image pattern recognition. It is implemented with rotation matrix multiplication. However, it requires many floating point arithmetic operations and trigonometric function calculations, so it takes a long time to execute. We propose a new high speed image rotation algorithm that avoids these two major time-consuming operations. We compare the proposed algorithm with the conventional rotation algorithm on images of various sizes. Experimental results show that the proposed algorithm is superior to the conventional one.
Keywords: high speed rotation operation, image processing, image rotation, pattern recognition, transformation matrix
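For reference, the conventional approach that the abstract compares against can be sketched as follows. This is a minimal illustration of rotation by matrix multiplication with nearest-neighbour sampling, assuming an image stored as a list of rows and rotation about the image centre; it is not the authors' proposed algorithm, and the function name is hypothetical.

```python
import math


def rotate_image(img, theta):
    """Rotate a 2D list about its centre by theta radians (image coordinates)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c, s = math.cos(theta), math.sin(theta)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: apply the rotation matrix for -theta to the
            # destination pixel to find which source pixel lands there.
            dx, dy = x - cx, y - cy
            sx = c * dx + s * dy + cx
            sy = -s * dx + c * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


IMG = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]
ROTATED = rotate_image(IMG, math.pi / 2)
```

Every output pixel costs several floating point multiplications, which is the per-pixel overhead the abstract refers to.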
Procedia PDF Downloads 506
2311 Very Large Scale Integration Architecture of Finite Impulse Response Filter Implementation Using Retiming Technique
Authors: S. Jalaja, A. M. Vijaya Prakash
Abstract:
Recursive combination of an algorithm based on Karatsuba multiplication is exploited to design a generalized transpose and parallel Finite Impulse Response (FIR) Filter. Mid-range Karatsuba multiplication and Carry Save adder based on Karatsuba multiplication reduce time complexity for higher order multiplication implemented up to n-bit. As a result, we design a modified N-tap Transpose and Parallel Symmetric FIR Filter Structure using the Karatsuba algorithm. The mathematical formulation of the FFA Filter is derived. The proposed architecture involves a significantly lower area delay product (ADP) than the existing block implementation. By adopting the retiming technique, hardware cost is reduced further. The filter architecture is designed using a 90 nm technology library and is implemented using the Cadence EDA tool. The synthesized result shows better performance for different word lengths and block sizes. The design achieves switching activity reduction and low power consumption, with and without retiming, for different combinations of the circuit. The proposed structure achieves a power reduction of more than half, with and without retiming techniques, compared to the earlier design structure. As a proof of concept for block size 16 and filter length 64, the CKA method achieves 51% as well as 70% less power by applying the retiming technique, and the CSA method achieves 57% as well as 77% less power by applying the retiming technique, compared to the previously proposed design.
Keywords: carry save adder Karatsuba multiplication, mid range Karatsuba multiplication, modified FFA and transposed filter, retiming
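The arithmetic idea behind Karatsuba multiplication can be shown in software. The sketch below is a plain recursive integer version for non-negative operands; it illustrates the three-multiplication identity only and is not the hardware architecture described in the abstract.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers with three recursive products."""
    if x < 10 or y < 10:
        return x * y  # small operands: fall back to direct multiplication
    m = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << m) - 1
    hi_x, lo_x = x >> m, x & mask   # x = hi_x * 2**m + lo_x
    hi_y, lo_y = y >> m, y & mask   # y = hi_y * 2**m + lo_y
    z0 = karatsuba(lo_x, lo_y)
    z2 = karatsuba(hi_x, hi_y)
    # The cross term reuses z0 and z2, so one product replaces two.
    z1 = karatsuba(lo_x + hi_x, lo_y + hi_y) - z0 - z2
    return (z2 << (2 * m)) + (z1 << m) + z0


print(karatsuba(1234, 5678))
```

Replacing four half-size products with three is what lowers the multiplication cost as operand width grows.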
Procedia PDF Downloads 235
2310 Implementation of Integer Sub-Decomposition Method on Elliptic Curves with J-Invariant 1728
Authors: Siti Noor Farwina Anwar, Hailiza Kamarulhaili
Abstract:
In this paper, we present the idea of implementing the Integer Sub-Decomposition (ISD) method on elliptic curves with j-invariant 1728. The ISD method was proposed in 2013 to compute scalar multiplication in elliptic curves, which remains the most expensive operation in Elliptic Curve Cryptography (ECC). However, the original ISD method only works on the integer number field and solves integer scalar multiplication. By extending the method into the complex quadratic field, we are able to solve complex multiplication and implement the ISD method on elliptic curves with j-invariant 1728. The curve with j-invariant 1728 has a unique discriminant of the imaginary quadratic field. This unique discriminant of the quadratic field yields a unique efficiently computable endomorphism, which can then speed up the computations on this curve. However, the ISD method requires three endomorphisms. Hence, we choose all three endomorphisms to be from the same imaginary quadratic field as the curve itself, where the first endomorphism is the unique endomorphism yielded by the discriminant of the imaginary quadratic field.
Keywords: efficiently computable endomorphism, elliptic scalar multiplication, j-invariant 1728, quadratic field
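As background, curves of the form y² = x³ + ax have j-invariant 1728 and carry the well-known endomorphism (x, y) → (−x, iy), where i² = −1 in the base field. The sketch below checks this on a toy curve over a small prime field; the prime 13, the coefficient 1, and the helper names are illustrative assumptions, and the code does not implement the ISD method itself.

```python
P = 13   # toy prime with P % 4 == 1, so -1 has a square root modulo P
A = 1    # the curve y^2 = x^3 + A*x has j-invariant 1728
I = 5    # 5 * 5 = 25, which is congruent to -1 modulo 13


def on_curve(x, y):
    """Check the affine curve equation y^2 = x^3 + A*x over GF(P)."""
    return (y * y - (x * x * x + A * x)) % P == 0


def phi(x, y):
    """Candidate endomorphism (x, y) -> (-x, I*y) on affine points."""
    return ((-x) % P, (I * y) % P)


# Enumerate all affine points by brute force (fine for a toy field).
POINTS = [(x, y) for x in range(P) for y in range(P) if on_curve(x, y)]

print(len(POINTS), all(on_curve(*phi(x, y)) for x, y in POINTS))
```

Applying the map twice sends (x, y) to (x, −y), so it behaves like a square root of negation on the curve, which is why it is cheap to compute compared with a scalar multiplication.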
Procedia PDF Downloads 199
2309 Manufacturing and Characterization of Ni-Matrix Composite Reinforced with Ti3SiC2 and Ti2AlC; and Al-Matrix with Ti2SiC
Authors: M. Hadji, N. Chiker, Y. Hadji, A. Haddad
Abstract:
In this paper, we report for the first time on the synthesis and characterization of novel MAX phases (Ti3SiC2, Ti2AlC) reinforced Ni-matrix and Ti2AlC reinforced Al-matrix. The stability of MAX phases in Al-matrix and Ni-matrix at a temperature of 985°C has been investigated. All the composites were cold pressed and sintered at a temperature of 985°C for 20 min in an H2 environment, except Ni/Ti3SiC2, which was sintered at 1100°C for 1 h. Microstructure analysis by scanning electron microscopy and phase analysis by X-ray diffraction confirmed that there was minimal interfacial reaction between MAX particles and Ni, whereas the Al/MAX samples showed that the MAX phases were totally decomposed at 985°C. The addition of MAX enhanced the Al-matrix and Ni-matrix.
Keywords: MAX phase, microstructures, composites, hardness, SEM
Procedia PDF Downloads 347
2308 Inverse Matrix in the Theory of Dynamical Systems
Authors: Renata Masarova, Bohuslava Juhasova, Martin Juhas, Zuzana Sutova
Abstract:
In dynamic system theory, a mathematical model is often used to describe the properties of a system. In order to find the transfer matrix of a dynamic system, we need to calculate an inverse matrix. The paper combines the classical theory with the procedures used in the theory of automated control for calculating the inverse matrix. The final part of the paper models the given problem in MATLAB.
Keywords: dynamic system, transfer matrix, inverse matrix, modeling
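The link between the transfer matrix and matrix inversion can be made concrete with the standard state-space relation G(s) = C (sI − A)⁻¹ B + D. The sketch below evaluates it at a single value of s for a hypothetical two-state, single-input, single-output system; the system matrices are illustrative, and the paper's own MATLAB model is not reproduced here.

```python
def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix given as two rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def transfer_siso(A, B, C, D, s):
    """Evaluate G(s) = C (sI - A)^-1 B + D for a two-state SISO system."""
    s_i_minus_a = [[s - A[0][0], -A[0][1]],
                   [-A[1][0], s - A[1][1]]]
    inv = inverse_2x2(s_i_minus_a)
    # v = (sI - A)^-1 B, with B treated as a column vector.
    v = [inv[0][0] * B[0] + inv[0][1] * B[1],
         inv[1][0] * B[0] + inv[1][1] * B[1]]
    return C[0] * v[0] + C[1] * v[1] + D


# Hypothetical system whose transfer function is 1 / (s^2 + 3s + 2).
A_MAT = [[0.0, 1.0], [-2.0, -3.0]]
B_VEC = [0.0, 1.0]
C_VEC = [1.0, 0.0]
D_VAL = 0.0

G_AT_1 = transfer_siso(A_MAT, B_VEC, C_VEC, D_VAL, 1.0)
print(G_AT_1)
```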
Procedia PDF Downloads 515
2307 In vitro Clonal Multiplication and Acclimatization of Large Cardamom (Amomum subulatum Roxb.)
Authors: Krishna Poudel, Tahar Katuwal, Sujan Karki
Abstract:
A rapid propagation and acclimatization method for large cardamom was optimized in this study. Sprouted rhizome buds were collected. The excised rhizome bud explants were cultured on semi-solid culture media. The explants were cultured on Murashige and Skoog’s (MS) medium supplemented with different concentrations and combinations of BAP (6-Benzyl-amino-purine) and IBA (Indole-3-butyric acid) for shoot and root induction. Explants cultured on MS basal medium supplemented with 1.0 mg/l BAP + 0.5 gm/l IBA showed the highest rate of shoot multiplication. In vitro shoots were rooted on half-strength MS basal medium supplemented with 0.5 mg/l IBA. Rooted shoots were transplanted to the screen house for hardening. These hardened plants were subsequently moved to the netted nursery for further multiplication.
Keywords: concentration, explants, hardening, rhizome
Procedia PDF Downloads 243
2306 Image Rotation Using an Augmented 2-Step Shear Transform
Authors: Hee-Choul Kwon, Heeyong Kwon
Abstract:
Image rotation is one of the main pre-processing steps for image processing or image pattern recognition. It is implemented with a rotation matrix multiplication. It requires a lot of floating point arithmetic operations and trigonometric calculations, so it takes a long time to execute. Therefore, there has been a need for a high speed image rotation algorithm that avoids these two major time-consuming operations. However, the rotated image has a drawback, i.e. distortions. We solved the problem using an augmented two-step shear transform. We compare the presented algorithm with the conventional rotation on images of various sizes. Experimental results show that the presented algorithm is superior to the conventional one.
Keywords: high-speed rotation operation, image rotation, transform matrix, image processing, pattern recognition
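To show how shears can stand in for a rotation matrix, the sketch below uses the classical three-shear decomposition of a rotation. This is a different, well-known technique chosen for illustration, not the augmented two-step transform proposed in the abstract, and the function names are hypothetical.

```python
import math


def shear_x(p, a):
    """Horizontal shear: x moves in proportion to y."""
    x, y = p
    return (x + a * y, y)


def shear_y(p, b):
    """Vertical shear: y moves in proportion to x."""
    x, y = p
    return (x, y + b * x)


def rotate_by_shears(p, theta):
    """Rotate a point using x-shear, y-shear, x-shear in sequence."""
    a = -math.tan(theta / 2.0)
    b = math.sin(theta)
    return shear_x(shear_y(shear_x(p, a), b), a)


def rotate_by_matrix(p, theta):
    """Reference rotation using the usual 2x2 rotation matrix."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


print(rotate_by_shears((3.0, 4.0), math.pi / 6))
print(rotate_by_matrix((3.0, 4.0), math.pi / 6))
```

Each shear moves pixels along a single axis only, so on an image it can be applied as row or column shifts, which is what makes shear-based rotation attractive for speed.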
Procedia PDF Downloads 277
2305 On the Application of Heuristics of the Traveling Salesman Problem for the Task of Restoring the DNA Matrix
Authors: Boris Melnikov, Dmitrii Chaikovskii, Elena Melnikova
Abstract:
The traveling salesman problem (TSP) is a well-known optimization problem that seeks to find the shortest possible route that visits a set of points and returns to the starting point. In this paper, we apply some heuristics of the TSP for the task of restoring the DNA matrix. This restoration problem is often considered in biocybernetics. For it, we must recover the matrix of distances between DNA sequences when not all the elements of the matrix under consideration are known at the input. We consider the possibility of using this method in the testing of distance calculation algorithms between a pair of DNAs to restore the partially filled matrix.
Keywords: optimization problems, DNA matrix, partially filled matrix, traveling salesman problem, heuristic algorithms
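The abstract does not say which TSP heuristics are used, so the sketch below shows only one common example, the nearest-neighbour heuristic, on a hypothetical fully known distance matrix. It illustrates what a TSP heuristic looks like and is not the authors' restoration procedure.

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy tour: always move to the closest city not yet visited."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        cur = tour[-1]
        # Ties are broken by the smaller city index to keep results stable.
        nxt = min(unvisited, key=lambda j: (dist[cur][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(dist, tour):
    """Length of the closed route, including the return to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))


# Hypothetical symmetric distance matrix for four points.
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
TOUR = nearest_neighbor_tour(DIST)
print(TOUR, tour_length(DIST, TOUR))
```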
Procedia PDF Downloads 150