Search results for: overheads
22 Reduction of Overheads with Dynamic Caching in Fixed AODV based MANETs
Authors: Babar S. Kawish, Baber Aslam, Shoab A Khan
Abstract:
In this paper we show that adjusting ART in accordance with static network scenario can substantially improve the performance of AODV by reducing control overheads. We explain the relationship of control overheads with network size and request patterns of the users. Through simulation we show that making ART proportionate to network static time reduces the amount of control overheads independent of network size and user request patterns.
Keywords: AODV, ART, MANET, Route Cache, TTL.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 174421 Overhead Estimation over Capacity of Mobile WiMAX
Authors: Saeed AL-Rashdy, Qing Guo
Abstract:
The IEEE802.16 standard which has emerged as Broadband Wireless Access (BWA) technology, promises to deliver high data rate over large areas to a large number of subscribers in the near future. This paper analyze the effect of overheads over capacity of downlink (DL) of orthogonal frequency division multiple access (OFDMA)–based on the IEEE802.16e mobile WiMAX system with and without overheads. The analysis focuses in particular on the impact of Adaptive Modulation and Coding (AMC) as well as deriving an algorithm to determine the maximum numbers of subscribers that each specific WiMAX sector may support. An analytical study of the WiMAX propagation channel by using Cost- 231 Hata Model is presented. Numerical results and discussion estimated by using Matlab to simulate the algorithm for different multi-users parameters.Keywords: BWA, mobile WiMAX, capacity, AMC , overheads.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 208120 Case Study Approach Using Scenario Analysis to Analyze Unabsorbed Head Office Overheads
Authors: K. C. Iyer, T. Gupta, Y. M. Bindal
Abstract:
Head office overhead (HOOH) is an indirect cost and is recovered through individual project billings by the contractor. Delay in a project impacts the absorption of HOOH cost allocated to that particular project and thus diminishes the expected profit of the contractor. This unabsorbed HOOH cost is later claimed by contractors as damages. The subjective nature of the available formulae to compute unabsorbed HOOH is the difficulty that contractors and owners face and thus dispute it. The paper attempts to bring together the rationale of various HOOH formulae by gathering contractor’s HOOH cost data on all of its project, using case study approach and comparing variations in values of HOOH using scenario analysis. The case study approach uses project data collected from four construction projects of a contractor in India to calculate unabsorbed HOOH costs from various available formulae. Scenario analysis provides further variations in HOOH values after considering two independent situations mainly scope changes and new projects during the delay period. Interestingly, one of the findings in this study reveals that, in spite of HOOH getting absorbed by additional works available during the period of delay, a few formulae depict an increase in the value of unabsorbed HOOH, neglecting any absorption by the increase in scope. This indicates that these formulae are inappropriate for use in case of a change to the scope of work. Results of this study can help both parties in deciding on an appropriate formula more objectively, considering the events on a project causing the delay and contractor's position in respect of obtaining new projects.
Keywords: Absorbed and unabsorbed overheads, head office overheads, scenario analysis, scope variation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 82419 Speedup Breadth-First Search by Graph Ordering
Abstract:
Breadth-First Search (BFS) is a core graph algorithm that is widely used for graph analysis. As it is frequently used in many graph applications, improving the BFS performance is essential. In this paper, we present a graph ordering method that could reorder the graph nodes to achieve better data locality, thus, improving the BFS performance. Our method is based on an observation that the sibling relationships will dominate the cache access pattern during the BFS traversal. Therefore, we propose a frequency-based model to construct the graph order. First, we optimize the graph order according to the nodes’ visit frequency. Nodes with high visit frequency will be processed in priority. Second, we try to maximize the child nodes’ overlap layer by layer. As it is proved to be NP-hard, we propose a heuristic method that could greatly reduce the preprocessing overheads.We conduct extensive experiments on 16 real-world datasets. The result shows that our method could achieve comparable performance with the state-of-the-art methods while the graph ordering overheads are only about 1/15.
Keywords: Breadth-first search, BFS, graph ordering, graph algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 63318 A Comparative Analysis of Activity-Based Costing and Traditional Costing
Authors: Derya Eren Akyol, Gonca Tuncel, G. Mirac Bayhan
Abstract:
Activity-Based Costing (ABC) which has become an important aspect of manufacturing/service organizations can be defined as a methodology that measures the cost and performance of activities, resources and cost objects. It can be considered as an alternative paradigm to traditional cost-based accounting systems. The objective of this paper is to illustrate an application of ABC method and to compare the results of ABC with traditional costing methods. The results of the application highlight the weak points of traditional costing methods and an S-Curve obtained is used to identify the undercosted and overcosted products of the firm.
Keywords: Activity-based costing, cost drivers, overheads, traditional costing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1264017 Mapping Knowledge Model Onto Java Codes
Authors: B.A.Gobin, R.K.Subramanian
Abstract:
This paper gives an overview of the mapping mechanism of SEAM-a methodology for the automatic generation of knowledge models and its mapping onto Java codes. It discusses the rules that will be used to map the different components in the knowledge model automatically onto Java classes, properties and methods. The aim of developing this mechanism is to help in the creation of a prototype which will be used to validate the knowledge model which has been generated automatically. It will also help to link the modeling phase with the implementation phase as existing knowledge engineering methodologies do not provide for proper guidelines for the transition from the knowledge modeling phase to development phase. This will decrease the development overheads associated to the development of Knowledge Based Systems.Keywords: KBS, OWL, ontology, knowledge models
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 138316 Fast Document Segmentation Using Contourand X-Y Cut Technique
Authors: Boontee Kruatrachue, Narongchai Moongfangklang, Kritawan Siriboon
Abstract:
This paper describes fast and efficient method for page segmentation of document containing nonrectangular block. The segmentation is based on edge following algorithm using small window of 16 by 32 pixels. This segmentation is very fast since only border pixels of paragraph are used without scanning the whole page. Still, the segmentation may contain error if the space between them is smaller than the window used in edge following. Consequently, this paper reduce this error by first identify the missed segmentation point using direction information in edge following then, using X-Y cut at the missed segmentation point to separate the connected columns. The advantage of the proposed method is the fast identification of missed segmentation point. This methodology is faster with fewer overheads than other algorithms that need to access much more pixel of a document.
Keywords: Contour Direction Technique, Missed SegmentationPoints, Page Segmentation, Recursive X-Y Cut Technique
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 278215 Dynamic Load Balancing in PVM Using Intelligent Application
Authors: Kashif Bilal, Tassawar Iqbal, Asad Ali Safi, Nadeem Daudpota
Abstract:
This paper deals with dynamic load balancing using PVM. In distributed environment Load Balancing and Heterogeneity are very critical issues and needed to drill down in order to achieve the optimal results and efficiency. Various techniques are being used in order to distribute the load dynamically among different nodes and to deal with heterogeneity. These techniques are using different approaches where Process Migration is basic concept with different optimal flavors. But Process Migration is not an easy job, it impose lot of burden and processing effort in order to track each process in nodes. We will propose a dynamic load balancing technique in which application will intelligently balance the load among different nodes, resulting in efficient use of system and have no overheads of process migration. It would also provide a simple solution to problem of load balancing in heterogeneous environment.
Keywords: PVM, load balancing, task allocation, intelligent application.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 180714 Application-Specific Instruction Sets Processor with Implicit Registers to Improve Register Bandwidth
Authors: Ginhsuan Li, Chiuyun Hung, Desheng Chen, Yiwen Wang
Abstract:
Application-Specific Instruction (ASI ) set Processors (ASIP) have become an important design choice for embedded systems due to runtime flexibility, which cannot be provided by custom ASIC solutions. One major bottleneck in maximizing ASIP performance is the limitation on the data bandwidth between the General Purpose Register File (GPRF) and ASIs. This paper presents the Implicit Registers (IRs) to provide the desirable data bandwidth. An ASI Input/Output model is proposed to formulate the overheads of the additional data transfer between the GPRF and IRs, therefore, an IRs allocation algorithm is used to achieve the better performance by minimizing the number of extra data transfer instructions. The experiment results show an up to 3.33x speedup compared to the results without using IRs.Keywords: Application-Specific Instruction-set Processors, data bandwidth, configurable processor, implicit register.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 153513 Qmulus – A Cloud Driven GPS Based Tracking System for Real-Time Traffic Routing
Authors: Niyati Parameswaran, Bharathi Muthu, Madiajagan Muthaiyan
Abstract:
This paper presents Qmulus- a Cloud Based GPS Model. Qmulus is designed to compute the best possible route which would lead the driver to the specified destination in the shortest time while taking into account real-time constraints. Intelligence incorporated to Qmulus-s design makes it capable of generating and assigning priorities to a list of optimal routes through customizable dynamic updates. The goal of this design is to minimize travel and cost overheads, maintain reliability and consistency, and implement scalability and flexibility. The model proposed focuses on reducing the bridge between a Client Application and a Cloud service so as to render seamless operations. Qmulus-s system model is closely integrated and its concept has the potential to be extended into several other integrated applications making it capable of adapting to different media and resources.Keywords: Cloud Services, GPS, Real-Time Constraints, Shortest Path, System Management and Traffic Routing
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 179512 An Authentication Protocol for Quantum Enabled Mobile Devices
Authors: Natarajan Venkatachalam, Subrahmanya V. R. K. Rao, Vijay Karthikeyan Dhandapani, Swaminathan Saravanavel
Abstract:
The quantum communication technology is an evolving design which connects multiple quantum enabled devices to internet for secret communication or sensitive information exchange. In future, the number of these compact quantum enabled devices will increase immensely making them an integral part of present communication systems. Therefore, safety and security of such devices is also a major concern for us. To ensure the customer sensitive information will not be eavesdropped or deciphered, we need a strong authentications and encryption mechanism. In this paper, we propose a mutual authentication scheme between these smart quantum devices and server based on the secure exchange of information through quantum channel which gives better solutions for symmetric key exchange issues. An important part of this work is to propose a secure mutual authentication protocol over the quantum channel. We show that our approach offers robust authentication protocol and further our solution is lightweight, scalable, cost-effective with optimized computational processing overheads.Keywords: Quantum cryptography, quantum key distribution, wireless quantum communication, authentication protocol, quantum enabled device, trusted third party.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 122011 A Proxy Multi-Signature Scheme with Anonymous Vetoable Delegation
Authors: Pei-yih Ting, Dream-Ming Huang, Xiao-Wei Huang
Abstract:
Frequently a group of people jointly decide and authorize a specific person as a representative in some business/poitical occasions, e.g., the board of a company authorizes the chief executive officer to close a multi-billion acquisition deal. In this paper, an integrated proxy multi-signature scheme that allows anonymously vetoable delegation is proposed. This protocol integrates mechanisms of private veto, distributed proxy key generation, secure transmission of proxy key, and existentially unforgeable proxy multi-signature scheme. First, a provably secure Guillou-Quisquater proxy signature scheme is presented, then the “zero-sharing" protocol is extended over a composite modulus multiplicative group, and finally the above two are combined to realize the GQ proxy multi-signature with anonymously vetoable delegation. As a proxy signature scheme, this protocol protects both the original signers and the proxy signer. The modular design allows simplified implementation with less communication overheads and better computation performance than a general secure multi-party protocol.Keywords: GQ proxy signature, proxy multi-signature, zero-sharing protocol, secure multi-party protocol, private veto protocol
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 154210 Low Complexity Multi Mode Interleaver Core for WiMAX with Support for Convolutional Interleaving
Authors: Rizwan Asghar, Dake Liu
Abstract:
A hardware efficient, multi mode, re-configurable architecture of interleaver/de-interleaver for multiple standards, like DVB, WiMAX and WLAN is presented. The interleavers consume a large part of silicon area when implemented by using conventional methods as they use memories to store permutation patterns. In addition, different types of interleavers in different standards cannot share the hardware due to different construction methodologies. The novelty of the work presented in this paper is threefold: 1) Mapping of vital types of interleavers including convolutional interleaver onto a single architecture with flexibility to change interleaver size; 2) Hardware complexity for channel interleaving in WiMAX is reduced by using 2-D realization of the interleaver functions; and 3) Silicon cost overheads reduced by avoiding the use of small memories. The proposed architecture consumes 0.18mm2 silicon area for 0.12μm process and can operate at a frequency of 140 MHz. The reduced complexity helps in minimizing the memory utilization, and at the same time provides strong support to on-the-fly computation of permutation patterns.Keywords: Hardware interleaver implementation, WiMAX, DVB, block interleaver, convolutional interleaver, hardwaremultiplexing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20349 MiSense Hierarchical Cluster-Based Routing Algorithm (MiCRA) for Wireless Sensor Networks
Authors: Kavi K. Khedo, R. K. Subramanian
Abstract:
Wireless sensor networks (WSN) are currently receiving significant attention due to their unlimited potential. These networks are used for various applications, such as habitat monitoring, automation, agriculture, and security. The efficient nodeenergy utilization is one of important performance factors in wireless sensor networks because sensor nodes operate with limited battery power. In this paper, we proposed the MiSense hierarchical cluster based routing algorithm (MiCRA) to extend the lifetime of sensor networks and to maintain a balanced energy consumption of nodes. MiCRA is an extension of the HEED algorithm with two levels of cluster heads. The performance of the proposed protocol has been examined and evaluated through a simulation study. The simulation results clearly show that MiCRA has a better performance in terms of lifetime than HEED. Indeed, MiCRA our proposed protocol can effectively extend the network lifetime without other critical overheads and performance degradation. It has been noted that there is about 35% of energy saving for MiCRA during the clustering process and 65% energy savings during the routing process compared to the HEED algorithm.Keywords: Clustering algorithm, energy consumption, hierarchical model, sensor networks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17798 Effective Defect Prevention Approach in Software Process for Achieving Better Quality Levels
Authors: Suma. V., T. R. Gopalakrishnan Nair
Abstract:
Defect prevention is the most vital but habitually neglected facet of software quality assurance in any project. If functional at all stages of software development, it can condense the time, overheads and wherewithal entailed to engineer a high quality product. The key challenge of an IT industry is to engineer a software product with minimum post deployment defects. This effort is an analysis based on data obtained for five selected projects from leading software companies of varying software production competence. The main aim of this paper is to provide information on various methods and practices supporting defect detection and prevention leading to thriving software generation. The defect prevention technique unearths 99% of defects. Inspection is found to be an essential technique in generating ideal software generation in factories through enhanced methodologies of abetted and unaided inspection schedules. On an average 13 % to 15% of inspection and 25% - 30% of testing out of whole project effort time is required for 99% - 99.75% of defect elimination. A comparison of the end results for the five selected projects between the companies is also brought about throwing light on the possibility of a particular company to position itself with an appropriate complementary ratio of inspection testing.Keywords: Defect Detection and Prevention, Inspections, Software Engineering, Software Process, Testing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15367 Value Engineering and Its Effect in Reduction of Industrial Organization Energy Expenses
Authors: Habibollah Najafi, Amir Abbas Yazdani, Hosseinali Nahavandi
Abstract:
The review performed on the condition of energy consumption & rate in Iran, shows that unfortunately the subject of optimization and conservation of energy in active industries of country lacks a practical & effective method and in most factories, the energy consumption and rate is more than in similar industries of industrial countries. The increasing demand of electrical energy and the overheads which it imposes on the organization, forces companies to search for suitable approaches to optimize energy consumption and demand management. Application of value engineering techniques is among these approaches. Value engineering is considered a powerful tool for improving profitability. These tools are used for reduction of expenses, increasing profits, quality improvement, increasing market share, performing works in shorter durations, more efficient utilization of sources & etc. In this article, we shall review the subject of value engineering and its capabilities for creating effective transformations in industrial organizations, in order to reduce energy costs & the results have been investigated and described during a case study in Mazandaran wood and paper industries, the biggest consumer of energy in north of Iran, for the purpose of presenting the effects of performed tasks in optimization of energy consumption by utilizing value engineering techniques in one case study.Keywords: Value Engineering (VE), Expense, Energy, Industrial
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22666 Providing a Secure, Reliable and Decentralized Document Management Solution Using Blockchain by a Virtual Identity Card
Authors: Meet Shah, Ankita Aditya, Dhruv Bindra, V. S. Omkar, Aashruti Seervi
Abstract:
In today's world, we need documents everywhere for a smooth workflow in the identification process or any other security aspects. The current system and techniques which are used for identification need one thing, that is ‘proof of existence’, which involves valid documents, for example, educational, financial, etc. The main issue with the current identity access management system and digital identification process is that the system is centralized in their network, which makes it inefficient. The paper presents the system which resolves all these cited issues. It is based on ‘blockchain’ technology, which is a 'decentralized system'. It allows transactions in a decentralized and immutable manner. The primary notion of the model is to ‘have everything with nothing’. It involves inter-linking required documents of a person with a single identity card so that a person can go anywhere without having the required documents with him/her. The person just needs to be physically present at a place wherein documents are necessary, and using a fingerprint impression and an iris scan print, the rest of the verification will progress. Furthermore, some technical overheads and advancements are listed. This paper also aims to layout its far-vision scenario of blockchain and its impact on future trends.
Keywords: Blockchain, decentralized system, fingerprint impression, identity management, iris scan.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13025 Performance Evaluation of Distributed Deep Learning Frameworks in Cloud Environment
Authors: Shuen-Tai Wang, Fang-An Kuo, Chau-Yi Chou, Yu-Bin Fang
Abstract:
2016 has become the year of the Artificial Intelligence explosion. AI technologies are getting more and more matured that most world well-known tech giants are making large investment to increase the capabilities in AI. Machine learning is the science of getting computers to act without being explicitly programmed, and deep learning is a subset of machine learning that uses deep neural network to train a machine to learn features directly from data. Deep learning realizes many machine learning applications which expand the field of AI. At the present time, deep learning frameworks have been widely deployed on servers for deep learning applications in both academia and industry. In training deep neural networks, there are many standard processes or algorithms, but the performance of different frameworks might be different. In this paper we evaluate the running performance of two state-of-the-art distributed deep learning frameworks that are running training calculation in parallel over multi GPU and multi nodes in our cloud environment. We evaluate the training performance of the frameworks with ResNet-50 convolutional neural network, and we analyze what factors that result in the performance among both distributed frameworks as well. Through the experimental analysis, we identify the overheads which could be further optimized. The main contribution is that the evaluation results provide further optimization directions in both performance tuning and algorithmic design.
Keywords: Artificial Intelligence, machine learning, deep learning, convolutional neural networks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12574 A PIM (Processor-In-Memory) for Computer Graphics : Data Partitioning and Placement Schemes
Authors: Jae Chul Cha, Sandeep K. Gupta
Abstract:
The demand for higher performance graphics continues to grow because of the incessant desire towards realism. And, rapid advances in fabrication technology have enabled us to build several processor cores on a single die. Hence, it is important to develop single chip parallel architectures for such data-intensive applications. In this paper, we propose an efficient PIM architectures tailored for computer graphics which requires a large number of memory accesses. We then address the two important tasks necessary for maximally exploiting the parallelism provided by the architecture, namely, partitioning and placement of graphic data, which affect respectively load balances and communication costs. Under the constraints of uniform partitioning, we develop approaches for optimal partitioning and placement, which significantly reduce search space. We also present heuristics for identifying near-optimal placement, since the search space for placement is impractically large despite our optimization. We then demonstrate the effectiveness of our partitioning and placement approaches via analysis of example scenes; simulation results show considerable search space reductions, and our heuristics for placement performs close to optimal – the average ratio of communication overheads between our heuristics and the optimal was 1.05. Our uniform partitioning showed average load-balance ratio of 1.47 for geometry processing and 1.44 for rasterization, which is reasonable.Keywords: Data Partitioning and Placement, Graphics, PIM, Search Space Reduction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14923 A Survey on Data-Centric and Data-Aware Techniques for Large Scale Infrastructures
Authors: Silvina Caíno-Lores, Jesús Carretero
Abstract:
Large scale computing infrastructures have been widely developed with the core objective of providing a suitable platform for high-performance and high-throughput computing. These systems are designed to support resource-intensive and complex applications, which can be found in many scientific and industrial areas. Currently, large scale data-intensive applications are hindered by the high latencies that result from the access to vastly distributed data. Recent works have suggested that improving data locality is key to move towards exascale infrastructures efficiently, as solutions to this problem aim to reduce the bandwidth consumed in data transfers, and the overheads that arise from them. There are several techniques that attempt to move computations closer to the data. In this survey we analyse the different mechanisms that have been proposed to provide data locality for large scale high-performance and high-throughput systems. This survey intends to assist scientific computing community in understanding the various technical aspects and strategies that have been reported in recent literature regarding data locality. As a result, we present an overview of locality-oriented techniques, which are grouped in four main categories: application development, task scheduling, in-memory computing and storage platforms. Finally, the authors include a discussion on future research lines and synergies among the former techniques.Keywords: Co-scheduling, data-centric, data-intensive, data locality, in-memory storage, large scale.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14912 GridNtru: High Performance PKCS
Authors: Narasimham Challa, Jayaram Pradhan
Abstract:
Cryptographic algorithms play a crucial role in the information society by providing protection from unauthorized access to sensitive data. It is clear that information technology will become increasingly pervasive, Hence we can expect the emergence of ubiquitous or pervasive computing, ambient intelligence. These new environments and applications will present new security challenges, and there is no doubt that cryptographic algorithms and protocols will form a part of the solution. The efficiency of a public key cryptosystem is mainly measured in computational overheads, key size and bandwidth. In particular the RSA algorithm is used in many applications for providing the security. Although the security of RSA is beyond doubt, the evolution in computing power has caused a growth in the necessary key length. The fact that most chips on smart cards can-t process key extending 1024 bit shows that there is need for alternative. NTRU is such an alternative and it is a collection of mathematical algorithm based on manipulating lists of very small integers and polynomials. This allows NTRU to high speeds with the use of minimal computing power. NTRU (Nth degree Truncated Polynomial Ring Unit) is the first secure public key cryptosystem not based on factorization or discrete logarithm problem. This means that given sufficient computational resources and time, an adversary, should not be able to break the key. The multi-party communication and requirement of optimal resource utilization necessitated the need for the present day demand of applications that need security enforcement technique .and can be enhanced with high-end computing. This has promoted us to develop high-performance NTRU schemes using approaches such as the use of high-end computing hardware. Peer-to-peer (P2P) or enterprise grids are proven as one of the approaches for developing high-end computing systems. By utilizing them one can improve the performance of NTRU through parallel execution. In this paper we propose and develop an application for NTRU using enterprise grid middleware called Alchemi. An analysis and comparison of its performance for various text files is presented.Keywords: Alchemi, GridNtru, Ntru, PKCS.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16901 Performance Analysis and Optimization for Diagonal Sparse Matrix-Vector Multiplication on Machine Learning Unit
Authors: Qiuyu Dai, Haochong Zhang, Xiangrong Liu
Abstract:
Efficient matrix-vector multiplication with diagonal sparse matrices is pivotal in a multitude of computational domains, ranging from scientific simulations to machine learning workloads. When encoded in the conventional Diagonal (DIA) format, these matrices often induce computational overheads due to extensive zero-padding and non-linear memory accesses, which can hamper the computational throughput, and elevate the usage of precious compute and memory resources beyond necessity. The ’DIA-Adaptive’ approach, a methodological enhancement introduced in this paper, confronts these challenges head-on by leveraging the advanced parallel instruction sets embedded within Machine Learning Units (MLUs). This research presents a thorough analysis of the DIA-Adaptive scheme’s efficacy in optimizing Sparse Matrix-Vector Multiplication (SpMV) operations. The scope of the evaluation extends to a variety of hardware architectures, examining the repercussions of distinct thread allocation strategies and cluster configurations across multiple storage formats. A dedicated computational kernel, intrinsic to the DIA-Adaptive approach, has been meticulously developed to synchronize with the nuanced performance characteristics of MLUs. Empirical results, derived from rigorous experimentation, reveal that the DIA-Adaptive methodology not only diminishes the performance bottlenecks associated with the DIA format but also exhibits pronounced enhancements in execution speed and resource utilization. The analysis delineates a marked improvement in parallelism, showcasing the DIA-Adaptive scheme’s ability to adeptly manage the interplay between storage formats, hardware capabilities, and algorithmic design. The findings suggest that this approach could set a precedent for accelerating SpMV tasks, thereby contributing significantly to the broader domain of high-performance computing and data-intensive applications.
Keywords: Adaptive method, DIA, diagonal sparse matrices, MLU, sparse matrix-vector multiplication.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 234