Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 23

Search results for: cache replacement

23 Formal Verification of Cache System Using a Novel Cache Memory Model

Authors: Guowei Hou, Lixin Yu, Wei Zhuang, Hui Qin, Xue Yang

Abstract:

Formal verification is proposed to ensure the correctness of the design and make functional verification more efficient. Cache plays a vital role in the design of a System on Chip (SoC), and a cache with a Memory Management Unit (MMU) and a cache memory unit makes the state space too large for simulation to verify, so formal verification is presented for such a system design. In this paper, a formal model-checking verification flow is suggested and a new cache memory model, called the “exhaustive search model”, is proposed. Instead of using a large RAM to denote the whole cache memory, the exhaustive search model employs just two cache blocks. Because the cache system contains a data cache (Dcache) and an instruction cache (Icache), the Dcache memory model and the Icache memory model are established separately using the same mechanism. Finally, the novel model is applied to the verification of a cache that is a module of a custom-built SoC system used in practice. The result shows that the cache system is verified correctly using the exhaustive search model, which makes the verification much more manageable and flexible.

Keywords: cache system, formal verification, novel model, system on chip (SoC)

Procedia PDF Downloads 468

22 Impact of Stack Caches: Locality Awareness and Cost Effectiveness

Authors: Abdulrahman K. Alshegaifi, Chun-Hsi Huang

Abstract:

Treating data according to its location in memory has received much attention in recent years because data in different regions has different properties, which matter for cache utilization. Stack data and non-stack data may interfere with each other’s locality in the data cache. An important property of stack data is its high spatial and temporal locality. In this work, we simulate a non-unified cache design that splits the data cache into stack and non-stack caches in order to keep stack data and non-stack data in separate caches. We observe that the overall hit rate of the non-unified cache design is sensitive to the size of the non-stack cache. We then investigate the appropriate size and associativity for the stack cache to achieve a high hit ratio, especially when over 99% of accesses are directed to the stack cache. The result shows that, on average, more than 99% stack cache accuracy is achieved with 2KB of capacity and 1-way associativity. Further, we analyze the improvement in hit rate when a small, fixed-size stack cache is added at level 1 to a unified cache architecture. The result shows that adding a 1KB stack cache improves the overall hit rate of the unified cache design by approximately 3.9% on average for the Rijndael benchmark. The stack cache is simulated using the SimpleScalar toolset.
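The split design described above can be illustrated with a minimal sketch. It is not the authors' SimpleScalar setup: the class `DirectMappedCache`, the function `run_split`, the 32-byte line size, the 16KB non-stack cache, and the `stack_base` threshold used to classify an address as stack data are all assumptions made for illustration. A 1-way associative cache is modeled as direct-mapped.

```python
class DirectMappedCache:
    """Tiny direct-mapped (1-way) cache model that only tracks hits."""

    def __init__(self, size_bytes, line_bytes=32):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines  # block number stored per line
        self.hits = 0
        self.accesses = 0

    def access(self, addr):
        block = addr // self.line_bytes
        index = block % self.num_lines
        self.accesses += 1
        if self.tags[index] == block:
            self.hits += 1
        else:
            self.tags[index] = block  # miss: fill the line

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0


def run_split(trace, stack_base, stack_kb=2, data_kb=16):
    """Route each address to the stack or non-stack cache; return both hit rates.

    Assumption: every address >= stack_base is treated as stack data.
    """
    stack = DirectMappedCache(stack_kb * 1024)
    data = DirectMappedCache(data_kb * 1024)
    for addr in trace:
        (stack if addr >= stack_base else data).access(addr)
    return stack.hit_rate(), data.hit_rate()
```

Feeding the function an address trace from a real program would show how the two hit rates move as `stack_kb` and `data_kb` change, which is the kind of sensitivity the abstract reports.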

Keywords: hit rate, locality of program, stack cache, stack data

Procedia PDF Downloads 274

21 Evaluating the Impact of Replacement Policies on the Cache Performance and Energy Consumption in Different Multicore Embedded Systems

Authors: Sajjad Rostami-Sani, Mojtaba Valinataj, Amir-Hossein Khojir-Angasi

Abstract:

The cache has an important role in the reduction of access delay between a processor and memory in high-performance embedded systems. In these systems, the energy consumption is one of the most important concerns, and it will become more important with smaller processor feature sizes and higher frequencies. Meanwhile, the cache system dissipates a significant portion of energy compared to the other components of a processor. There are some elements that can affect the energy consumption of the cache such as replacement policy and degree of associativity. Due to these points, it can be inferred that selecting an appropriate configuration for the cache is a crucial part of designing a system. In this paper, we investigate the effect of different cache replacement policies on both cache’s performance and energy consumption. Furthermore, the impact of different Instruction Set Architectures (ISAs) on cache’s performance and energy consumption has been investigated.

Keywords: energy consumption, replacement policy, instruction set architecture, multicore processor

Procedia PDF Downloads 113

20 On Performance of Cache Replacement Schemes in NDN-IoT

Authors: Rasool Sadeghi, Sayed Mahdi Faghih Imani, Negar Najafi

Abstract:

The inherent features of Named Data Networking (NDN) provide a robust solution for the Internet of Things (IoT). Therefore, NDN-IoT has emerged as a combined architecture that exploits the benefits of NDN for interconnecting the heterogeneous objects in IoT. In NDN-IoT, caching schemes play a key role in improving network performance. In this paper, we consider the effectiveness of cache replacement schemes in NDN-IoT scenarios. We investigate the impact of replacement schemes on average delay, average hop count, and average interest retransmission when the replacement schemes are Least Frequently Used (LFU), Least Recently Used (LRU), First-In-First-Out (FIFO) and Random. The simulation results demonstrate that LFU and LRU present a stable performance when the cache size changes. Moreover, the network performance improves when the number of consumers increases.
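The four replacement schemes named above can be compared on a request trace with a short sketch. This is a standalone illustration rather than the ndnSIM setup used in the paper: the function `simulate`, the trace format (a list of content names), and the choice to reset a name's frequency count when it is evicted are assumptions.

```python
import random
from collections import Counter, OrderedDict


def simulate(policy, trace, capacity, seed=0):
    """Return the hit ratio of `policy` ("LRU", "LFU", "FIFO", "Random") on `trace`."""
    rng = random.Random(seed)
    cache = OrderedDict()  # key order: recency for LRU, insertion for the others
    freq = Counter()       # access counts of cached names (used by LFU)
    hits = 0
    for name in trace:
        if name in cache:
            hits += 1
            freq[name] += 1
            if policy == "LRU":
                cache.move_to_end(name)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            if policy in ("LRU", "FIFO"):
                victim = next(iter(cache))  # oldest entry in key order
            elif policy == "LFU":
                victim = min(cache, key=lambda k: freq[k])
            elif policy == "Random":
                victim = rng.choice(list(cache))
            else:
                raise ValueError(f"unknown policy: {policy}")
            del cache[victim]
            del freq[victim]
        cache[name] = True
        freq[name] = 1
    return hits / len(trace)
```

Running all four policies over the same trace at several `capacity` values gives a rough picture of how stable each one is as the cache size changes.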

Keywords: NDN-IoT, cache replacement, performance, ndnSIM

Procedia PDF Downloads 324

19 Cache Analysis and Software Optimizations for Faster on-Chip Network Simulations

Authors: Khyamling Parane, B. M. Prabhu Prasad, Basavaraj Talawar

Abstract:

Fast simulations are critical in reducing time to market in CMPs and SoCs. Several simulators have been used to evaluate the performance and power consumed by Network-on-Chips. Researchers and designers rely upon these simulators for design space exploration of NoC architectures. Our experiments show that simulating large NoC topologies takes hours to several days to complete. To speed up the simulations, it is necessary to investigate and optimize the hotspots in simulator source code. Among the several simulators available, we choose Booksim2.0, as it is extensively used in the NoC community. In this paper, we analyze the cache and memory system behaviour of Booksim2.0 to accurately monitor input-dependent performance bottlenecks. Our measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration with the fewest misses has been identified. To further reduce the cache misses, we use software optimization techniques such as removal of unused functions, loop interchanging, and replacing the post-increment operator with the pre-increment operator for non-primitive data types. The cache misses were reduced by 18.52%, 5.34% and 3.91% by employing the above techniques, respectively. We also employ thread parallelization and vectorization to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93x and 3.97x were observed for the Mesh topology with a 30 × 30 network size by employing thread parallelization and vectorization, respectively.
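Why loop interchange reduces cache misses can be shown with a toy model. The sketch below is not Booksim2.0 code: it assumes a direct-mapped cache with 64 lines of 64 bytes and a 2D array of 8-byte elements stored row by row, and the names `count_misses` and `traversal` are hypothetical. It counts misses when the same array is walked with the row loop outermost and then with the loops interchanged.

```python
def count_misses(addresses, num_lines=64, line_bytes=64):
    """Count misses of a direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:
            tags[index] = block  # miss: bring the block into its line
            misses += 1
    return misses


def traversal(rows, cols, row_major, elem_bytes=8):
    """Yield the byte addresses touched when visiting a rows x cols array
    that is stored row by row, in row-major or column-major loop order."""
    if row_major:
        for i in range(rows):
            for j in range(cols):
                yield (i * cols + j) * elem_bytes
    else:
        for j in range(cols):
            for i in range(rows):
                yield (i * cols + j) * elem_bytes
```

For a 256 × 256 array under these assumptions, the row-major order misses once per cache line (8192 misses), while the column-major order misses on every one of the 65536 accesses, because consecutive accesses are a full row apart and keep evicting each other.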

Keywords: cache behaviour, network-on-chip, performance profiling, vectorization

Procedia PDF Downloads 167

18 DCASH: Dynamic Cache Synchronization Algorithm for Heterogeneous Reverse Y Synchronizing Mobile Database Systems

Authors: Gunasekaran Raja, Kottilingam Kottursamy, Rajakumar Arul, Ramkumar Jayaraman, Krithika Sairam, Lakshmi Ravi

Abstract:

The synchronization server maintains a dynamically changing cache, which contains the data items that were requested and collected by the mobile node from the server. The order and presence of tuples in the cache change dynamically according to the frequency of updates performed on the data by the server and the client. To synchronize, the data modified by the client and the server at a given instant are collected, batched together by type of modification (insert/update/delete), and sorted according to their update frequencies. This ensures that DCASH (Dynamic Cache Synchronization Algorithm for Heterogeneous Reverse Y synchronizing Mobile Database Systems) gives priority to frequently accessed data with high usage. An optimal memory management algorithm is proposed to manage data items according to their frequency, theorems are given to show that current mobile data activity is reverse Y in nature, and experiments were run on 2G and 3G networks for various mobile devices to show the reduced response time and energy consumption.
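The batching step described above (group by modification type, then order by update frequency) might look roughly like the following. This is only a sketch of that one step, not the DCASH algorithm itself; the function `build_sync_batches`, the `(item_id, op)` change format, and the `update_freq` dictionary are hypothetical.

```python
from collections import defaultdict


def build_sync_batches(changes, update_freq):
    """Group changed items by modification type, most frequently updated first.

    changes: iterable of (item_id, op), with op in {"insert", "update", "delete"}
    update_freq: dict mapping item_id -> observed number of updates (assumed input)
    """
    batches = defaultdict(list)
    for item_id, op in changes:
        batches[op].append(item_id)
    for op in batches:
        # Highest update frequency first; ties keep their arrival order.
        batches[op].sort(key=lambda item: update_freq.get(item, 0), reverse=True)
    return dict(batches)
```

Each batch can then be sent in order, so that frequently updated items are synchronized before rarely updated ones.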

Keywords: mobile databases, synchronization, cache, response time

Procedia PDF Downloads 357

17 A Survey on Countermeasures of Cache-Timing Attack on AES Systems

Authors: Settana M. Abdulh, Naila A. Sadalla, Yaseen H. Taha, Howaida Elshoush

Abstract:

Side channel attacks are based on side channel information, which is information that is leaked from encryption systems. This includes timing information, power consumption, and electromagnetic or even sound leakage, which can be exploited by an attacker. Implementing a side channel attack is possible if and only if an attacker has access to a cryptosystem. In this case, the attacker can exploit a bad implementation in software or hardware that is not controlled by the encryption implementer, and thus represents a real threat to the security system. Several countermeasures have been proposed to eliminate side channel information vulnerability. The cache timing attack is a special type of side channel attack. Here, timing information is collected and analyzed by an attacker to guess sensitive information such as the encryption key or plaintext. This paper reviews the technique applied in this attack and surveys the countermeasures against it, evaluating the feasibility and usability of each. Based on this evaluation, we finally make several recommendations about using these countermeasures.

Keywords: AES algorithm, side channel attack, cache timing attack, cache timing countermeasure

Procedia PDF Downloads 263

16 Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems

Authors: Yiwei Li, Boyu Tian, Mingyu Gao

Abstract:

Hybrid main memory systems combine both performance and capacity advantages from heterogeneous memory technologies. With larger capacities, higher associativities, and finer granularities, hybrid memory systems currently exhibit significant metadata storage and lookup overheads for flexibly remapping data blocks between the two memory tiers. To alleviate the inefficiencies of existing designs, we propose Trimma, the combination of a multi-level metadata structure and an efficient metadata cache design. Trimma uses a multilevel metadata table to only track truly necessary address remap entries. The saved memory space is effectively utilized as extra DRAM cache capacity to improve performance. Trimma also uses separate formats to store the entries with non-identity and identity mappings. This improves the overall remap cache hit rate, further boosting the performance. Trimma is transparent to software and compatible with various types of hybrid memory systems. When evaluated on a representative DDR4 + NVM hybrid memory system, Trimma achieves up to 2.4× and on average 58.1% speedup benefits, compared with a state-of-the-art design that only leverages the unallocated fast memory space for caching. Trimma addresses metadata management overheads and targets future scalable large-scale hybrid memory architectures.

Keywords: memory system, data cache, hybrid memory, non-volatile memory

Procedia PDF Downloads 23

15 A Privacy Protection Scheme Supporting Fuzzy Search for NDN Routing Cache Data Name

Authors: Feng Tao, Ma Jing, Guo Xian, Wang Jing

Abstract:

Named Data Networking (NDN) replaces the IP addresses of traditional networks with data names and adopts a dynamic caching mechanism. In the existing mechanism, however, only one-to-one search can be achieved because every piece of data has a unique name. Because there is a mapping between data content and data name, if the data name is intercepted by an adversary, the privacy of the data content and of the user’s interest can hardly be guaranteed. To solve this problem, this paper proposes a one-to-many fuzzy search scheme based on order-preserving encryption, which reduces the query overhead by optimizing the caching strategy. In this scheme, we use hash values to keep the user’s query safe from each node during the search, as well as the privacy of the requested data content.

Keywords: NDN, order-preserving encryption, fuzzy search, privacy

Procedia PDF Downloads 447

14 Hydrogen: Contention-Aware Hybrid Memory Management for Heterogeneous CPU-GPU Architectures

Authors: Yiwei Li, Mingyu Gao

Abstract:

Integrating hybrid memories with heterogeneous processors could leverage heterogeneity in both compute and memory domains for better system efficiency. To ensure performance isolation, we introduce Hydrogen, a hardware architecture to optimize the allocation of hybrid memory resources to heterogeneous CPU-GPU systems. Hydrogen supports efficient capacity and bandwidth partitioning between CPUs and GPUs in both memory tiers. We propose decoupled memory channel mapping and token-based data migration throttling to enable flexible partitioning. We also support epoch-based online search for optimized configurations and lightweight reconfiguration with reduced data movements. Hydrogen significantly outperforms existing designs by 1.21x on average and up to 1.31x.

Keywords: hybrid memory, heterogeneous systems, dram cache, graphics processing units

Procedia PDF Downloads 18

13 Low Power CNFET SRAM Design

Authors: Pejman Hosseiniun, Rose Shayeghi, Iman Rahbari, Mohamad Reza Kalhor

Abstract:

CNFET has emerged in recent years as an alternative to silicon for high-performance, high-stability and low-power SRAM design. SRAM functions as cache memory in computers and many portable devices. In this paper, a new SRAM cell design based on CNFET technology is proposed. The proposed SRAM cell design for CNFET is compared with SRAM cell designs implemented with conventional CMOS and FinFET in terms of speed, power consumption, stability, and leakage current. The HSPICE simulation and analysis show that the dynamic power consumption of the proposed 8T CNFET SRAM cell is reduced by about 48% and the SNM is widened by up to 56% compared to the conventional CMOS SRAM structure, at the expense of a 2% increase in leakage power and a 3% increase in write delay.

Keywords: SRAM cell, CNFET, low power, HSPICE

Procedia PDF Downloads 369

12 Speedup Breadth-First Search by Graph Ordering

Authors: Qiuyi Lyu, Bin Gong

Abstract:

Breadth-First Search (BFS) is a core graph algorithm that is widely used for graph analysis. As it is frequently used in many graph applications, improving BFS performance is essential. In this paper, we present a graph ordering method that reorders the graph nodes to achieve better data locality, thus improving BFS performance. Our method is based on the observation that sibling relationships dominate the cache access pattern during the BFS traversal. Therefore, we propose a frequency-based model to construct the graph order. First, we optimize the graph order according to the nodes’ visit frequency. Nodes with a high visit frequency are processed with priority. Second, we try to maximize the overlap of child nodes layer by layer. As this problem is proved to be NP-hard, we propose a heuristic method that greatly reduces the preprocessing overheads. We conduct extensive experiments on 16 real-world datasets. The result shows that our method achieves performance comparable with the state-of-the-art methods, while the graph ordering overheads are only about 1/15.
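The general idea of reordering nodes before running BFS can be sketched as follows. This is not the authors' frequency-based model or their heuristic: as a stand-in, the sketch assumes that in-degree approximates how often a node is visited, and the names `reorder_by_frequency` and `bfs_levels` are hypothetical. Nodes with higher in-degree receive smaller ids, so their data sits closer together in an array-based layout.

```python
from collections import deque


def reorder_by_frequency(adj):
    """Relabel nodes so that higher in-degree nodes get smaller ids.

    adj: adjacency list, adj[u] = list of neighbors of u.
    Returns (new_adj, new_id), where new_id[old] is the new label of node old.
    """
    n = len(adj)
    indeg = [0] * n
    for nbrs in adj:
        for v in nbrs:
            indeg[v] += 1
    order = sorted(range(n), key=lambda u: -indeg[u])  # stable for ties
    new_id = [0] * n
    for new, old in enumerate(order):
        new_id[old] = new
    new_adj = [[] for _ in range(n)]
    for old, nbrs in enumerate(adj):
        new_adj[new_id[old]] = sorted(new_id[v] for v in nbrs)
    return new_adj, new_id


def bfs_levels(adj, source):
    """Standard BFS returning the level of each node (-1 if unreachable)."""
    level = [-1] * len(adj)
    level[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if level[v] == -1:
                level[v] = level[u] + 1
                queue.append(v)
    return level
```

Relabeling does not change the BFS result, only the memory layout: the level of each node in the reordered graph equals its level in the original graph, looked up through `new_id`.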

Keywords: breadth-first search, BFS, graph ordering, graph algorithm

Procedia PDF Downloads 101

11 Secure Network Coding against Content Pollution Attacks in Named Data Network

Authors: Tao Feng, Xiaomei Ma, Xian Guo, Jing Wang

Abstract:

Named Data Network (NDN) is one of the future Internet architectures. All nodes (i.e., hosts, routers) are allowed to have a local cache, which is used to satisfy incoming requests for content. However, relying on caching allows an adversary to perform attacks that are very effective and relatively easy to implement, such as the content pollution attack. In this paper, we use a secure network coding method based on a homomorphic signature system to solve this problem. First, using a dynamic public key technique, our scheme authenticates each generation without updating the initial secret key. Second, employing the homomorphism of the hash function, intermediate nodes and the destination node verify the signature of the received message. In addition, when the network topology of NDN is simple and fixed, the code coefficients in our scheme are generated by a pseudorandom number generator in each node, so distributing the coefficients is also avoided. In short, our scheme can not only efficiently defend against Intra/Inter-GPAs but also resist the content poisoning attack in NDN.

Keywords: named data networking, content pollution attack, network coding signature, internet architecture

Procedia PDF Downloads 301

10 The Classical and Hellenistic Architectural Elements of the Temple of Echmun in Sidon

Authors: Amal Alatar

Abstract:

The paper explores the architectural characteristics and decorative elements of the temple of Echmun, emphasizing the socio-economic significance of Sidon during the Greek and Roman periods, in order to understand the implications of their spread and development for the Phoenician cities and to reveal the symbolic and societal connotations that may have been connected with the buildings, allowing a well-founded examination of common characteristics. In general, studying Phoenician archaeology poses some problems. The main problem is that most major Phoenician settlements lie beneath modern urban centers. This situation has often prevented or largely restricted full archaeological investigations, and the publications are frequently not complete enough to determine the basic characteristics of the architectural elements. Another key problem is the political instability of the region, which affected archaeological research in the Phoenician homeland for many years. Nevertheless, during the past decades, an ever-growing cache of data has been acquired from the archaeological surroundings of the Phoenician sites. The architectural elements from the Greek and Roman periods have never been studied as a group before. Surprisingly, they have been largely ignored, despite their apparent profusion throughout the cities. The Roman period of Sidon has generally been neglected in favor of earlier periods, and it is often difficult to distinguish between Roman, Bronze Age, medieval and Ottoman structures.

Keywords: archaeology, classical, Hellenistic, Eshmun Temple, architecture, Sidon, Lebanon

Procedia PDF Downloads 60

9 Automatic Tuning for a Systemic Model of Banking Originated Losses (SYMBOL) Tool on Multicore

Authors: Ronal Muresano, Andrea Pagano

Abstract:

Nowadays, mathematical and statistical applications are developed with ever more complexity and accuracy. As a result, these applications need more computational power in order to execute faster. In this sense, multicore environments play an important role in improving and optimizing the execution time of these applications. These environments allow more parallelism inside the node. However, taking advantage of this parallelism is not an easy task, because we have to deal with problems such as core communication, data locality, memory sizes (cache and RAM), synchronization, data dependencies in the model, etc. These issues become more important when we wish to improve the application’s performance and scalability. Hence, this paper describes an optimization method developed for the Systemic Model of Banking Originated Losses (SYMBOL) tool developed by the European Commission, which is based on analyzing the application's weaknesses in order to exploit the advantages of the multicore. All these improvements are made in an automatic and transparent manner with the aim of improving the performance metrics of our tool. Finally, experimental evaluations show the effectiveness of our new optimized version, in which we have achieved a considerable improvement in execution time. The time has been reduced by around 96% in the best case tested, comparing the original serial version with the automatic parallel version.

Keywords: algorithm optimization, bank failures, OpenMP, parallel techniques, statistical tool

Procedia PDF Downloads 340

8 Security Design of Root of Trust Based on RISC-V

Authors: Kang Huang, Wanting Zhou, Shiwei Yuan, Lei Li

Abstract:

As information technology develops rapidly, security has become an increasingly critical issue for computer systems. In particular, as cloud computing and the Internet of Things (IoT) continue to gain widespread adoption, computer systems need to face new security threats and attacks. The Root of Trust (RoT) is the foundation for providing basic trusted computing and is used to verify the security and trustworthiness of other components. Designing a reliable Root of Trust and guaranteeing its own security are essential for improving the overall security and credibility of computer systems. In this paper, we discuss the implementation of self-security technology based on the RISC-V Root of Trust at the hardware level. To effectively safeguard the security of the Root of Trust, security safeguard technologies for the Root of Trust have been studied. First, a lightweight and secure boot framework is proposed as a security mechanism. Second, two kinds of memory protection mechanisms are built to defend against memory attacks. Moreover, the hardware implementation of the proposed method has also been investigated. A series of experiments and tests have been carried out to verify the effectiveness of the proposed method. The experimental results demonstrate that the proposed approach is effective in verifying the integrity of the Root of Trust’s own boot ROM, user instructions, and data, ensuring authenticity and enabling the secure boot of the Root of Trust’s own system. Additionally, our approach provides memory protection against certain types of memory attacks, such as cache leaks and tampering, and ensures the security of root-of-trust sensitive information, including keys.

Keywords: root of trust, secure boot, memory protection, hardware security

Procedia PDF Downloads 140

7 Improve B-Tree Index’s Performance Using Lock-Free Hash Table

Authors: Zhanfeng Ma, Zhiping Xiong, Hu Yin, Zhengwei She, Aditya P. Gurajada, Tianlun Chen, Ying Li

Abstract:

Many RDBMS vendors use a B-tree index to achieve high performance for point queries and range queries, and some of them also employ a hash index to further enhance performance, as a hash table is more efficient for point queries. However, maintaining a separate hash index incurs extra overheads. For example, the hash mapping for all data records must always be maintained, which results in more memory consumption, and locking, logging and other mechanisms are needed to guarantee ACID, which affects the concurrency and scalability of the system. To relieve these overheads, the Hash Cached B-tree (HCB) index is proposed in this paper. It consists of a standard disk-based B-tree index and an additional in-memory lock-free hash table. Initially, only the B-tree index is constructed for all data records. The hash table is built on the fly based on the runtime workload, and only data records accessed by point queries are indexed in the hash table, which helps reduce the memory footprint. Changes to the hash table are made using compare-and-swap (CAS) without locking or logging, which helps improve concurrency and avoid contention. The hash table is also optimized to be cache conscious. The HCB index is implemented in the SAP ASE database. Compared with the standard B-tree index, early experiments and customer adoptions show significant performance improvement. This paper provides an overview of the design of the HCB index and reports the experimental results.
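The lookup path described above, where a hash table is filled lazily by point queries in front of an ordered index, can be sketched in a few lines. This is a single-threaded illustration and not the SAP ASE implementation: a sorted list searched with `bisect` stands in for the disk-based B-tree, a plain `dict` stands in for the lock-free hash table, and the CAS-based concurrent updates are deliberately left out. The class name `HashCachedIndex` and its methods are hypothetical.

```python
import bisect


class HashCachedIndex:
    """Ordered index with a hash cache that only point queries populate."""

    def __init__(self, records):
        items = sorted(records.items())
        self._keys = [k for k, _ in items]    # stand-in for the B-tree
        self._values = [v for _, v in items]
        self._hash = {}                       # stand-in for the hash table

    def point_query(self, key):
        if key in self._hash:                 # fast path: hash hit
            return self._hash[key]
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._hash[key] = self._values[i]  # cache on first point access
            return self._values[i]
        return None

    def range_query(self, lo, hi):
        """Return (key, value) pairs with lo <= key <= hi; never touches the hash."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return list(zip(self._keys[i:j], self._values[i:j]))

    def cached_count(self):
        return len(self._hash)
```

Because only keys that were actually requested by point queries enter the hash table, its size follows the workload rather than the table size, which is the memory-footprint argument made in the abstract.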

Keywords: B-tree, compare-and-swap, lock-free hash table, point queries, range queries, SAP ASE database

Procedia PDF Downloads 255

6 Documentation Project on Decorated Wooden Coffins From Luxor, in the Cairo Museum

Authors: Hassan Mohmed, Mohamed Ismail, Aiman Rezk

Abstract:

Introduction: This project aims to document and preserve decorated wooden coffins discovered in Luxor by the Egyptian mission at Luxor (SR Numbers: 2514, 2519, 2520, 2521, 5469). These decorated wooden coffins date back to the Egyptian New Kingdom period and have been transferred to the Cairo Museum for display. They were discovered in the cache-tomb of Bab el-gasus at Deir el-Bahari, Luxor. This site was dedicated to the burials of priests of Amun through the 18th Dynasty. The coffin owners held the following titles: "the embalmer of the beautiful-house (the place of embalming)" and "the servant in the place of truth". Methodology: The project aims to make these decorated wooden coffins more visible to visitors through 3D reconstructions of the coffins and high-resolution photos that describe the history of wooden coffin use in Ancient Egypt, especially as the Cairo Museum is going to exhibit decorated wooden coffins of the New Kingdom. The project's goals are to document the decorated wooden coffins and to arrange an exhibition in which they will be displayed next to the coffin of Ramses II. This research focuses on text analysis, the technology, and the paleographic information found on these objects. Conclusion: The project shows the importance of coffins in Ancient Egypt and traces their usage through the Ancient Egyptian periods; coffins had a unique symbolism in ancient Egypt and connected the public with their kings. The Egyptians placed coffins in their tombs in the hope of preserving their bodies for the afterlife. This research will be useful for the study of heritage and ancient civilizations. Indeed, this study will open a path toward knowing how to identify these collections and how to exhibit them in a manner commensurate with the nature of ancient Egyptian history and heritage.

Keywords: archaeology, decorated wooden coffins, 3D digital tools for heritage management, museums

Procedia PDF Downloads 44

5 Evaluating Habitat Manipulation as a Strategy for Rodent Control in Agricultural Ecosystems of Pothwar Region, Pakistan

Authors: Nadeem Munawar, Tariq Mahmood

Abstract:

Habitat manipulation is an important technique that can be used for controlling rodent damage in agricultural ecosystems. It involves the intentional manipulation of vegetation cover in adjacent habitats around the active burrows of rodents to reduce shelter and food availability and to increase predation pressure. The current study was conducted in the Pothwar Plateau during the respective non-crop period of wheat-groundnut (post-harvested and un-ploughed/non-crop fallow lands) with the aim of assessing the impact of reducing the vegetation height of adjacent habitats (field borders) on rodent richness and abundance. The study area was divided into two sites, viz. treated and non-treated. At the treated sites, habitat manipulation was carried out by removing crop cache and non-crop vegetation over 10 cm in height to a distance of approximately 20 m from the fields. The trapping sessions carried out at treated and non-treated sites adjacent to wheat-groundnut fields were significantly different (F 2, 6 = 13.2, P = 0.001) from each other, which revealed that the maximum number of rodents was captured at non-treated sites. There was a significant difference in the overall abundance of rodents (P < 0.05) between crop stages and between treatments in both crops. The manipulation had a significant effect on crop damage and yield production, resulting in a reduction of damage within the associated croplands (P < 0.05). The outcomes of this study indicated a significant reduction of the rodent population at treated sites due to changes in vegetation height and cover, which affect important components, i.e., food, shelter, movements and increased risk sensitivity in their feeding behavior; therefore, the rodents were unable to reach levels at which they cause significant crop damage. This method is recommended because it is cost-effective and easy to apply.

Keywords: agricultural ecosystems, crop damage, habitat manipulation, rodents, trapping

Procedia PDF Downloads 129

4 Enhanced Disk-Based Databases towards Improved Hybrid in-Memory Systems

Authors: Samuel Kaspi, Sitalakshmi Venkatraman

Abstract:

In-memory database systems are becoming popular due to the availability and affordability of sufficiently large RAM and processors in modern high-end servers with the capacity to manage large in-memory database transactions. While fast and reliable in-memory systems are still being developed to overcome cache misses, CPU/IO bottlenecks and distributed transaction costs, disk-based data stores still serve as the primary persistence. In addition, with the recent growth in multi-tenancy cloud applications and associated security concerns, many organizations consider the trade-offs and continue to require the fast and reliable transaction processing of disk-based database systems as an available choice. For these organizations, the only way of increasing throughput is by improving the performance of disk-based concurrency control. This warrants a hybrid database system with the ability to selectively apply enhanced disk-based data management within the context of in-memory systems, which would help improve overall throughput. The general view is that in-memory systems substantially outperform disk-based systems. We question this assumption and examine how a modified variation of access invariance that we call enhanced memory access (EMA) can be used to allow very high levels of concurrency in the pre-fetching of data in disk-based systems. We demonstrate how this prefetching in disk-based systems can yield close to in-memory performance, which paves the way for improved hybrid database systems. This paper proposes a novel EMA technique and presents a comparative study between disk-based EMA systems and in-memory systems running on hardware configurations of equivalent power in terms of the number of processors and their speeds. The results of the experiments conducted clearly substantiate that, when used in conjunction with all concurrency control mechanisms, EMA can increase the throughput of disk-based systems to levels quite close to those achieved by in-memory systems.
The promising results of this work show that enhanced disk-based systems facilitate improvements in hybrid data management within the broader context of in-memory systems.

Keywords: in-memory database, disk-based system, hybrid database, concurrency control

Procedia PDF Downloads 383

3 The Ideal Memory Substitute for Computer Memory Hierarchy

Authors: Kayode A. Olaniyi, Olabanji F. Omotoye, Adeola A. Ogunleye

Abstract:

Computer system components such as the CPU, the Controllers, and the operating system, work together as a team, and storage or memory is the essential parts of this team apart from the processor. The memory and storage system including processor caches, main memory, and storage, form basic storage component of a computer system. The characteristics of the different types of storage are inherent in the design and the technology employed in the manufacturing. These memory characteristics define the speed, compatibility, cost, volatility, and density of the various storage types. Most computers rely on a hierarchy of storage devices for performance. The effective and efficient use of the memory hierarchy of the computer system therefore is the single most important aspect of computer system design and use. The memory hierarchy is becoming a fundamental performance and energy bottleneck, due to the widening gap between the increasing demands of modern computer applications and the limited performance and energy efficiency provided by traditional memory technologies. With the dramatic development in the computers systems, computer storage has had a difficult time keeping up with the processor speed. Computer architects are therefore facing constant challenges in developing high-speed computer storage with high-performance which is energy-efficient, cost-effective and reliable, to intercept processor requests. It is very clear that substantial advancements in redesigning the existing memory physical and logical structures to meet up with the latest processor potential is crucial. This research work investigates the importance of computer memory (storage) hierarchy in the design of computer systems. The constituent storage types of the hierarchy today were investigated looking at the design technologies and how the technologies affect memory characteristics: speed, density, stability and cost. 
The investigation considered how these characteristics could best be harnessed for the overall efficiency of the computer system. The research revealed that the best single type of storage, which we refer to as ideal memory, is a logically single physical memory that would combine the best attributes of each memory type in the memory hierarchy. It is a single memory with an access speed as high as that of CPU registers, combined with the highest storage capacity, offering excellent stability in the presence or absence of power, as found in magnetic and optical disks as opposed to volatile DRAM, and yet at a cost far below that of expensive SRAM. The research work suggests that overcoming these barriers may require memory manufacturing to depart entirely from present technologies and adopt one that overcomes the challenges associated with traditional memory technologies.

Keywords: cache, memory-hierarchy, memory, registers, storage

Procedia PDF Downloads 131

2 Pareto Optimal Material Allocation Mechanism

Authors: Peter Egri, Tamas Kis

Abstract:

Scheduling problems have been studied in algorithmic mechanism design research from the beginning. This paper focuses on a practically important but theoretically rather neglected field: the project scheduling problem in which jobs connected by precedence constraints compete for various nonrenewable resources, such as materials. Although the centralized problem can be solved in polynomial time by applying the algorithm of Carlier and Rinnooy Kan from the 1980s, obtaining materials in a decentralized environment is usually far from optimal. It can be observed in practical production scheduling situations that project managers tend to cache the required materials as soon as possible in order to avoid later delays due to material shortages. This greedy practice usually leads to excess stocks for some projects and materials and, simultaneously, to shortages for others. The aim of this study is to develop a model for the material allocation problem of a production plant, where a central decision maker, the inventory, should assign the resources arriving at different points in time to the jobs. Since the actual due dates are not known by the inventory, the mechanism design approach is applied with the projects as the self-interested agents. The goal of the mechanism is to elicit the required information and allocate the available materials so as to minimize the maximal tardiness among the projects. It is assumed that, except for the due dates, the inventory is familiar with every other parameter of the problem. A further requirement is that, due to practical considerations, monetary transfer is not allowed. Therefore, a mechanism without money is sought, which excludes some widely applied solutions such as the Vickrey–Clarke–Groves scheme. In this work, a type of Serial Dictatorship Mechanism (SDM) is presented for the studied problem, including a polynomial-time algorithm for computing the material allocation. 
The resulting mechanism is both truthful and Pareto optimal. Thus, randomization over the possible priority orderings of the projects results in a universally truthful and Pareto optimal randomized mechanism. However, it is shown that, in contrast to problems like the many-to-many matching market, not every Pareto optimal solution can be generated with an SDM. In addition, no performance guarantee can be given compared to the optimal solution; therefore, this approximation characteristic is investigated in an experimental study. All in all, the current work studies a practically relevant scheduling problem and presents a novel truthful material allocation mechanism which eliminates the potential benefit of the greedy behavior that negatively influences the outcome. The resulting allocation is also shown to be Pareto optimal, which is the most widely used criterion describing a necessary condition for a reasonable solution.
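The general idea of a serial dictatorship can be illustrated with a minimal sketch. It is not the algorithm of the paper, which is not given in the abstract. The toy model below is an assumption made only for illustration: a single material type, no precedence constraints, zero processing time, and a project that finishes when its last unit arrives. The names `Project` and `serial_dictatorship` are hypothetical. Projects are served in a fixed priority order; each receives the bundle that minimizes its own tardiness and, among such bundles, the latest-arriving units, so that earlier arrivals remain for the projects that follow.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    demand: int  # units of the single material required (toy assumption)
    due: int     # due date reported by the project


def serial_dictatorship(projects, arrivals):
    """Serve projects in priority order (hypothetical toy model).

    projects: list of Project, highest priority first.
    arrivals: dict mapping arrival time -> number of units arriving.
    Returns a dict mapping project name -> tardiness, or None if the
    remaining stock cannot cover the project's demand.
    """
    stock = dict(arrivals)  # units still unassigned at each arrival time
    tardiness = {}
    for p in projects:
        times = sorted(stock)
        # Earliest time at which the remaining stock covers the demand.
        cumulative, ready = 0, None
        for t in times:
            cumulative += stock[t]
            if cumulative >= p.demand:
                ready = t
                break
        if ready is None:
            tardiness[p.name] = None  # demand cannot be met; take nothing
            continue
        # Any finish time up to max(ready, due) gives the same tardiness,
        # so take the latest units within that limit.
        limit = max(ready, p.due)
        need = p.demand
        for t in sorted((t for t in times if t <= limit), reverse=True):
            take = min(stock[t], need)
            stock[t] -= take
            need -= take
            if need == 0:
                break
        tardiness[p.name] = max(0, ready - p.due)
    return tardiness


if __name__ == "__main__":
    arrivals = {1: 2, 5: 2, 9: 2}
    a, b, c = Project("A", 2, 9), Project("B", 2, 1), Project("C", 2, 4)
    print(serial_dictatorship([a, b, c], arrivals))
    print(serial_dictatorship([c, b, a], arrivals))
```

In this toy instance the priority order A, B, C yields a maximal tardiness of 1, while the order C, B, A yields 4, which is in line with the remark that the outcome of an SDM depends on the ordering and carries no performance guarantee.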

Keywords: material allocation, mechanism without money, polynomial-time mechanism, project scheduling

Procedia PDF Downloads 296

1 Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Authors: Jaeyoung Lee

Abstract:

Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perception performance in driving environments that vary with time and season. The image segmentation method using deep learning, which has recently evolved rapidly, stably provides high recognition performance in various road environments. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, which degrades performance in embedded processor environments equipped with simple hardware accelerators. In this paper, a semantic segmentation network, the matrix multiplication accelerator network (MMANet), optimized for the matrix multiplication accelerator (MMA) on Texas Instruments digital signal processors (TI DSP), is proposed to improve the recognition performance of autonomous driving systems. The proposed method is designed to maximize the number of layers that can be executed in a limited time to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of the MMA. By increasing the number of parallel branches, the lack of information caused by fixing the number of channels is resolved. Second, an efficient convolution is selected depending on the size of the activation. Since the MMA structure is fixed, normal convolution may be more efficient than depthwise separable convolution, depending on memory access overhead. Thus, the convolution type is decided according to the output stride to increase network depth. In addition, memory access time is minimized by processing operations only in the L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP). 
The suggested method obtains stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high-quality contexts using the restored shape without global average pooling paths, since the layer uses the MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation set. The proposed network can process an image with 640 x 480 resolution in 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle at 20 frames per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU), which is the highest recognition rate among embedded networks on the Cityscapes validation set.
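The camera and frame-rate figures quoted above can be checked with a short calculation. This is a sketch under an assumption that is not stated in the abstract: the images of all cameras are processed one after another on a single accelerator, and no other per-frame overhead is counted. The function names are hypothetical.

```python
# Figures quoted in the abstract.
LATENCY_MS = 6.67  # time to process one 640 x 480 image
CAMERAS = 6
FPS = 20


def frame_budget_ms(fps):
    """Time available in one frame period, in milliseconds."""
    return 1000.0 / fps


def busy_time_ms(cameras, latency_ms):
    """Total processing time per frame period.

    Assumption: images are processed sequentially on one accelerator.
    """
    return cameras * latency_ms


if __name__ == "__main__":
    budget = frame_budget_ms(FPS)
    busy = busy_time_ms(CAMERAS, LATENCY_MS)
    print(f"{busy:.2f} ms used of {budget:.2f} ms per frame period")
    print("fits" if busy <= budget else "does not fit")
```

Under this assumption, six images take about 40.02 ms, which fits within the 50 ms frame period at 20 FPS.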

Keywords: edge network, embedded network, MMA, matrix multiplication accelerator, semantic segmentation network

Procedia PDF Downloads 95