Search results for: data cache
25120 Impact of Stack Caches: Locality Awareness and Cost Effectiveness
Authors: Abdulrahman K. Alshegaifi, Chun-Hsi Huang
Abstract:
Treating data based on its location in memory has received much attention in recent years due to its different properties, which offer important aspects for cache utilization. Stack data and non-stack data may interfere with each other’s locality in the data cache. One of the important aspects of stack data is that it has high spatial and temporal locality. In this work, we simulate non-unified cache design that split data cache into stack and non-stack caches in order to maintain stack data and non-stack data separate in different caches. We observe that the overall hit rate of non-unified cache design is sensitive to the size of non-stack cache. Then, we investigate the appropriate size and associativity for stack cache to achieve high hit ratio especially when over 99% of accesses are directed to stack cache. The result shows that on average more than 99% of stack cache accuracy is achieved by using 2KB of capacity and 1-way associativity. Further, we analyze the improvement in hit rate when adding small, fixed, size of stack cache at level1 to unified cache architecture. The result shows that the overall hit rate of unified cache design with adding 1KB of stack cache is improved by approximately, on average, 3.9% for Rijndael benchmark. The stack cache is simulated by using SimpleScalar toolset.Keywords: hit rate, locality of program, stack cache, stack data
Procedia PDF Downloads 30325119 Formal Verification of Cache System Using a Novel Cache Memory Model
Authors: Guowei Hou, Lixin Yu, Wei Zhuang, Hui Qin, Xue Yang
Abstract:
Formal verification is proposed to ensure the correctness of the design and make functional verification more efficient. As cache plays a vital role in the design of System on Chip (SoC), and cache with Memory Management Unit (MMU) and cache memory unit makes the state space too large for simulation to verify, then a formal verification is presented for such system design. In the paper, a formal model checking verification flow is suggested and a new cache memory model which is called “exhaustive search model” is proposed. Instead of using large size ram to denote the whole cache memory, exhaustive search model employs just two cache blocks. For cache system contains data cache (Dcache) and instruction cache (Icache), Dcache memory model and Icache memory model are established separately using the same mechanism. At last, the novel model is employed to the verification of a cache which is module of a custom-built SoC system that has been applied in practical, and the result shows that the cache system is verified correctly using the exhaustive search model, and it makes the verification much more manageable and flexible.Keywords: cache system, formal verification, novel model, system on chip (SoC)
Procedia PDF Downloads 49625118 Evaluating the Impact of Replacement Policies on the Cache Performance and Energy Consumption in Different Multicore Embedded Systems
Authors: Sajjad Rostami-Sani, Mojtaba Valinataj, Amir-Hossein Khojir-Angasi
Abstract:
The cache has an important role in the reduction of access delay between a processor and memory in high-performance embedded systems. In these systems, the energy consumption is one of the most important concerns, and it will become more important with smaller processor feature sizes and higher frequencies. Meanwhile, the cache system dissipates a significant portion of energy compared to the other components of a processor. There are some elements that can affect the energy consumption of the cache such as replacement policy and degree of associativity. Due to these points, it can be inferred that selecting an appropriate configuration for the cache is a crucial part of designing a system. In this paper, we investigate the effect of different cache replacement policies on both cache’s performance and energy consumption. Furthermore, the impact of different Instruction Set Architectures (ISAs) on cache’s performance and energy consumption has been investigated.Keywords: energy consumption, replacement policy, instruction set architecture, multicore processor
Procedia PDF Downloads 15425117 On Performance of Cache Replacement Schemes in NDN-IoT
Authors: Rasool Sadeghi, Sayed Mahdi Faghih Imani, Negar Najafi
Abstract:
The inherent features of Named Data Networking (NDN) provides a robust solution for Internet of Thing (IoT). Therefore, NDN-IoT has emerged as a combined architecture which exploits the benefits of NDN for interconnecting of the heterogeneous objects in IoT. In NDN-IoT, caching schemes are a key role to improve the network performance. In this paper, we consider the effectiveness of cache replacement schemes in NDN-IoT scenarios. We investigate the impact of replacement schemes on average delay, average hop count, and average interest retransmission when replacement schemes are Least Frequently Used (LFU), Least Recently Used (LRU), First-In-First-Out (FIFO) and Random. The simulation results demonstrate that LFU and LRU present a stable performance when the cache size changes. Moreover, the network performance improves when the number of consumers increases.Keywords: NDN-IoT, cache replacement, performance, ndnSIM
Procedia PDF Downloads 36525116 DCASH: Dynamic Cache Synchronization Algorithm for Heterogeneous Reverse Y Synchronizing Mobile Database Systems
Authors: Gunasekaran Raja, Kottilingam Kottursamy, Rajakumar Arul, Ramkumar Jayaraman, Krithika Sairam, Lakshmi Ravi
Abstract:
The synchronization server maintains a dynamically changing cache, which contains the data items which were requested and collected by the mobile node from the server. The order and presence of tuples in the cache changes dynamically according to the frequency of updates performed on the data, by the server and client. To synchronize, the data which has been modified by client and the server at an instant are collected, batched together by the type of modification (insert/ update/ delete), and sorted according to their update frequencies. This ensures that the DCASH (Dynamic Cache Synchronization Algorithm for Heterogeneous Reverse Y synchronizing Mobile Database Systems) gives priority to the frequently accessed data with high usage. The optimal memory management algorithm is proposed to manage data items according to their frequency, theorems were written to show the current mobile data activity is reverse Y in nature and the experiments were tested with 2g and 3g networks for various mobile devices to show the reduced response time and energy consumption.Keywords: mobile databases, synchronization, cache, response time
Procedia PDF Downloads 40525115 Cache Analysis and Software Optimizations for Faster on-Chip Network Simulations
Authors: Khyamling Parane, B. M. Prabhu Prasad, Basavaraj Talawar
Abstract:
Fast simulations are critical in reducing time to market in CMPs and SoCs. Several simulators have been used to evaluate the performance and power consumed by Network-on-Chips. Researchers and designers rely upon these simulators for design space exploration of NoC architectures. Our experiments show that simulating large NoC topologies take hours to several days for completion. To speed up the simulations, it is necessary to investigate and optimize the hotspots in simulator source code. Among several simulators available, we choose Booksim2.0, as it is being extensively used in the NoC community. In this paper, we analyze the cache and memory system behaviour of Booksim2.0 to accurately monitor input dependent performance bottlenecks. Our measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration having least misses has been identified. To further reduce the cache misses, we use software optimization techniques such as removal of unused functions, loop interchanging and replacing post-increment operator with pre-increment operator for non-primitive data types. The cache misses were reduced by 18.52%, 5.34% and 3.91% by employing above technology respectively. We also employ thread parallelization and vectorization to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93x and 3.97x were observed for the Mesh topology with 30 × 30 network size by employing thread parallelization and vectorization respectively.Keywords: cache behaviour, network-on-chip, performance profiling, vectorization
Procedia PDF Downloads 19725114 A Privacy Protection Scheme Supporting Fuzzy Search for NDN Routing Cache Data Name
Authors: Feng Tao, Ma Jing, Guo Xian, Wang Jing
Abstract:
Named Data Networking (NDN) replaces IP address of traditional network with data name, and adopts dynamic cache mechanism. In the existing mechanism, however, only one-to-one search can be achieved because every data has a unique name corresponding to it. There is a certain mapping relationship between data content and data name, so if the data name is intercepted by an adversary, the privacy of the data content and user’s interest can hardly be guaranteed. In order to solve this problem, this paper proposes a one-to-many fuzzy search scheme based on order-preserving encryption to reduce the query overhead by optimizing the caching strategy. In this scheme, we use hash value to ensure the user’s query safe from each node in the process of search, so does the privacy of the requiring data content.Keywords: NDN, order-preserving encryption, fuzzy search, privacy
Procedia PDF Downloads 48425113 Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems
Authors: Yiwei Li, Boyu Tian, Mingyu Gao
Abstract:
Hybrid main memory systems combine both performance and capacity advantages from heterogeneous memory technologies. With larger capacities, higher associativities, and finer granularities, hybrid memory systems currently exhibit significant metadata storage and lookup overheads for flexibly remapping data blocks between the two memory tiers. To alleviate the inefficiencies of existing designs, we propose Trimma, the combination of a multi-level metadata structure and an efficient metadata cache design. Trimma uses a multilevel metadata table to only track truly necessary address remap entries. The saved memory space is effectively utilized as extra DRAM cache capacity to improve performance. Trimma also uses separate formats to store the entries with non-identity and identity mappings. This improves the overall remap cache hit rate, further boosting the performance. Trimma is transparent to software and compatible with various types of hybrid memory systems. When evaluated on a representative DDR4 + NVM hybrid memory system, Trimma achieves up to 2.4× and on average 58.1% speedup benefits, compared with a state-of-the-art design that only leverages the unallocated fast memory space for caching. Trimma addresses metadata management overheads and targets future scalable large-scale hybrid memory architectures.Keywords: memory system, data cache, hybrid memory, non-volatile memory
Procedia PDF Downloads 7825112 A Survey on Countermeasures of Cache-Timing Attack on AES Systems
Authors: Settana M. Abdulh, Naila A. Sadalla, Yaseen H. Taha, Howaida Elshoush
Abstract:
Side channel attacks are based on side channel information, which is information that is leaked from encryption systems. This includes timing information, power consumption as well as electromagnetic or even sound leaking which can exploited by an attacker. Implementing side channel attacks are possible if and only if an attacker has access to a cryptosystem. In this case, the attacker can exploit bad implementation in software or hardware which is not controlled by encryption implementer. Thus, he/she will represent a real threat to the security system. Several countermeasures have been proposed to eliminate side channel information vulnerability.Cache timing attack is a special type of side channel attack. Here, timing information is collected and analyzed by an attacker to guess sensitive information such as encryption key or plaintext. This paper reviews the technique applied in this attack and surveys the countermeasures against it, evaluating the feasibility and usability of each. Based on this evaluation, finally we pose several recommendations about using these countermeasures.Keywords: AES algorithm, side channel attack, cache timing attack, cache timing countermeasure
Procedia PDF Downloads 29925111 Hydrogen: Contention-Aware Hybrid Memory Management for Heterogeneous CPU-GPU Architectures
Authors: Yiwei Li, Mingyu Gao
Abstract:
Integrating hybrid memories with heterogeneous processors could leverage heterogeneity in both compute and memory domains for better system efficiency. To ensure performance isolation, we introduce Hydrogen, a hardware architecture to optimize the allocation of hybrid memory resources to heterogeneous CPU-GPU systems. Hydrogen supports efficient capacity and bandwidth partitioning between CPUs and GPUs in both memory tiers. We propose decoupled memory channel mapping and token-based data migration throttling to enable flexible partitioning. We also support epoch-based online search for optimized configurations and lightweight reconfiguration with reduced data movements. Hydrogen significantly outperforms existing designs by 1.21x on average and up to 1.31x.Keywords: hybrid memory, heterogeneous systems, dram cache, graphics processing units
Procedia PDF Downloads 9625110 Secure Network Coding against Content Pollution Attacks in Named Data Network
Authors: Tao Feng, Xiaomei Ma, Xian Guo, Jing Wang
Abstract:
Named Data Network (NDN) is one of the future Internet architecture, all nodes (i.e., hosts, routers) are allowed to have a local cache, used to satisfy incoming requests for content. However, depending on caching allows an adversary to perform attacks that are very effective and relatively easy to implement, such as content pollution attack. In this paper, we use a method of secure network coding based on homomorphic signature system to solve this problem. Firstly ,we use a dynamic public key technique, our scheme for each generation authentication without updating the initial secret key used. Secondly, employing the homomorphism of hash function, intermediate node and destination node verify the signature of the received message. In addition, when the network topology of NDN is simple and fixed, the code coefficients in our scheme are generated in a pseudorandom number generator in each node, so the distribution of the coefficients is also avoided. In short, our scheme not only can efficiently prevent against Intra/Inter-GPAs, but also can against the content poisoning attack in NDN.Keywords: named data networking, content polloution attack, network coding signature, internet architecture
Procedia PDF Downloads 33725109 Speedup Breadth-First Search by Graph Ordering
Abstract:
Breadth-First Search(BFS) is a core graph algorithm that is widely used for graph analysis. As it is frequently used in many graph applications, improve the BFS performance is essential. In this paper, we present a graph ordering method that could reorder the graph nodes to achieve better data locality, thus, improving the BFS performance. Our method is based on an observation that the sibling relationships will dominate the cache access pattern during the BFS traversal. Therefore, we propose a frequency-based model to construct the graph order. First, we optimize the graph order according to the nodes’ visit frequency. Nodes with high visit frequency will be processed in priority. Second, we try to maximize the child nodes overlap layer by layer. As it is proved to be NP-hard, we propose a heuristic method that could greatly reduce the preprocessing overheads. We conduct extensive experiments on 16 real-world datasets. The result shows that our method could achieve comparable performance with the state-of-the-art methods while the graph ordering overheads are only about 1/15.Keywords: breadth-first search, BFS, graph ordering, graph algorithm
Procedia PDF Downloads 13825108 Improve B-Tree Index’s Performance Using Lock-Free Hash Table
Authors: Zhanfeng Ma, Zhiping Xiong, Hu Yin, Zhengwei She, Aditya P. Gurajada, Tianlun Chen, Ying Li
Abstract:
Many RDBMS vendors use B-tree index to achieve high performance for point queries and range queries, and some of them also employ hash index to further enhance the performance as hash table is more efficient for point queries. However, there are extra overheads to maintain a separate hash index, for example, hash mapping for all data records must always be maintained, which results in more memory space consumption; locking, logging and other mechanisms are needed to guarantee ACID, which affects the concurrency and scalability of the system. To relieve the overheads, Hash Cached B-tree (HCB) index is proposed in this paper, which consists of a standard disk-based B-tree index and an additional in-memory lock-free hash table. Initially, only the B-tree index is constructed for all data records, the hash table is built on the fly based on runtime workload, only data records accessed by point queries are indexed using hash table, this helps reduce the memory footprint. Changes to hash table are done using compare-and-swap (CAS) without performing locking and logging, this helps improve the concurrency and avoid contention. The hash table is also optimized to be cache conscious. HCB index is implemented in SAP ASE database, compared with the standard B-tree index, early experiments and customer adoptions show significant performance improvement. This paper provides an overview of the design of HCB index and reports the experimental results.Keywords: B-tree, compare-and-swap, lock-free hash table, point queries, range queries, SAP ASE database
Procedia PDF Downloads 28625107 Automatic Tuning for a Systemic Model of Banking Originated Losses (SYMBOL) Tool on Multicore
Authors: Ronal Muresano, Andrea Pagano
Abstract:
Nowadays, the mathematical/statistical applications are developed with more complexity and accuracy. However, these precisions and complexities have brought as result that applications need more computational power in order to be executed faster. In this sense, the multicore environments are playing an important role to improve and to optimize the execution time of these applications. These environments allow us the inclusion of more parallelism inside the node. However, to take advantage of this parallelism is not an easy task, because we have to deal with some problems such as: cores communications, data locality, memory sizes (cache and RAM), synchronizations, data dependencies on the model, etc. These issues are becoming more important when we wish to improve the application’s performance and scalability. Hence, this paper describes an optimization method developed for Systemic Model of Banking Originated Losses (SYMBOL) tool developed by the European Commission, which is based on analyzing the application's weakness in order to exploit the advantages of the multicore. All these improvements are done in an automatic and transparent manner with the aim of improving the performance metrics of our tool. Finally, experimental evaluations show the effectiveness of our new optimized version, in which we have achieved a considerable improvement on the execution time. The time has been reduced around 96% for the best case tested, between the original serial version and the automatic parallel version.Keywords: algorithm optimization, bank failures, OpenMP, parallel techniques, statistical tool
Procedia PDF Downloads 36925106 Enhanced Disk-Based Databases towards Improved Hybrid in-Memory Systems
Authors: Samuel Kaspi, Sitalakshmi Venkatraman
Abstract:
In-memory database systems are becoming popular due to the availability and affordability of sufficiently large RAM and processors in modern high-end servers with the capacity to manage large in-memory database transactions. While fast and reliable in-memory systems are still being developed to overcome cache misses, CPU/IO bottlenecks and distributed transaction costs, disk-based data stores still serve as the primary persistence. In addition, with the recent growth in multi-tenancy cloud applications and associated security concerns, many organisations consider the trade-offs and continue to require fast and reliable transaction processing of disk-based database systems as an available choice. For these organizations, the only way of increasing throughput is by improving the performance of disk-based concurrency control. This warrants a hybrid database system with the ability to selectively apply an enhanced disk-based data management within the context of in-memory systems that would help improve overall throughput. The general view is that in-memory systems substantially outperform disk-based systems. We question this assumption and examine how a modified variation of access invariance that we call enhanced memory access, (EMA) can be used to allow very high levels of concurrency in the pre-fetching of data in disk-based systems. We demonstrate how this prefetching in disk-based systems can yield close to in-memory performance, which paves the way for improved hybrid database systems. This paper proposes a novel EMA technique and presents a comparative study between disk-based EMA systems and in-memory systems running on hardware configurations of equivalent power in terms of the number of processors and their speeds. The results of the experiments conducted clearly substantiate that when used in conjunction with all concurrency control mechanisms, EMA can increase the throughput of disk-based systems to levels quite close to those achieved by in-memory system. The promising results of this work show that enhanced disk-based systems facilitate in improving hybrid data management within the broader context of in-memory systems.Keywords: in-memory database, disk-based system, hybrid database, concurrency control
Procedia PDF Downloads 41725105 The Classical and Hellenistic Architectural Elements of the Temple of Echmun in Sidon
Authors: Amal Alatar
Abstract:
The paper focuses on the exploration of architectural characteristics and decorative elements of the temple of Echmun, emphasizing the socio-economic significance of Sidon during the Greek and Roman periods to understand the implications of their spread and development on the Phoenician cities, as well as reveal the symbolical and societal connotations that may have been connected with the buildings, in order to allow a well-founded examination of common characteristics. In general, studying Phoenician archaeology posed some problems. The main problem is that most major Phoenician settlements lay beneath modern urban centers. This situation often prevented or largely restricted full archaeological investigations; the publications are frequently not complete enough to determine the basic characteristics of the architectural elements. Another key problem is the political instability of the region, which affected the archaeological research in the Phoenician homeland for many years. Nevertheless, during the past decades, an ever-growing cache of data was acquired from the archaeological surroundings of the Phoenician sites. Both the architectural elements from the Greek and Roman period have never been studied as a group before. Surprisingly, they have been largely ignored, despite their apparent profusion throughout the cities. The Roman period of Sidon has generally been neglected in preference to earlier periods, where it is often difficult to distinguish between Roman, Bronze age, medieval and Ottoman structures.Keywords: archaeology, classical, Hellenistic, Eshmun Temple, architecture, Sidon, Lebanon
Procedia PDF Downloads 10125104 Low Power CNFET SRAM Design
Authors: Pejman Hosseiniun, Rose Shayeghi, Iman Rahbari, Mohamad Reza Kalhor
Abstract:
CNFET has emerged as an alternative material to silicon for high performance, high stability and low power SRAM design in recent years. SRAM functions as cache memory in computers and many portable devices. In this paper, a new SRAM cell design based on CNFET technology is proposed. The proposed SRAM cell design for CNFET is compared with SRAM cell designs implemented with the conventional CMOS and FinFET in terms of speed, power consumption, stability, and leakage current. The HSPICE simulation and analysis show that the dynamic power consumption of the proposed 8T CNFET SRAM cell’s is reduced about 48% and the SNM is widened up to 56% compared to the conventional CMOS SRAM structure at the expense of 2% leakage power and 3% write delay increase.Keywords: SRAM cell, CNFET, low power, HSPICE
Procedia PDF Downloads 41425103 Security Design of Root of Trust Based on RISC-V
Authors: Kang Huang, Wanting Zhou, Shiwei Yuan, Lei Li
Abstract:
Since information technology develops rapidly, the security issue has become an increasingly critical for computer system. In particular, as cloud computing and the Internet of Things (IoT) continue to gain widespread adoption, computer systems need to new security threats and attacks. The Root of Trust (RoT) is the foundation for providing basic trusted computing, which is used to verify the security and trustworthiness of other components. Design a reliable Root of Trust and guarantee its own security are essential for improving the overall security and credibility of computer systems. In this paper, we discuss the implementation of self-security technology based on the RISC-V Root of Trust at the hardware level. To effectively safeguard the security of the Root of Trust, researches on security safeguard technology on the Root of Trust have been studied. At first, a lightweight and secure boot framework is proposed as a secure mechanism. Secondly, two kinds of memory protection mechanism are built to against memory attacks. Moreover, hardware implementation of proposed method has been also investigated. A series of experiments and tests have been carried on to verify to effectiveness of the proposed method. The experimental results demonstrated that the proposed approach is effective in verifying the integrity of the Root of Trust’s own boot rom, user instructions, and data, ensuring authenticity and enabling the secure boot of the Root of Trust’s own system. Additionally, our approach provides memory protection against certain types of memory attacks, such as cache leaks and tampering, and ensures the security of root-of-trust sensitive information, including keys.Keywords: root of trust, secure boot, memory protection, hardware security
Procedia PDF Downloads 21525102 Data Transformations in Data Envelopment Analysis
Authors: Mansour Mohammadpour
Abstract:
Data transformation refers to the modification of any point in a data set by a mathematical function. When applying transformations, the measurement scale of the data is modified. Data transformations are commonly employed to turn data into the appropriate form, which can serve various functions in the quantitative analysis of the data. This study addresses the investigation of the use of data transformations in Data Envelopment Analysis (DEA). Although data transformations are important options for analysis, they do fundamentally alter the nature of the variable, making the interpretation of the results somewhat more complex.Keywords: data transformation, data envelopment analysis, undesirable data, negative data
Procedia PDF Downloads 2025101 Reactive and Concurrency-Based Image Resource Management Module for iOS Applications
Authors: Shubham V. Kamdi
Abstract:
This paper aims to serve as an introduction to image resource caching techniques for iOS mobile applications. It will explain how developers can break down multiple image-downloading tasks concurrently using state-of-the-art iOS frameworks, namely Swift Concurrency and Combine. The paper will explain how developers can leverage SwiftUI to develop reactive view components and use declarative coding patterns. Developers will learn to bypass built-in image caching systems by curating the procedure to implement a swift-based LRU cache system. The paper will provide a full architectural overview of a system, helping readers understand how mobile applications are designed professionally. It will cover technical discussion, helping readers understand the low-level details of threads and how they can switch between them, as well as the significance of the main and background threads for requesting HTTP services via mobile applications.Keywords: main thread, background thread, reactive view components, declarative coding
Procedia PDF Downloads 2525100 Processing Big Data: An Approach Using Feature Selection
Authors: Nikat Parveen, M. Ananthi
Abstract:
Big data is one of the emerging technology, which collects the data from various sensors and those data will be used in many fields. Data retrieval is one of the major issue where there is a need to extract the exact data as per the need. In this paper, large amount of data set is processed by using the feature selection. Feature selection helps to choose the data which are actually needed to process and execute the task. The key value is the one which helps to point out exact data available in the storage space. Here the available data is streamed and R-Center is proposed to achieve this task.Keywords: big data, key value, feature selection, retrieval, performance
Procedia PDF Downloads 34125099 Applications of Big Data in Education
Authors: Faisal Kalota
Abstract:
Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.Keywords: big data, learning analytics, analytics, big data in education, Hadoop
Procedia PDF Downloads 42525098 Analysis of Big Data
Authors: Sandeep Sharma, Sarabjit Singh
Abstract:
As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.Keywords: big data, unstructured data, volume, variety, velocity
Procedia PDF Downloads 54825097 Research of Data Cleaning Methods Based on Dependency Rules
Authors: Yang Bao, Shi Wei Deng, WangQun Lin
Abstract:
This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.Keywords: data cleaning, dependency rules, violation data discovery, data repair
Procedia PDF Downloads 56425096 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity
Authors: Hoda A. Abdel Hafez
Abstract:
Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.Keywords: mining big data, big data, machine learning, telecommunication
Procedia PDF Downloads 40925095 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach
Authors: Theertha Chandroth
Abstract:
This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data to XML and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.Keywords: XML, JSON, data comparison, integration testing, Python, SQL
Procedia PDF Downloads 14025094 Using Machine Learning Techniques to Extract Useful Information from Dark Data
Authors: Nigar Hussain
Abstract:
It is a subset of big data. Dark data means those data in which we fail to use for future decisions. There are many issues in existing work, but some need powerful tools for utilizing dark data. It needs sufficient techniques to deal with dark data. That enables users to exploit their excellence, adaptability, speed, less time utilization, execution, and accessibility. Another issue is the way to utilize dark data to extract helpful information to settle on better choices. In this paper, we proposed upgrade strategies to remove the dark side from dark data. Using a supervised model and machine learning techniques, we utilized dark data and achieved an F1 score of 89.48%.Keywords: big data, dark data, machine learning, heatmap, random forest
Procedia PDF Downloads 2825093 Multi-Source Data Fusion for Urban Comprehensive Management
Authors: Bolin Hua
Abstract:
In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data
Procedia PDF Downloads 39325092 Reviewing Privacy Preserving Distributed Data Mining
Authors: Sajjad Baghernezhad, Saeideh Baghernezhad
Abstract:
Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.Keywords: data mining, distributed data mining, privacy protection, privacy preserving
Procedia PDF Downloads 52525091 The Right to Data Portability and Its Influence on the Development of Digital Services
Authors: Roman Bieda
Abstract:
The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.Keywords: data portability, digital market, GDPR, personal data
Procedia PDF Downloads 473