Search results for: file structure.

2732 Analysis of Long-Term File System Activities on Cluster Systems

Authors: Hyeyoung Cho, Sungho Kim, Sik Lee

Abstract:

I/O workload is a critical and important factor to analyze I/O pattern and to maximize file system performance. However to measure I/O workload on running distributed parallel file system is non-trivial due to collection overhead and large volume of data. In this paper, we measured and analyzed file system activities on two large-scale cluster systems which had TFlops level high performance computation resources. By comparing file system activities of 2009 with those of 2006, we analyzed the change of I/O workloads by the development of system performance and high-speed network technology.

Keywords: I/O workload, Lustre, GPFS, Cluster File System

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1431

2731 A File Splitting Technique for Reducing the Entropy of Text Files

Authors: Abdel-Rahman M. Jaradat, , Mansour I. Irshid, Talha T. Nassar

Abstract:

A novel file splitting technique for the reduction of the nth-order entropy of text files is proposed. The technique is based on mapping the original text file into a non-ASCII binary file using a new codeword assignment method and then the resulting binary file is split into several subfiles each contains one or more bits from each codeword of the mapped binary file. The statistical properties of the subfiles are studied and it is found that they reflect the statistical properties of the original text file which is not the case when the ASCII code is used as a mapper. The nth-order entropy of these subfiles are determined and it is found that the sum of their entropies is less than that of the original text file for the same values of extensions. These interesting statistical properties of the resulting subfiles can be used to achieve better compression ratios when conventional compression techniques are applied to these subfiles individually and on a bit-wise basis rather than on character-wise basis.

Keywords: Bit-wise compression, entropy, file splitting, source mapping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1412

2730 An Automatic Bayesian Classification System for File Format Selection

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for the classification of an unstructured format description for identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format indentification method employs file format classifier and associated configurations to support digital preservation experts with an estimation of required file format. Our goal is to make use of a format specification knowledge base aggregated from a different Web sources in order to select file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert, the file format for his institution. The proposed methods facilitate the selection of file format and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision-making, the aggregated information about the file formats is presented as a file format vocabulary that comprises most common terms that are characteristic for all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. The sample file format calculation and the calculation results including probabilities are presented in the evaluation section.

Keywords: Data mining, digital libraries, digital preservation, file format.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1633

2729 Performance Trade-Off of File System between Overwriting and Dynamic Relocation on a Solid State Drive

Authors: Choulseung Hyun, Hunki Kwon, Jaeho Kim, Eujoon Byun, Jongmoo Choi, Donghee Lee, Sam H. Noh

Abstract:

Most file systems overwrite modified file data and metadata in their original locations, while the Log-structured File System (LFS) dynamically relocates them to other locations. We design and implement the Evergreen file system that can select between overwriting or relocation for each block of a file or metadata. Therefore, the Evergreen file system can achieve superior write performance by sequentializing write requests (similar to LFS-style relocation) when space utilization is low and overwriting when utilization is high. Another challenging issue is identifying performance benefits of LFS-style relocation over overwriting on a newly introduced SSD (Solid State Drive) which has only Flash-memory chips and control circuits without mechanical parts. Our experimental results measured on a SSD show that relocation outperforms overwriting when space utilization is below 80% and vice versa.

Keywords: Evergreen File System, Overwrite, Relocation, Solid State Drive.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1457

2728 Application of LSB Based Steganographic Technique for 8-bit Color Images

Authors: Mamta Juneja, Parvinder S. Sandhu, Ekta Walia

Abstract:

Steganography is the process of hiding one file inside another such that others can neither identify the meaning of the embedded object, nor even recognize its existence. Current trends favor using digital image files as the cover file to hide another digital file that contains the secret message or information. One of the most common methods of implementation is Least Significant Bit Insertion, in which the least significant bit of every byte is altered to form the bit-string representing the embedded file. Altering the LSB will only cause minor changes in color, and thus is usually not noticeable to the human eye. While this technique works well for 24-bit color image files, steganography has not been as successful when using an 8-bit color image file, due to limitations in color variations and the use of a colormap. This paper presents the results of research investigating the combination of image compression and steganography. The technique developed starts with a 24-bit color bitmap file, then compresses the file by organizing and optimizing an 8-bit colormap. After the process of compression, a text message is hidden in the final, compressed image. Results indicate that the final technique has potential of being useful in the steganographic world.

Keywords: Compression, Colormap, Encryption, Steganographyand LSB Insertion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2978

2727 Information Retrieval: A Comparative Study of Textual Indexing Using an Oriented Object Database (db4o) and the Inverted File

Authors: Mohammed Erritali

Abstract:

The growth in the volume of text data such as books and articles in libraries for centuries has imposed to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing and the use of classification categories have marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (corpus) to allow a user to find the relevant ones for him, that is to say, the contents which matches with the information needs of the user. Most of the models of information retrieval use a specific data structure to index a corpus which is called "inverted file" or "reverse index". This inverted file collects information on all terms over the corpus documents specifying the identifiers of documents that contain the term in question, the frequency of each term in the documents of the corpus, the positions of the occurrences of the word... In this paper we use an oriented object database (db4o) instead of the inverted file, that is to say, instead to search a term in the inverted file, we will search it in the db4o database. The purpose of this work is to make a comparative study to see if the oriented object databases may be competing for the inverse index in terms of access speed and resource consumption using a large volume of data.

Keywords: Information Retrieval, indexation, oriented object database (db4o), inverted file.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1711

2726 A Single-Phase Register File with Complementary Pass-Transistor Adiabatic Logic

Authors: Jianping Hu, Xiaolei Sheng

Abstract:

This paper introduces an adiabatic register file based on two-phase CPAL (Complementary Pass-Transistor Adiabatic Logic circuits) with power-gating scheme, which can operate on a single-phase power clock. A 32×32 single-phase adiabatic register file with power-gating scheme has been implemented with TSMC 0.18μm CMOS technology. All the circuits except for the storage cells employ two-phase CPAL circuits, and the storage cell is based on the conventional memory one. The two-phase non-overlap power-clock generator with power-gating scheme is used to supply the proposed adiabatic register file. Full-custom layouts are drawn. The energy and functional simulations have been performed using the net-list extracted from their layouts. Compared with the traditional static CMOS register file, HSPICE simulations show that the proposed adiabatic register file can work very well, and it attains about 73% energy savings at 100 MHz.

Keywords: Low power, Register file, Complementarypass-transistor logic, Adiabatic logic, Single-phase power clock.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1934

2725 Developing an Online Library for Faster Retrieval of Mold Base and Standard Parts of Injection Molding

Authors: Alan C. Lin, Ricky N. Joevan

Abstract:

This paper focuses on developing a system to transfer mold base plates and standard parts faster during the stage of injection mold design. This system not only provides a way to compare the file version, but also it utilizes Siemens NX 10 to isolate the updated information into a single executable file (.dll), and then, the file can be transferred without the need of transferring the whole file. By this way, the system can help the user to download only necessary mold base plates and standard parts, and those parts downloaded are only the updated portions.

Keywords: CAD, injection molding, mold base, data retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1145

2724 Implementation of a Reed-Solomon Code as an ECC in Yet Another Flash File System

Authors: Sungjoon Sim, Soongyu Kwon, Dongjae Song, Jong Tae Kim

Abstract:

Flash memory has become an important storage device in many embedded systems because of its high performance, low power consumption and shock resistance. Multi-level cell (MLC) is developed as an effective solution for reducing the cost and increasing the storage density in recent years. However, most of flash file system cannot handle the error correction sufficiently. To correct more errors for MLC, we implement Reed-Solomon (RS) code to YAFFS, what is widely used for flash-based file system. RS code has longer computing time but the correcting ability is much higher than that of Hamming code.

Keywords: Reed-Solomon, NAND flash memory, YAFFS, ErrorCorrecting Code, Flash File System

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2473

2723 High Securing Cover-File of Hidden Data Using Statistical Technique and AES Encryption Algorithm

Authors: A. A. Zaidan, Anas Majeed, B. B. Zaidan

Abstract:

Nowadays, the rapid development of multimedia and internet allows for wide distribution of digital media data. It becomes much easier to edit, modify and duplicate digital information Besides that, digital documents are also easy to copy and distribute, therefore it will be faced by many threatens. It-s a big security and privacy issue with the large flood of information and the development of the digital format, it become necessary to find appropriate protection because of the significance, accuracy and sensitivity of the information. Nowadays protection system classified with more specific as hiding information, encryption information, and combination between hiding and encryption to increase information security, the strength of the information hiding science is due to the non-existence of standard algorithms to be used in hiding secret messages. Also there is randomness in hiding methods such as combining several media (covers) with different methods to pass a secret message. In addition, there are no formal methods to be followed to discover the hidden data. For this reason, the task of this research becomes difficult. In this paper, a new system of information hiding is presented. The proposed system aim to hidden information (data file) in any execution file (EXE) and to detect the hidden file and we will see implementation of steganography system which embeds information in an execution file. (EXE) files have been investigated. The system tries to find a solution to the size of the cover file and making it undetectable by anti-virus software. The system includes two main functions; first is the hiding of the information in a Portable Executable File (EXE), through the execution of four process (specify the cover file, specify the information file, encryption of the information, and hiding the information) and the second function is the extraction of the hiding information through three process (specify the steno file, extract the information, and decryption of the information). The system has achieved the main goals, such as make the relation of the size of the cover file and the size of information independent and the result file does not make any conflict with anti-virus software.

Keywords: Cryptography, Steganography, Portable ExecutableFile.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1774

2722 File Format of Flow Chart Simulation Software - CFlow

Authors: Syahanim Mohd Salleh, Zaihosnita Hood, Hairulliza Mohd Judi, Marini Abu Bakar

Abstract:

CFlow is a flow chart software, it contains facilities to draw and evaluate a flow chart. A flow chart evaluation applies a simulation method to enable presentation of work flow in a flow chart solution. Flow chart simulation of CFlow is executed by manipulating the CFlow data file which is saved in a graphical vector format. These text-based data are organised by using a data classification technic based on a Library classification-scheme. This paper describes the file format for flow chart simulation software of CFlow.

Keywords: CFlow, flow chart, file format.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2519

2721 Facilitating a Cyber-Enabled Fraud Using the O.MG Cable to Incriminate the Victim

Authors: Damola O. Lawal, David W. Gresty, Diane E. Gan, Louise Hewitt

Abstract:

This paper investigates the feasibility of using a programmable USB such as the O.MG Cable to perform a file tampering attack. Here, the O.MG Cable, an apparently harmless mobile device charger is used in an unauthorised way, to alter the content of a file (an accounts record-January_Contributions.xlsx). The aim is to determine if a forensics analyst can reliably determine who has altered the target file; the O.MG Cable or the user of the machine. This work highlights some of the traces of the O.MG Cable left behind on the target computer itself such as the Product ID (PID) and Vendor ID (ID). Also discussed is the O.MG Cable’s behaviour during the experiments. We determine if a forensics analyst could identify if any evidence has been left behind by the programmable device on the target file once it has been removed from the computer to establish if the analyst would be able to link the traces left by the O.MG Cable to the file tampering. It was discovered that the forensic analyst might mistake the actions of the O.MG Cable for the computer users. Experiments carried out in this work could further the discussion as to whether an innocent user could be punished for the unauthorised changes made by a programmable device.

Keywords: O.MG Cable, programmable USB, file tampering attack, digital evidence credibility, miscarriage of justice, cyber fraud.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 590

2720 Stego Machine – Video Steganography using Modified LSB Algorithm

Authors: Mritha Ramalingam

Abstract:

Computer technology and the Internet have made a breakthrough in the existence of data communication. This has opened a whole new way of implementing steganography to ensure secure data transfer. Steganography is the fine art of hiding the information. Hiding the message in the carrier file enables the deniability of the existence of any message at all. This paper designs a stego machine to develop a steganographic application to hide data containing text in a computer video file and to retrieve the hidden information. This can be designed by embedding text file in a video file in such away that the video does not loose its functionality using Least Significant Bit (LSB) modification method. This method applies imperceptible modifications. This proposed method strives for high security to an eavesdropper-s inability to detect hidden information.

Keywords: Data hiding, LSB, Stego machine, VideoSteganography

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4225

2719 Management Software for the Elaboration of an Electronic File in the Pharmaceutical Industry Following Mexican Regulations

Authors: M. Peña Aguilar Juan, Ríos Hernández Ezequiel, R. Valencia Luis

Abstract:

For certification, certain goods of public interest, such as medicines and food, it is required the preparation and delivery of a dossier. For its elaboration, legal and administrative knowledge must be taken, as well as organization of the documents of the process, and an order that allows the file verification. Therefore, a virtual platform was developed to support the process of management and elaboration of the dossier, providing accessibility to the information and interfaces that allow the user to know the status of projects. The development of dossier system on the cloud allows the inclusion of the technical requirements for the software management, including the validation and the manufacturing in the field industry. The platform guides and facilitates the dossier elaboration (report, file or history), considering Mexican legislation and regulations, it also has auxiliary tools for its management. This technological alternative provides organization support for documents and accessibility to the information required to specify the successful development of a dossier. The platform divides into the following modules: System control, catalog, dossier and enterprise management. The modules are designed per the structure required in a dossier in those areas. However, the structure allows for flexibility, as its goal is to become a tool that facilitates and does not obstruct processes. The architecture and development of the software allows flexibility for future work expansion to other fields, this would imply feeding the system with new regulations.

Keywords: Electronic dossier, technologies for management, web software, dossier elaboration, pharmaceutical industry.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1179

2718 Developing NAND Flash-Memory SSD-Based File System Design

Authors: Jaechun No

Abstract:

This paper focuses on I/O optimizations of N-hybrid (New-Form of hybrid), which provides a hybrid file system space constructed on SSD and HDD. Although the promising potentials of SSD, such as the absence of mechanical moving overhead and high random I/O throughput, have drawn a lot of attentions from IT enterprises, its high ratio of cost/capacity makes it less desirable to build a large-scale data storage subsystem composed of only SSDs. In this paper, we present N-hybrid that attempts to integrate the strengths of SSD and HDD, to offer a single, large hybrid file system space. Several experiments were conducted to verify the performance of N-hybrid.

Keywords: SSD, data section, I/O optimizations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1779

2717 Dynamic Decompression for Text Files

Authors: Ananth Kamath, Ankit Kant, Aravind Srivatsa, Harisha J.A

Abstract:

Compression algorithms reduce the redundancy in data representation to decrease the storage required for that data. Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv (LZ) family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. Decompression is also required to retrieve the original data by lossless means. A compression scheme for text files coupled with the principle of dynamic decompression, which decompresses only the section of the compressed text file required by the user instead of decompressing the entire text file. Dynamic decompressed files offer better disk space utilization due to higher compression ratios compared to most of the currently available text file formats.

Keywords: Compression, Dynamic Decompression, Text file format, Portable Document Format, Compression Ratio.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1734

2716 Enhance Security in XML Databases: XLog File for Severity-Aware Trust-Based Access Control

Authors: Asmawi A., Affendey L. S., Udzir N. I., Mahmod R.

Abstract:

The topic of enhancing security in XML databases is important as it includes protecting sensitive data and providing a secure environment to users. In order to improve security and provide dynamic access control for XML databases, we presented XLog file to calculate user trust values by recording users’ bad transaction, errors and query severities. Severity-aware trust-based access control for XML databases manages the access policy depending on users' trust values and prevents unauthorized processes, malicious transactions and insider threats. Privileges are automatically modified and adjusted over time depending on user behaviour and query severity. Logging in database is an important process and is used for recovery and security purposes. In this paper, the Xlog file is presented as a dynamic and temporary log file for XML databases to enhance the level of security.

Keywords: XML database, trust-based access control, severity-aware, trust values, log file.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1829

2715 Design and Implementation of Shared Memory based Parallel File System Logging Method for High Performance Computing

Authors: Hyeyoung Cho, Sungho Kim, SangDong Lee

Abstract:

I/O workload is a critical and important factor to analyze I/O pattern and file system performance. However tracing I/O operations on the fly distributed parallel file system is non-trivial due to collection overhead and a large volume of data. In this paper, we design and implement a parallel file system logging method for high performance computing using shared memory-based multi-layer scheme. It minimizes the overhead with reduced logging operation response time and provides efficient post-processing scheme through shared memory. Separated logging server can collect sequential logs from multiple clients in a cluster through packet communication. Implementation and evaluation result shows low overhead and high scalability of this architecture for high performance parallel logging analysis.

Keywords: I/O workload, PVFS, I/O Trace.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1536

2714 Mounting Time Reduction using Content-Based Block Management for NAND Flash File System

Authors: Won-Hee Cho, GeunHyung Lee, Deok-Hwan Kim

Abstract:

The flash memory has many advantages such as low power consumption, strong shock resistance, fast I/O and non-volatility. And it is increasingly used in the mobile storage device. The YAFFS, one of the NAND flash file system, is widely used in the embedded device. However, the existing YAFFS takes long time to mount the file system because it scans whole spare areas in all pages of NAND flash memory. In order to solve this problem, we propose a new content-based flash file system using a mounting time reduction technique. The proposed method only scans partial spare areas of some special pages by using content-based block management. The experimental results show that the proposed method reduces the average mounting time by 87.2% comparing with JFFS2 and 69.9% comparing with YAFFS.

Keywords: NAND Flash Memory, Mounting Time, YAFFS, JFFS2, Content-based Block management

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1655

2713 PP-FSM: Peer to Peer File Share for Multimedia

Authors: Arsalan Ali Shah, Zafar I. Malik, Shaukat Ali

Abstract:

Peer-to-Peer (P2P) is a self-organizing resource sharing network with no centralized authority or infrastructure, which makes it unpredictable and vulnerable. In this paper, we propose architecture to make the peer-to-peer network more centralized, predictable, and safer to use by implementing trust and stopping free riding.

Keywords: File Share, Free Riding, Peer-to-Peer, Trust.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2016

2712 Design and Implementation a Virtualization Platform for Providing Smart Tourism Services

Authors: Nam Don Kim, Jungho Moon, Tae Yun Chung

Abstract:

This paper proposes an Internet of Things (IoT) based virtualization platform for providing smart tourism services. The virtualization platform provides a consistent access interface to various types of data by naming IoT devices and legacy information systems as pathnames in a virtual file system. In the other words, the IoT virtualization platform functions as a middleware which uses the metadata for underlying collected data. The proposed platform makes it easy to provide customized tourism information by using tourist locations collected by IoT devices and additionally enables to create new interactive smart tourism services focused on the tourist locations. The proposed platform is very efficient so that the provided tourism services are isolated from changes in raw data and the services can be modified or expanded without changing the underlying data structure.

Keywords: Internet of Things, IoT platform, service platform, virtual file system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1074

2711 Searching for Forensic Evidence in a Compromised Virtual Web Server against SQL Injection Attacks and PHP Web Shell

Authors: Gigih Supriyatno

Abstract:

SQL injection is one of the most common types of attacks and has a very critical impact on web servers. In the worst case, an attacker can perform post-exploitation after a successful SQL injection attack. In the case of forensics web servers, web server analysis is closely related to log file analysis. But sometimes large file sizes and different log types make it difficult for investigators to look for traces of attackers on the server. The purpose of this paper is to help investigator take appropriate steps to investigate when the web server gets attacked. We use attack scenarios using SQL injection attacks including PHP backdoor injection as post-exploitation. We perform post-mortem analysis of web server logs based on Hypertext Transfer Protocol (HTTP) POST and HTTP GET method approaches that are characteristic of SQL injection attacks. In addition, we also propose structured analysis method between the web server application log file, database application, and other additional logs that exist on the webserver. This method makes the investigator more structured to analyze the log file so as to produce evidence of attack with acceptable time. There is also the possibility that other attack techniques can be detected with this method. On the other side, it can help web administrators to prepare their systems for the forensic readiness.

Keywords: Web forensic, SQL injection, web shell, investigation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1222

2710 Discovering User Behaviour Patterns from Web Log Analysis to Enhance the Accessibility and Usability of Website

Authors: Harpreet Singh

Abstract:

Finding relevant information on the World Wide Web is becoming highly challenging day by day. Web usage mining is used for the extraction of relevant and useful knowledge, such as user behaviour patterns, from web access log records. Web access log records all the requests for individual files that the users have requested from the website. Web usage mining is important for Customer Relationship Management (CRM), as it can ensure customer satisfaction as far as the interaction between the customer and the organization is concerned. Web usage mining is helpful in improving website structure or design as per the user’s requirement by analyzing the access log file of a website through a log analyzer tool. The focus of this paper is to enhance the accessibility and usability of a guitar selling web site by analyzing their access log through Deep Log Analyzer tool. The results show that the maximum number of users is from the United States and that they use Opera 9.8 web browser and the Windows XP operating system.

Keywords: Web usage mining, log file, web mining, data mining, deep log analyser.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1036

2709 Data Extraction of XML Files using Searching and Indexing Techniques

Authors: Sushma Satpute, Vaishali Katkar, Nilesh Sahare

Abstract:

XML files contain data which is in well formatted manner. By studying the format or semantics of the grammar it will be helpful for fast retrieval of the data. There are many algorithms which describes about searching the data from XML files. There are no. of approaches which uses data structure or are related to the contents of the document. In these cases user must know about the structure of the document and information retrieval techniques using NLPs is related to content of the document. Hence the result may be irrelevant or not so successful and may take more time to search.. This paper presents fast XML retrieval techniques by using new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns the value to each tag from file. To query the system, a user is not constrained about fixed format of query.

Keywords: XML Retrieval, Indexed Search, Information Retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1758

2708 Optimizing Hadoop Block Placement Policy and Cluster Blocks Distribution

Authors: Nchimbi Edward Pius, Liu Qin, Fion Yang, Zhu Hong Ming

Abstract:

The current Hadoop block placement policy do not fairly and evenly distributes replicas of blocks written to datanodes in a Hadoop cluster.

This paper presents a new solution that helps to keep the cluster in a balanced state while an HDFS client is writing data to a file in Hadoop cluster. The solution had been implemented, and test had been conducted to evaluate its contribution to Hadoop distributed file system.

It has been found that, the solution has lowered global execution time taken by Hadoop balancer to 22 percent. It also has been found that, Hadoop balancer respectively over replicate 1.75 and 3.3 percent of all re-distributed blocks in the modified and original Hadoop clusters.

The feature that keeps the cluster in a balanced state works as a core part to Hadoop system and not just as a utility like traditional balancer. This is one of the significant achievements and uniqueness of the solution developed during the course of this research work.

Keywords: Balancer, Datanode, Distributed file system, Hadoop, Replicas.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4917

2707 Comparative Survey of Object Serialization Techniques and the Programming Supports

Authors: Kazuaki Maeda

Abstract:

This paper compares six approaches of object serialization from qualitative and quantitative aspects. Those are object serialization in Java, IDL, XStream, Protocol Buffers, Apache Avro, and MessagePack. Using each approach, a common example is serialized to a file and the size of the file is measured. The qualitative comparison works are investigated in the way of checking whether schema definition is required or not, whether schema compiler is required or not, whether serialization is based on ascii or binary, and which programming languages are supported. It is clear that there is no best solution. Each solution makes good in the context it was developed.

Keywords: structured data, serialization, programming

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2472

2706 String Searching in Dispersed Files using MDS Convolutional Codes

Authors: A. S. Poornima, R. Aparna, B. B. Amberker, Prashant Koulgi

Abstract:

In this paper, we propose use of convolutional codes for file dispersal. The proposed method is comparable in complexity to the information Dispersal Algorithm proposed by M.Rabin and for particular choices of (non-binary) convolutional codes, is almost as efficient as that algorithm in terms of controlling expansion in the total storage. Further, our proposed dispersal method allows string search.

Keywords: Convolutional codes, File dispersal, Filereconstruction, Information Dispersal Algorithm, String search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1254

2705 A P2P File Sharing Technique by Indexed-Priority Metric

Authors: Toshinori Takabatake, Yoshikazu Komano

Abstract:

Recently, the improvements in processing performance of a computer and in high speed communication of an optical fiber have been achieved, so that the amount of data which are processed by a computer and flowed on a network has been increasing greatly. However, in a client-server system, since the server receives and processes the amount of data from the clients through the network, a load on the server is increasing. Thus, there are needed to introduce a server with high processing ability and to have a line with high bandwidth. In this paper, concerning to P2P networks to resolve the load on a specific server, a criterion called an Indexed-Priority Metric is proposed and its performance is evaluated. The proposed metric is to allocate some files to each node. As a result, the load on a specific server can distribute them to each node equally well. A P2P file sharing system using the proposed metric is implemented. Simulation results show that the proposed metric can make it distribute files on the specific server.

Keywords: peer-to-peer, file-sharing system, load-balancing, dependability

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1361

2704 An Architecture for High Performance File SystemI/O

Authors: Mikulas Patocka

Abstract:

This paper presents an architecture of current filesystem implementations as well as our new filesystem SpadFS and operating system Spad with rewritten VFS layer targeted at high performance I/O applications. The paper presents microbenchmarks and real-world benchmarks of different filesystems on the same kernel as well as benchmarks of the same filesystem on different kernels – enabling the reader to make conclusion how much is the performance of various tasks affected by operating system and how much by physical layout of data on disk. The paper describes our novel features–most notably continuous allocation of directories and cross-file readahead – and shows their impact on performance.

Keywords: Filesystem, operating system, VFS, performance, readahead

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1422

2703 Comprehensive Evaluation on China-s Industrial Structure Optimization from the Perspective of Coordination

Authors: Ying Wang

Abstract:

From the perspective of industrial structure coordination and based on an explicit definition for the connotation of industrial structure coordination, the synergetic coefficients are used to measure the coordination degree between three industries' input structure and output structure, and then the efficacy function method is employed to comprehensively evaluate the level of China-s industrial structure optimization. It is showed that Chinese industrial structure presented a "v-shaped" variation tendency between 1996 and 2008, and its industrial structure adjustment got obvious achievements after 2003, with the industrial structure optimization level increasing continuously. However in 2009, the level of China-s industrial structure optimization declined sharply due to the decreasing contribution degree of value added structure and energy structure coordination and the lower coordination degree of value added structure and capital structure.

Keywords: China's industrial structure, Coordination degree, Efficacy function, Synergetic coefficients

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1358