Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2353

Search results for: Distributed query processing

2353 Database Placement on Large-Scale Systems

Authors: Cherif Haddad, Faouzi Ben Charrada

Abstract:

Large-scale systems such as Grids offer infrastructures for both data distribution and parallel processing. The use of Grid infrastructures is a more recent issue that is already impacting the Distributed Database Management System industry. In DBMS, distributed query processing has emerged as a fundamental technique for ensuring high performance in distributed databases. Database placement is particularly important in large-scale systems because it reduces communication costs and improves resource usage. In this paper, we propose a dynamic database placement policy that depends on query patterns and Grid sites capabilities. We evaluate the performance of the proposed database placement policy using simulations. The obtained results show that dynamic database placement can significantly improve the performance of distributed query processing.

Keywords: Large-scale systems, Grid environment, Distributed Databases, Distributed query processing, Database placement

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1252
2352 A Query Optimization Strategy for Autonomous Distributed Database Systems

Authors: Dina K. Badawy, Dina M. Ibrahim, Alsayed A. Sallam

Abstract:

Distributed database is a collection of logically related databases that cooperate in a transparent manner. Query processing uses a communication network for transmitting data between sites. It refers to one of the challenges in the database world. The development of sophisticated query optimization technology is the reason for the commercial success of database systems, which complexity and cost increase with increasing number of relations in the query. Mariposa, query trading and query trading with processing task-trading strategies developed for autonomous distributed database systems, but they cause high optimization cost because of involvement of all nodes in generating an optimal plan. In this paper, we proposed a modification on the autonomous strategy K-QTPT that make the seller’s nodes with the lowest cost have gradually high priorities to reduce the optimization time. We implement our proposed strategy and present the results and analysis based on those results.

Keywords: Autonomous strategies, distributed database systems, high priority, query optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 668
2351 A Delay-Tolerant Distributed Query Processing Architecture for Mobile Environment

Authors: T.P. Andamuthu, Dr. P. Balasubramanie

Abstract:

The intermittent connectivity modifies the “always on" network assumption made by all the distributed query processing systems. In modern- day systems, the absence of network connectivity is considered as a fault. Since the last upload, it might not be feasible to transmit all the data accumulated right away over the available connection. It is possible that vital information may be delayed excessively when the less important information takes place of the vital information. Owing to the restricted and uneven bandwidth, it is vital that the mobile nodes make the most advantageous use of the connectivity when it arrives. Hence, in order to select the data that needs to be transmitted first, some sort of data prioritization is essential. A continuous query processing system for intermittently connected mobile networks that comprises of a delaytolerant continuous query processor distributed across the mobile hosts has been proposed in this paper. In addition, a mechanism for prioritizing query results has been designed that guarantees enhanced accuracy and reduced delay. It is illustrated that our architecture reduces the client power consumption, increases query efficiency by the extensive simulation results.

Keywords: Broadcast, Location, Mobile host, Mobility, Query.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1222
2350 Topological Queries on Graph-structured XML Data: Models and Implementations

Authors: Hongzhi Wang, Jianzhong Li, Jizhou Luo

Abstract:

In many applications, data is in graph structure, which can be naturally represented as graph-structured XML. Existing queries defined on tree-structured and graph-structured XML data mainly focus on subgraph matching, which can not cover all the requirements of querying on graph. In this paper, a new kind of queries, topological query on graph-structured XML is presented. This kind of queries consider not only the structure of subgraph but also the topological relationship between subgraphs. With existing subgraph query processing algorithms, efficient algorithms for topological query processing are designed. Experimental results show the efficiency of implementation algorithms.

Keywords: XML, Graph Structure, Topological query.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1111
2349 Parallel Querying of Distributed Ontologies with Shared Vocabulary

Authors: Sharjeel Aslam, Vassil Vassilev, Karim Ouazzane

Abstract:

Ontologies and various semantic repositories became a convenient approach for implementing model-driven architectures of distributed systems on the Web. SPARQL is the standard query language for querying such. However, although SPARQL is well-established standard for querying semantic repositories in RDF and OWL format and there are commonly used APIs which supports it, like Jena for Java, its parallel option is not incorporated in them. This article presents a complete framework consisting of an object algebra for parallel RDF and an index-based implementation of the parallel query engine capable of dealing with the distributed RDF ontologies which share common vocabulary. It has been implemented in Java, and for validation of the algorithms has been applied to the problem of organizing virtual exhibitions on the Web.

Keywords: Distributed ontologies, parallel querying, semantic indexing, shared vocabulary, SPARQL.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 347
2348 Re-Optimization MVPP Using Common Subexpression for Materialized View Selection

Authors: Boontita Suchyukorn, Raweewan Auepanwiriyakul

Abstract:

A Data Warehouses is a repository of information integrated from source data. Information stored in data warehouse is the form of materialized in order to provide the better performance for answering the queries. Deciding which appropriated views to be materialized is one of important problem. In order to achieve this requirement, the constructing search space close to optimal is a necessary task. It will provide effective result for selecting view to be materialized. In this paper we have proposed an approach to reoptimize Multiple View Processing Plan (MVPP) by using global common subexpressions. The merged queries which have query processing cost not close to optimal would be rewritten. The experiment shows that our approach can help to improve the total query processing cost of MVPP and sum of query processing cost and materialized view maintenance cost is reduced as well after views are selected to be materialized.

Keywords: Data Warehouse, materialized views, query rewriting, common subexpressions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1397
2347 Weight-Based Query Optimization System Using Buffer

Authors: Kashif Irfan, Fahad Shahbaz Khan, Tehseen Zia, M. A. Anwar

Abstract:

Fast retrieval of data has been a need of user in any database application. This paper introduces a buffer based query optimization technique in which queries are assigned weights according to their number of execution in a query bank. These queries and their optimized executed plans are loaded into the buffer at the start of the database application. For every query the system searches for a match in the buffer and executes the plan without creating new plans.

Keywords: Query Bank, Query Matcher, Weight Manager.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1025
2346 Query Optimization Techniques for XML Databases

Authors: Su Cheng Haw, G. S. V. Radha Krishna Rao

Abstract:

Over the past few years, XML (eXtensible Mark-up Language) has emerged as the standard for information representation and data exchange over the Internet. This paper provides a kick-start for new researches venturing in XML databases field. We survey the storage representation for XML document, review the XML query processing and optimization techniques with respect to the particular storage instance. Various optimization technologies have been developed to solve the query retrieval and updating problems. Towards the later year, most researchers proposed hybrid optimization techniques. Hybrid system opens the possibility of covering each technology-s weakness by its strengths. This paper reviews the advantages and limitations of optimization techniques.

Keywords: indexing, labeling scheme, query optimization, XML storage.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1676
2345 Contribution to the Query Optimization in the Object-Oriented Databases

Authors: Minyar Sassi, Amel Grissa-Touzi

Abstract:

Appeared toward 1986, the object-oriented databases management systems had not known successes knew five years after their birth. One of the major difficulties is the query optimization. We propose in this paper a new approach that permits to enrich techniques of query optimization existing in the object-oriented databases. Seen success that knew the query optimization in the relational model, our approach inspires itself of these optimization techniques and enriched it so that they can support the new concepts introduced by the object databases.

Keywords: Query, query optimization, relational databases, object-oriented databases.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1248
2344 Evolutionary Query Optimization for Heterogeneous Distributed Database Systems

Authors: Reza Ghaemi, Amin Milani Fard, Hamid Tabatabaee, Mahdi Sadeghizadeh

Abstract:

Due to new distributed database applications such as huge deductive database systems, the search complexity is constantly increasing and we need better algorithms to speedup traditional relational database queries. An optimal dynamic programming method for such high dimensional queries has the big disadvantage of its exponential order and thus we are interested in semi-optimal but faster approaches. In this work we present a multi-agent based mechanism to meet this demand and also compare the result with some commonly used query optimization algorithms.

Keywords: Information retrieval systems, list fusion methods, document score, multi-agent systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3137
2343 An Efficient Graph Query Algorithm Based on Important Vertices and Decision Features

Authors: Xiantong Li, Jianzhong Li

Abstract:

Graph has become increasingly important in modeling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. Different from the existing methods, our approach, called VFM (Vertex to Frequent Feature Mapping), makes use of vertices and decision features as the basic indexing feature. VFM constructs two mappings between vertices and frequent features to answer graph queries. The VFM approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. The results show that the proposed method not only avoids the enumeration method of getting subgraphs of query graph, but also effectively reduces the subgraph isomorphism tests between the query graph and graphs in candidate answer set in verification stage.

Keywords: Decision Feature, Frequent Feature, Graph Dataset, Graph Query

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1577
2342 An Enhanced Distributed System to improve theTime Complexity of Binary Indexed Trees

Authors: Ahmed M. Elhabashy, A. Baes Mohamed, Abou El Nasr Mohamad

Abstract:

Distributed Computing Systems are usually considered the most suitable model for practical solutions of many parallel algorithms. In this paper an enhanced distributed system is presented to improve the time complexity of Binary Indexed Trees (BIT). The proposed system uses multi-uniform processors with identical architectures and a specially designed distributed memory system. The analysis of this system has shown that it has reduced the time complexity of the read query to O(Log(Log(N))), and the update query to constant complexity, while the naive solution has a time complexity of O(Log(N)) for both queries. The system was implemented and simulated using VHDL and Verilog Hardware Description Languages, with xilinx ISE 10.1, as the development environment and ModelSim 6.1c, similarly as the simulation tool. The simulation has shown that the overhead resulting by the wiring and communication between the system fragments could be fairly neglected, which makes it applicable to practically reach the maximum speed up offered by the proposed model.

Keywords: Binary Index Tree (BIT), Least Significant Bit (LSB), Parallel Adder (PA), Very High Speed Integrated Circuits HardwareDescription Language (VHDL), Distributed Parallel Computing System(DPCS).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1490
2341 Materialized View Effect on Query Performance

Authors: Yusuf Ziya Ayık, Ferhat Kahveci

Abstract:

Currently, database management systems have various tools such as backup and maintenance, and also provide statistical information such as resource usage and security. In terms of query performance, this paper covers query optimization, views, indexed tables, pre-computation materialized view, query performance analysis in which query plan alternatives can be created and the least costly one selected to optimize a query. Indexes and views can be created for related table columns. The literature review of this study showed that, in the course of time, despite the growing capabilities of the database management system, only database administrators are aware of the need for dealing with archival and transactional data types differently. These data may be constantly changing data used in everyday life, and also may be from the completed questionnaire whose data input was completed. For both types of data, the database uses its capabilities; but as shown in the findings section, instead of repeating similar heavy calculations which are carrying out same results with the same query over a survey results, using materialized view results can be in a more simple way. In this study, this performance difference was observed quantitatively considering the cost of the query.

Keywords: Materialized view, pre-computation, query cost, query performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 806
2340 XML Data Management in Compressed Relational Database

Authors: Hongzhi Wang, Jianzhong Li, Hong Gao

Abstract:

XML is an important standard of data exchange and representation. As a mature database system, using relational database to support XML data may bring some advantages. But storing XML in relational database has obvious redundancy that wastes disk space, bandwidth and disk I/O when querying XML data. For the efficiency of storage and query XML, it is necessary to use compressed XML data in relational database. In this paper, a compressed relational database technology supporting XML data is presented. Original relational storage structure is adaptive to XPath query process. The compression method keeps this feature. Besides traditional relational database techniques, additional query process technologies on compressed relations and for special structure for XML are presented. In this paper, technologies for XQuery process in compressed relational database are presented..

Keywords: XML, compression, query processing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1504
2339 Exploiting Query Feedback for Efficient Query Routing in Unstructured Peer-to-peer Networks

Authors: Iskandar Ishak, Naomie Salim

Abstract:

Unstructured peer-to-peer networks are popular due to its robustness and scalability. Query schemes that are being used in unstructured peer-to-peer such as the flooding and interest-based shortcuts suffer various problems such as using large communication overhead long delay response. The use of routing indices has been a popular approach for peer-to-peer query routing. It helps the query routing processes to learn the routing based on the feedbacks collected. In an unstructured network where there is no global information available, efficient and low cost routing approach is needed for routing efficiency. In this paper, we propose a novel mechanism for query-feedback oriented routing indices to achieve routing efficiency in unstructured network at a minimal cost. The approach also applied information retrieval technique to make sure the content of the query is understandable and will make the routing process not just based to the query hits but also related to the query content. Experiments have shown that the proposed mechanism performs more efficient than flood-based routing.

Keywords: Unstructured peer-to-peer, Searching, Retrieval, Internet.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1299
2338 Free-Form Query for Cell Phones

Authors: R. Ahmad, S. Abdul-Kareem

Abstract:

It is a challenge to provide a wide range of queries to database query systems for small mobile devices, such as the PDAs and cell phones. Currently, due to the physical and resource limitations of these devices, most reported database querying systems developed for them are only offering a small set of pre-determined queries for users to possibly pose. The above can be resolved by allowing free-form queries to be entered on the devices. Hence, a query language that does not restrict the combination of query terms entered by users is proposed. This paper presents the free-form query language and the method used in translating free-form queries to their equivalent SQL statements.

Keywords: Cell phone, database query language, free-formqueries, unplanned queries.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1147
2337 Selection of Relevant Servers in Distributed Information Retrieval System

Authors: Benhamouda Sara, Guezouli Larbi

Abstract:

Nowadays, the dissemination of information touches the distributed world, where selecting the relevant servers to a user request is an important problem in distributed information retrieval. During the last decade, several research studies on this issue have been launched to find optimal solutions and many approaches of collection selection have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contains terms of the query and the weights of those terms in these documents. We tested our method and our studies show that this technique can compete with other state-of-the-art algorithms that we choose to test the performance of our approach.

Keywords: Distributed information retrieval, relevance, server selection, collection selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1058
2336 An Algebra for Protein Structure Data

Authors: Yanchao Wang, Rajshekhar Sunderraman

Abstract:

This paper presents an algebraic approach to optimize queries in domain-specific database management system for protein structure data. The approach involves the introduction of several protein structure specific algebraic operators to query the complex data stored in an object-oriented database system. The Protein Algebra provides an extensible set of high-level Genomic Data Types and Protein Data Types along with a comprehensive collection of appropriate genomic and protein functions. The paper also presents a query translator that converts high-level query specifications in algebra into low-level query specifications in Protein-QL, a query language designed to query protein structure data. The query transformation process uses a Protein Ontology that serves the purpose of a dictionary.

Keywords: Domain-Specific Data Management, Protein Algebra, Protein Ontology, Protein Structure Data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1275
2335 Data Annotation Models and Annotation Query Language

Authors: Neerja Bhatnagar, Benjoe A. Juliano, Renee S. Renner

Abstract:

This paper presents data annotation models at five levels of granularity (database, relation, column, tuple, and cell) of relational data to address the problem of unsuitability of most relational databases to express annotations. These models do not require any structural and schematic changes to the underlying database. These models are also flexible, extensible, customizable, database-neutral, and platform-independent. This paper also presents an SQL-like query language, named Annotation Query Language (AnQL), to query annotation documents. AnQL is simple to understand and exploits the already-existent wide knowledge and skill set of SQL.

Keywords: annotation query language, data annotations, data annotation models, semantic data annotations

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2094
2334 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 482
2333 Natural Language Database Interface for Selection of Data Using Grammar and Parsing

Authors: N. D. Karande, G. A. Patil

Abstract:

Databases have become ubiquitous. Almost all IT applications are storing into and retrieving information from databases. Retrieving information from the database requires knowledge of technical languages such as Structured Query Language (SQL). However majority of the users who interact with the databases do not have a technical background and are intimidated by the idea of using languages such as SQL. This has led to the development of a few Natural Language Database Interfaces (NLDBIs). A NLDBI allows the user to query the database in a natural language. This paper highlights on architecture of new NLDBI system, its implementation and discusses on results obtained. In most of the typical NLDBI systems the natural language statement is converted into an internal representation based on the syntactic and semantic knowledge of the natural language. This representation is then converted into queries using a representation converter. A natural language query is translated to an equivalent SQL query after processing through various stages. The work has been experimented on primitive database queries with certain constraints.

Keywords: Natural language database interface, representation converter, syntactic and semantic knowledge

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2384
2332 AnQL: A Query Language for Annotation Documents

Authors: Neerja Bhatnagar, Ben A. Juliano, Renee S. Renner

Abstract:

This paper presents data annotation models at five levels of granularity (database, relation, column, tuple, and cell) of relational data to address the problem of unsuitability of most relational databases to express annotations. These models do not require any structural and schematic changes to the underlying database. These models are also flexible, extensible, customizable, database-neutral, and platform-independent. This paper also presents an SQL-like query language, named Annotation Query Language (AnQL), to query annotation documents. AnQL is simple to understand and exploits the already-existent wide knowledge and skill set of SQL.

Keywords: Annotation query language, data annotations, data annotation models, semantic data annotations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1480
2331 UB-Tree Indexing for Semantic Query Optimization of Range Queries

Authors: S. Housseno, A. Simonet, M. Simonet

Abstract:

Semantic query optimization consists in restricting the search space in order to reduce the set of objects of interest for a query. This paper presents an indexing method based on UB-trees and a static analysis of the constraints associated to the views of the database and to any constraint expressed on attributes. The result of the static analysis is a partitioning of the object space into disjoint blocks. Through Space Filling Curve (SFC) techniques, each fragment (block) of the partition is assigned a unique identifier, enabling the efficient indexing of fragments by UB-trees. The search space corresponding to a range query is restricted to a subset of the blocks of the partition. This approach has been developed in the context of a KB-DBMS but it can be applied to any relational system.

Keywords: Index, Range query, UB-tree, Space Filling Curve, Query optimization, Views, Database, Integrity Constraint, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1231
2330 A Hybrid Nature Inspired Algorithm for Generating Optimal Query Plan

Authors: R. Gomathi, D. Sharmila

Abstract:

The emergence of the Semantic Web technology increases day by day due to the rapid growth of multiple web pages. Many standard formats are available to store the semantic web data. The most popular format is the Resource Description Framework (RDF). Querying large RDF graphs becomes a tedious procedure with a vast increase in the amount of data. The problem of query optimization becomes an issue in querying large RDF graphs. Choosing the best query plan reduces the amount of query execution time. To address this problem, nature inspired algorithms can be used as an alternative to the traditional query optimization techniques. In this research, the optimal query plan is generated by the proposed SAPSO algorithm which is a hybrid of Simulated Annealing (SA) and Particle Swarm Optimization (PSO) algorithms. The proposed SAPSO algorithm has the ability to find the local optimistic result and it avoids the problem of local minimum. Experiments were performed on different datasets by changing the number of predicates and the amount of data. The proposed algorithm gives improved results compared to existing algorithms in terms of query execution time.

Keywords: Semantic web, RDF, Query optimization, Nature inspired algorithms, PSO, SA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1996
2329 Searching k-Nearest Neighbors to be Appropriate under Gamming Environments

Authors: Jae Moon Lee

Abstract:

In general, algorithms to find continuous k-nearest neighbors have been researched on the location based services, monitoring periodically the moving objects such as vehicles and mobile phone. Those researches assume the environment that the number of query points is much less than that of moving objects and the query points are not moved but fixed. In gaming environments, this problem is when computing the next movement considering the neighbors such as flocking, crowd and robot simulations. In this case, every moving object becomes a query point so that the number of query point is same to that of moving objects and the query points are also moving. In this paper, we analyze the performance of the existing algorithms focused on location based services how they operate under gaming environments.

Keywords: Flocking behavior, heterogeneous agents, similarity, simulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1287
2328 Accelerating Side Channel Analysis with Distributed and Parallelized Processing

Authors: Kyunghee Oh, Dooho Choi

Abstract:

Although there is no theoretical weakness in a cryptographic algorithm, Side Channel Analysis can find out some secret data from the physical implementation of a cryptosystem. The analysis is based on extra information such as timing information, power consumption, electromagnetic leaks or even sound which can be exploited to break the system. Differential Power Analysis is one of the most popular analyses, as computing the statistical correlations of the secret keys and power consumptions. It is usually necessary to calculate huge data and takes a long time. It may take several weeks for some devices with countermeasures. We suggest and evaluate the methods to shorten the time to analyze cryptosystems. Our methods include distributed computing and parallelized processing.

Keywords: DPA, distributed computing, parallelized processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1621
2327 A Two-Step Approach for Tree-structured XPath Query Reduction

Authors: Minsoo Lee, Yun-mi Kim, Yoon-kyung Lee

Abstract:

XML data consists of a very flexible tree-structure which makes it difficult to support the storing and retrieving of XML data. The node numbering scheme is one of the most popular approaches to store XML in relational databases. Together with the node numbering storage scheme, structural joins can be used to efficiently process the hierarchical relationships in XML. However, in order to process a tree-structured XPath query containing several hierarchical relationships and conditional sentences on XML data, many structural joins need to be carried out, which results in a high query execution cost. This paper introduces mechanisms to reduce the XPath queries including branch nodes into a much more efficient form with less numbers of structural joins. A two step approach is proposed. The first step merges duplicate nodes in the tree-structured query and the second step divides the query into sub-queries, shortens the paths and then merges the sub-queries back together. The proposed approach can highly contribute to the efficient execution of XML queries. Experimental results show that the proposed scheme can reduce the query execution cost by up to an order of magnitude of the original execution cost.

Keywords: XML, Xpath, tree-structured query, query reduction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1285
2326 Query Reformulation Guided by External Resource for Information Retrieval

Authors: Mohammed El Amine Abderrahim

Abstract:

Reformulating the user query is a technique that aims to improve the performance of an Information Retrieval System (IRS) in terms of precision and recall. This paper tries to evaluate the technique of query reformulation guided by an external resource for Arabic texts. To do this, various precision and recall measures were conducted and two corpora with different external resources like Arabic WordNet (AWN) and the Arabic Dictionary (thesaurus) of Meaning (ADM) were used. Examination of the obtained results will allow us to measure the real contribution of this reformulation technique in improving the IRS performance.

Keywords: Arabic NLP, Arabic Information Retrieval, Arabic WordNet, Query Expansion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1138
2325 Deep Web Content Mining

Authors: Shohreh Ajoudanian, Mohammad Davarpanah Jazi

Abstract:

The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased difficulty of extracting potentially useful knowledge. Web content mining confronts this problem gathering explicit information from different web sites for its access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting information with parsing approach, we use a new data mining algorithm to match a large number of schemas in databases at a time. Using this algorithm increases the speed of information matching. In addition, instead of simple 1:1 matching, they do complex (m:n) matching between query interfaces. In this paper we present a novel correlation mining algorithm that matches correlated attributes with smaller cost. This algorithm uses Jaccard measure to distinguish positive and negative correlated attributes. After that, system matches the user query with different query interfaces in special domain and finally chooses the nearest query interface with user query to answer to it.

Keywords: Content mining, complex matching, correlation mining, information extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2037
2324 A Hybrid Ontology Based Approach for Ranking Documents

Authors: Sarah Motiee, Azadeh Nematzadeh, Mehrnoush Shamsfard

Abstract:

Increasing growth of information volume in the internet causes an increasing need to develop new (semi)automatic methods for retrieval of documents and ranking them according to their relevance to the user query. In this paper, after a brief review on ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination reserves the precision of ranking without loosing the speed. Our approach exploits natural language processing techniques to extract phrases from documents and the query and doing stemming on words. Then an ontology based conceptual method will be used to annotate documents and expand the query. To expand a query the spread activation algorithm is improved so that the expansion can be done flexible and in various aspects. The annotated documents and the expanded query will be processed to compute the relevance degree exploiting statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing to documents, (3) extracting and using both words and phrases to compute relevance degree, (4) improving the spread activation algorithm to do the expansion based on weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results will be included at the end of the paper.

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1340