Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 27540

Search results for: ontology based retrieval

27510 Pushing the Boundary of Parallel Tractability for Ontology Materialization via Boolean Circuits

Abstract:

Materialization is an important reasoning service for applications built on the Web Ontology Language (OWL). To make materialization efficient in practice, current research focuses on deciding tractability of an ontology language and designing parallel reasoning algorithms. However, some well-known large-scale ontologies, such as YAGO, have been shown to have good performance for parallel reasoning, but they are expressed in ontology languages that are not parallelly tractable, i.e., the reasoning is inherently sequential in the worst case. This motivates us to study the problem of parallel tractability of ontology materialization from a theoretical perspective. That is we aim to identify the ontologies for which materialization is parallelly tractable, i.e., in the NC complexity. Since the NC complexity is defined based on Boolean circuit that is widely used to investigate parallel computing problems, we first transform the problem of materialization to evaluation of Boolean circuits, and then study the problem of parallel tractability based on circuits. In this work, we focus on datalog rewritable ontology languages. We use Boolean circuits to identify two classes of datalog rewritable ontologies (called parallelly tractable classes) such that materialization over them is parallelly tractable. We further investigate the parallel tractability of materialization of a datalog rewritable OWL fragment DHL (Description Horn Logic). Based on the above results, we analyze real-world datasets and show that many ontologies expressed in DHL belong to the parallelly tractable classes.

Keywords: ontology materialization, parallel reasoning, datalog, Boolean circuit

Procedia PDF Downloads 240

27509 An Ontology for Semantic Enrichment of RFID Systems

Authors: Haitham S. Hamza, Mohamed Maher, Shourok Alaa, Aya Khattab, Hadeal Ismail, Kamilia Hosny

Abstract:

Radio Frequency Identification (RFID) has become a key technology in the margining concept of Internet of Things (IoT). Naturally, business applications would require the deployment of various RFID systems that are developed by different vendors and use various data formats. This heterogeneity poses a real challenge in developing large-scale IoT systems with RFID as integration is becoming very complex and challenging. Semantic integration is a key approach to deal with this challenge. To do so, ontology for RFID systems need to be developed in order to annotated semantically RFID systems, and hence, facilitate their integration. Accordingly, in this paper, we propose ontology for RFID systems. The proposed ontology can be used to semantically enrich RFID systems, and hence, improve their usage and reasoning. The usage of the proposed ontology is explained through a simple scenario in the health care domain.

Keywords: RFID, semantic technology, ontology, sparql query language, heterogeneity

Procedia PDF Downloads 440

27508 Comparison of Crossover Types to Obtain Optimal Queries Using Adaptive Genetic Algorithm

Authors: Wafa’ Alma'Aitah, Khaled Almakadmeh

Abstract:

this study presents an information retrieval system of using genetic algorithm to increase information retrieval efficiency. Using vector space model, information retrieval is based on the similarity measurement between query and documents. Documents with high similarity to query are judge more relevant to the query and should be retrieved first. Using genetic algorithms, each query is represented by a chromosome; these chromosomes are fed into genetic operator process: selection, crossover, and mutation until an optimized query chromosome is obtained for document retrieval. Results show that information retrieval with adaptive crossover probability and single point type crossover and roulette wheel as selection type give the highest recall. The proposed approach is verified using (242) proceedings abstracts collected from the Saudi Arabian national conference.

Keywords: genetic algorithm, information retrieval, optimal queries, crossover

Procedia PDF Downloads 262

27507 Consolidating Service Engineering Ontologies Building Service Ontology from SOA Modeling Language (SoaML)

Authors: Purnomo Yustianto, Robin Doss, Suhardi, Novianto Budi Kurniawan

Abstract:

As a term for characterizing a process of devising a service system, the term ‘service engineering’ is still regarded as an ‘open’ research challenge due to unspecified details and conflicting perspectives. This paper presents consolidated service engineering ontologies in collecting, specifying and defining relationship between components pertinent within the context of service engineering. The ontologies are built by way of literature surveys from the collected conceptual works by collating various concepts into an integrated ontology. Two ontologies are produced: general service ontology and software service ontology. The software-service ontology is drawn from the informatics domain, while the generalized ontology of a service system is built from both a business management and the information system perspective. The produced ontologies are verified by exercising conceptual operationalizations of the ontologies in adopting several service orientation features and service system patterns. The proposed ontologies are demonstrated to be sufficient to serve as a basis for a service engineering framework.

Keywords: engineering, ontology, service, SoaML

Procedia PDF Downloads 156

27506 Development of Medical Intelligent Process Model Using Ontology Based Technique

Authors: Emmanuel Chibuogu Asogwa, Tochukwu Sunday Belonwu

Abstract:

An urgent demand for creative solutions has been created by the rapid expansion of medical knowledge, the complexity of patient care, and the requirement for more precise decision-making. As a solution to this problem, the creation of a Medical Intelligent Process Model (MIPM) utilizing ontology-based appears as a promising way to overcome this obstacle and unleash the full potential of healthcare systems. The development of a Medical Intelligent Process Model (MIPM) using ontology-based techniques is motivated by a lack of quick access to relevant medical information and advanced tools for treatment planning and clinical decision-making, which ontology-based techniques can provide. The aim of this work is to develop a structured and knowledge-driven framework that leverages ontology, a formal representation of domain knowledge, to enhance various aspects of healthcare. Object-Oriented Analysis and Design Methodology (OOADM) were adopted in the design of the system as we desired to build a usable and evolvable application. For effective implementation of this work, we used the following materials/methods/tools: the medical dataset for the test of our model in this work was obtained from Kaggle. The ontology-based technique was used with Confusion Matrix, MySQL, Python, Hypertext Markup Language (HTML), Hypertext Preprocessor (PHP), Cascaded Style Sheet (CSS), JavaScript, Dreamweaver, and Fireworks. According to test results on the new system using Confusion Matrix, both the accuracy and overall effectiveness of the medical intelligent process significantly improved by 20% compared to the previous system. Therefore, using the model is recommended for healthcare professionals.

Keywords: ontology-based, model, database, OOADM, healthcare

Procedia PDF Downloads 44

27505 An Intensional Conceptualization Model for Ontology-Based Semantic Integration

Authors: Fateh Adhnouss, Husam El-Asfour, Kenneth McIsaac, AbdulMutalib Wahaishi, Idris El-Feghia

Abstract:

Conceptualization is an essential component of semantic ontology-based approaches. There have been several approaches that rely on extensional structure and extensional reduction structure in order to construct conceptualization. In this paper, several limitations are highlighted relating to their applicability to the construction of conceptualizations in dynamic and open environments. These limitations arise from a number of strong assumptions that do not apply to such environments. An intensional structure is strongly argued to be a natural and adequate modeling approach. This paper presents a conceptualization structure based on property relations and propositions theory (PRP) to the model ontology that is suitable for open environments. The model extends the First-Order Logic (FOL) notation and defines the formal representation that enables interoperability between software systems and supports semantic integration for software systems in open, dynamic environments.

Keywords: conceptualization, ontology, extensional structure, intensional structure

Procedia PDF Downloads 75

27504 Ontology-Based Backpropagation Neural Network Classification and Reasoning Strategy for NoSQL and SQL Databases

Authors: Hao-Hsiang Ku, Ching-Ho Chi

Abstract:

Big data applications have become an imperative for many fields. Many researchers have been devoted into increasing correct rates and reducing time complexities. Hence, the study designs and proposes an Ontology-based backpropagation neural network classification and reasoning strategy for NoSQL big data applications, which is called ON4NoSQL. ON4NoSQL is responsible for enhancing the performances of classifications in NoSQL and SQL databases to build up mass behavior models. Mass behavior models are made by MapReduce techniques and Hadoop distributed file system based on Hadoop service platform. The reference engine of ON4NoSQL is the ontology-based backpropagation neural network classification and reasoning strategy. Simulation results indicate that ON4NoSQL can efficiently achieve to construct a high performance environment for data storing, searching, and retrieving.

Keywords: Hadoop, NoSQL, ontology, back propagation neural network, high distributed file system

Procedia PDF Downloads 235

27503 A Survey of Response Generation of Dialogue Systems

Authors: Yifan Fan, Xudong Luo, Pingping Lin

Abstract:

An essential task in the field of artificial intelligence is to allow computers to interact with people through natural language. Therefore, researches such as virtual assistants and dialogue systems have received widespread attention from industry and academia. The response generation plays a crucial role in dialogue systems, so to push forward the research on this topic, this paper surveys various methods for response generation. We sort out these methods into three categories. First one includes finite state machine methods, framework methods, and instance methods. The second contains full-text indexing methods, ontology methods, vast knowledge base method, and some other methods. The third covers retrieval methods and generative methods. We also discuss some hybrid methods based knowledge and deep learning. We compare their disadvantages and advantages and point out in which ways these studies can be improved further. Our discussion covers some studies published in leading conferences such as IJCAI and AAAI in recent years.

Keywords: deep learning, generative, knowledge, response generation, retrieval

Procedia PDF Downloads 105

27502 Content Based Video Retrieval System Using Principal Object Analysis

Authors: Van Thinh Bui, Anh Tuan Tran, Quoc Viet Ngo, The Bao Pham

Abstract:

Video retrieval is a searching problem on videos or clips based on content in which they are relatively close to an input image or video. The application of this retrieval consists of selecting video in a folder or recognizing a human in security camera. However, some recent approaches have been in challenging problem due to the diversity of video types, frame transitions and camera positions. Besides, that an appropriate measures is selected for the problem is a question. In order to overcome all obstacles, we propose a content-based video retrieval system in some main steps resulting in a good performance. From a main video, we process extracting keyframes and principal objects using Segmentation of Aggregating Superpixels (SAS) algorithm. After that, Speeded Up Robust Features (SURF) are selected from those principal objects. Then, the model “Bag-of-words” in accompanied by SVM classification are applied to obtain the retrieval result. Our system is performed on over 300 videos in diversity from music, history, movie, sports, and natural scene to TV program show. The performance is evaluated in promising comparison to the other approaches.

Keywords: video retrieval, principal objects, keyframe, segmentation of aggregating superpixels, speeded up robust features, bag-of-words, SVM

Procedia PDF Downloads 275

27501 Information Retrieval for Kafficho Language

Authors: Mareye Zeleke Mekonen

Abstract:

The Kafficho language has distinct issues in information retrieval because of its restricted resources and dearth of standardized methods. In this endeavor, with the cooperation and support of linguists and native speakers, we investigate the creation of information retrieval systems specifically designed for the Kafficho language. The Kafficho information retrieval system allows Kafficho speakers to access information easily in an efficient and effective way. Our objective is to conduct an information retrieval experiment using 220 Kafficho text files, including fifteen sample questions. Tokenization, normalization, stop word removal, stemming, and other data pre-processing chores, together with additional tasks like term weighting, were prerequisites for the vector space model to represent each page and a particular query. The three well-known measurement metrics we used for our word were Precision, Recall, and and F-measure, with values of 87%, 28%, and 35%, respectively. This demonstrates how well the Kaffiho information retrieval system performed well while utilizing the vector space paradigm.

Keywords: Kafficho, information retrieval, stemming, vector space

Procedia PDF Downloads 19

27500 Local Texture and Global Color Descriptors for Content Based Image Retrieval

Authors: Tajinder Kaur, Anu Bala

Abstract:

An image retrieval system is a computer system for browsing, searching, and retrieving images from a large database of digital images a new algorithm meant for content-based image retrieval (CBIR) is presented in this paper. The proposed method combines the color and texture features which are extracted the global and local information of the image. The local texture feature is extracted by using local binary patterns (LBP), which are evaluated by taking into consideration of local difference between the center pixel and its neighbors. For the global color feature, the color histogram (CH) is used which is calculated by RGB (red, green, and blue) spaces separately. In this paper, the combination of color and texture features are proposed for content-based image retrieval. The performance of the proposed method is tested on Corel 1000 database which is the natural database. The results after being investigated show a significant improvement in terms of their evaluation measures as compared to LBP and CH.

Keywords: color, texture, feature extraction, local binary patterns, image retrieval

Procedia PDF Downloads 331

27499 A Case Study of Ontology-Based Sentiment Analysis for Fan Pages

Authors: C. -L. Huang, J. -H. Ho

Abstract:

Social media has become more and more important in our life. Many enterprises promote their services and products to fans via the social media. The positive or negative sentiment of feedbacks from fans is very important for enterprises to improve their products, services, and promotion activities. The purpose of this paper is to understand the sentiment of the fan’s responses by analyzing the responses posted by fans on Facebook. The entity and aspect of fan’s responses were analyzed based on a predefined ontology. The ontology for cell phone sentiment analysis consists of aspect categories on the top level as follows: overall, shape, hardware, brand, price, and service. Each category consists of several sub-categories. All aspects for a fan’s response were found based on the ontology, and their corresponding sentimental terms were found using lexicon-based approach. The sentimental scores for aspects of fan responses were obtained by summarizing the sentimental terms in responses. The frequency of 'like' was also weighted in the sentimental score calculation. Three famous cell phone fan pages on Facebook were selected as demonstration cases to evaluate performances of the proposed methodology. Human judgment by several domain experts was also built for performance comparison. The performances of proposed approach were as good as those of human judgment on precision, recall and F1-measure.

Keywords: opinion mining, ontology, sentiment analysis, text mining

Procedia PDF Downloads 213

27498 Content-Based Image Retrieval Using HSV Color Space Features

Authors: Hamed Qazanfari, Hamid Hassanpour, Kazem Qazanfari

Abstract:

In this paper, a method is provided for content-based image retrieval. Content-based image retrieval system searches query an image based on its visual content in an image database to retrieve similar images. In this paper, with the aim of simulating the human visual system sensitivity to image's edges and color features, the concept of color difference histogram (CDH) is used. CDH includes the perceptually color difference between two neighboring pixels with regard to colors and edge orientations. Since the HSV color space is close to the human visual system, the CDH is calculated in this color space. In addition, to improve the color features, the color histogram in HSV color space is also used as a feature. Among the extracted features, efficient features are selected using entropy and correlation criteria. The final features extract the content of images most efficiently. The proposed method has been evaluated on three standard databases Corel 5k, Corel 10k and UKBench. Experimental results show that the accuracy of the proposed image retrieval method is significantly improved compared to the recently developed methods.

Keywords: content-based image retrieval, color difference histogram, efficient features selection, entropy, correlation

Procedia PDF Downloads 222

27497 An Integrated Visualization Tool for Heat Map and Gene Ontology Graph

Authors: Somyung Oh, Jeonghyeon Ha, Kyungwon Lee, Sejong Oh

Abstract:

Microarray is a general scheme to find differentially expressed genes for target concept. The output is expressed by heat map, and biologists analyze related terms of gene ontology to find some characteristics of differentially expressed genes. In this paper, we propose integrated visualization tool for heat map and gene ontology graph. Previous two methods are used by static manner and separated way. Proposed visualization tool integrates them and users can interactively manage it. Users may easily find and confirm related terms of gene ontology for given differentially expressed genes. Proposed tool also visualize connections between genes on heat map and gene ontology graph. We expect biologists to find new meaningful topics by proposed tool.

Keywords: heat map, gene ontology, microarray, differentially expressed gene

Procedia PDF Downloads 280

27496 A Graph-Based Retrieval Model for Passage Search

Authors: Junjie Zhong, Kai Hong, Lei Wang

Abstract:

Passage Retrieval (PR) plays an important role in many Natural Language Processing (NLP) tasks. Traditional efficient retrieval models relying on exact term-matching, such as TF-IDF or BM25, have nowadays been exceeded by pre-trained language models which match by semantics. Though they gain effectiveness, deep language models often require large memory as well as time cost. To tackle the trade-off between efficiency and effectiveness in PR, this paper proposes Graph Passage Retriever (GraphPR), a graph-based model inspired by the development of graph learning techniques. Different from existing works, GraphPR is end-to-end and integrates both term-matching information and semantics. GraphPR constructs a passage-level graph from BM25 retrieval results and trains a GCN-like model on the graph with graph-based objectives. Passages were regarded as nodes in the constructed graph and were embedded in dense vectors. PR can then be implemented using embeddings and a fast vector-similarity search. Experiments on a variety of real-world retrieval datasets show that the proposed model outperforms related models in several evaluation metrics (e.g., mean reciprocal rank, accuracy, F1-scores) while maintaining a relatively low query latency and memory usage.

Keywords: efficiency, effectiveness, graph learning, language model, passage retrieval, term-matching model

Procedia PDF Downloads 90

27495 The Use of Ontology Framework for Automation Digital Forensics Investigation

Authors: Ahmad Luthfi

Abstract:

One of the main goals of a computer forensic analyst is to determine the cause and effect of the acquisition of a digital evidence in order to obtain relevant information on the case is being handled. In order to get fast and accurate results, this paper will discuss the approach known as ontology framework. This model uses a structured hierarchy of layers that create connectivity between the variant and searching investigation of activity that a computer forensic analysis activities can be carried out automatically. There are two main layers are used, namely analysis tools and operating system. By using the concept of ontology, the second layer is automatically designed to help investigator to perform the acquisition of digital evidence. The methodology of automation approach of this research is by utilizing forward chaining where the system will perform a search against investigative steps and atomically structured in accordance with the rules of the ontology.

Keywords: ontology, framework, automation, forensics

Procedia PDF Downloads 297

27494 Design an Algorithm for Software Development in CBSE Envrionment Using Feed Forward Neural Network

Authors: Amit Verma, Pardeep Kaur

Abstract:

In software development organizations, Component based Software engineering (CBSE) is emerging paradigm for software development and gained wide acceptance as it often results in increase quality of software product within development time and budget. In component reusability, main challenges are the right component identification from large repositories at right time. The major objective of this work is to provide efficient algorithm for storage and effective retrieval of components using neural network and parameters based on user choice through clustering. This research paper aims to propose an algorithm that provides error free and automatic process (for retrieval of the components) while reuse of the component. In this algorithm, keywords (or components) are extracted from software document, after by applying k mean clustering algorithm. Then weights assigned to those keywords based on their frequency and after assigning weights, ANN predicts whether correct weight is assigned to keywords (or components) or not, otherwise it back propagates in to initial step (re-assign the weights). In last, store those all keywords into repositories for effective retrieval. Proposed algorithm is very effective in the error correction and detection with user base choice while choice of component for reusability for efficient retrieval is there.

Keywords: component based development, clustering, back propagation algorithm, keyword based retrieval

Procedia PDF Downloads 358

27493 Sentiment Analysis: An Enhancement of Ontological-Based Features Extraction Techniques and Word Equations

Authors: Mohd Ridzwan Yaakub, Muhammad Iqbal Abu Latiffi

Abstract:

Online business has become popular recently due to the massive amount of information and medium available on the Internet. This has resulted in the huge number of reviews where the consumers share their opinion, criticisms, and satisfaction on the products they have purchased on the websites or the social media such as Facebook and Twitter. However, to analyze customer’s behavior has become very important for organizations to find new market trends and insights. The reviews from the websites or the social media are in structured and unstructured data that need a sentiment analysis approach in analyzing customer’s review. In this article, techniques used in will be defined. Definition of the ontology and description of its possible usage in sentiment analysis will be defined. It will lead to empirical research that related to mobile phones used in research and the ontology used in the experiment. The researcher also will explore the role of preprocessing data and feature selection methodology. As the result, ontology-based approach in sentiment analysis can help in achieving high accuracy for the classification task.

Keywords: feature selection, ontology, opinion, preprocessing data, sentiment analysis

Procedia PDF Downloads 175

27492 Merging of Results in Distributed Information Retrieval Systems

Authors: Larbi Guezouli, Imane Azzouz

Abstract:

This work is located in the domain of distributed information retrieval ‘DIR’. A simplified view of the DIR requires a multi-search in a set of collections, which forces the system to analyze results found in these collections, and merge results back before sending them to the user in a single list. Our work is to find a fusion method based on the relevance score of each result received from collections and the relevance of the local search engine of each collection.

Keywords: information retrieval, distributed IR systems, merging results, datamining

Procedia PDF Downloads 305

27491 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases

Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha

Abstract:

Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach, which is free from the effect of features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The performance of the proposed approach is confirmed with the significantly improved performance in comparison with the independently evaluated baseline of the previously proposed feature fusion approaches.

Keywords: feature fusion, image retrieval, membership function, normalization

Procedia PDF Downloads 321

27490 Graph Codes - 2D Projections of Multimedia Feature Graphs for Fast and Effective Retrieval

Authors: Stefan Wagenpfeil, Felix Engel, Paul McKevitt, Matthias Hemmje

Abstract:

Multimedia Indexing and Retrieval is generally designed and implemented by employing feature graphs. These graphs typically contain a significant number of nodes and edges to reflect the level of detail in feature detection. A higher level of detail increases the effectiveness of the results but also leads to more complex graph structures. However, graph-traversal-based algorithms for similarity are quite inefficient and computation intensive, especially for large data structures. To deliver fast and effective retrieval, an efficient similarity algorithm, particularly for large graphs, is mandatory. Hence, in this paper, we define a graph-projection into a 2D space (Graph Code) as well as the corresponding algorithms for indexing and retrieval. We show that calculations in this space can be performed more efficiently than graph-traversals due to a simpler processing model and a high level of parallelization. In consequence, we prove that the effectiveness of retrieval also increases substantially, as Graph Codes facilitate more levels of detail in feature fusion. Thus, Graph Codes provide a significant increase in efficiency and effectiveness (especially for Multimedia indexing and retrieval) and can be applied to images, videos, audio, and text information.

Keywords: indexing, retrieval, multimedia, graph algorithm, graph code

Procedia PDF Downloads 129

27489 Measuring the Resilience of e-Governments Using an Ontology

Authors: Onyekachi Onwudike, Russell Lock, Iain Phillips

Abstract:

The variability that exists across governments, her departments and the provisioning of services has been areas of concern in the E-Government domain. There is a need for reuse and integration across government departments which are accompanied by varying degrees of risks and threats. There is also the need for assessment, prevention, preparation, response and recovery when dealing with these risks or threats. The ability of a government to cope with the emerging changes that occur within it is known as resilience. In order to forge ahead with concerted efforts to manage reuse and integration induced risks or threats to governments, the ambiguities contained within resilience must be addressed. Enhancing resilience in the E-Government domain is synonymous with reducing risks governments face with provisioning of services as well as reuse of components across departments. Therefore, it can be said that resilience is responsible for the reduction in government’s vulnerability to changes. In this paper, we present the use of the ontology to measure the resilience of governments. This ontology is made up of a well-defined construct for the taxonomy of resilience. A specific class known as ‘Resilience Requirements’ is added to the ontology. This class embraces the concept of resilience into the E-Government domain ontology. Considering that the E-Government domain is a highly complex one made up of different departments offering different services, the reliability and resilience of the E-Government domain have become more complex and critical to understand. We present questions that can help a government access how prepared they are in the face of risks and what steps can be taken to recover from them. These questions can be asked with the use of queries. The ontology focuses on developing a case study section that is used to explore ways in which government departments can become resilient to the different kinds of risks and threats they may face. A collection of resilience tools and resources have been developed in our ontology to encourage governments to take steps to prepare for emergencies and risks that a government may face with the integration of departments and reuse of components across government departments. To achieve this, the ontology has been extended by rules. We present two tools for understanding resilience in the E-Government domain as a risk analysis target and the output of these tools when applied to resilience in the E-Government domain. We introduce the classification of resilience using the defined taxonomy and modelling of existent relationships based on the defined taxonomy. The ontology is constructed on formal theory and it provides a semantic reference framework for the concept of resilience. Key terms which fall under the purview of resilience with respect to E-Governments are defined. Terms are made explicit and the relationships that exist between risks and resilience are made explicit. The overall aim of the ontology is to use it within standards that would be followed by all governments for government-based resilience measures.

Keywords: E-Government, Ontology, Relationships, Resilience, Risks, Threats

Procedia PDF Downloads 316

27488 Unsupervised Domain Adaptive Text Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.

Keywords: dense retrieval, query generation, unsupervised training, text retrieval

Procedia PDF Downloads 38

27487 Leveraging Quality Metrics in Voting Model Based Thread Retrieval

Authors: Atefeh Heydari, Mohammadali Tavakoli, Zuriati Ismail, Naomie Salim

Abstract:

Seeking and sharing knowledge on online forums have made them popular in recent years. Although online forums are valuable sources of information, due to variety of sources of messages, retrieving reliable threads with high quality content is an issue. Majority of the existing information retrieval systems ignore the quality of retrieved documents, particularly, in the field of thread retrieval. In this research, we present an approach that employs various quality features in order to investigate the quality of retrieved threads. Different aspects of content quality, including completeness, comprehensiveness, and politeness, are assessed using these features, which lead to finding not only textual, but also conceptual relevant threads for a user query within a forum. To analyse the influence of the features, we used an adopted version of voting model thread search as a retrieval system. We equipped it with each feature solely and also various combinations of features in turn during multiple runs. The results show that incorporating the quality features enhances the effectiveness of the utilised retrieval system significantly.

Keywords: content quality, forum search, thread retrieval, voting techniques

Procedia PDF Downloads 181

27486 Towards an Intelligent Ontology Construction Cost Estimation System: Using BIM and New Rules of Measurement Techniques

Authors: F. H. Abanda, B. Kamsu-Foguem, J. H. M. Tah

Abstract:

Construction cost estimation is one of the most important aspects of construction project design. For generations, the process of cost estimating has been manual, time-consuming and error-prone. This has partly led to most cost estimates to be unclear and riddled with inaccuracies that at times lead to over- or under-estimation of construction cost. The development of standard set of measurement rules that are understandable by all those involved in a construction project, have not totally solved the challenges. Emerging Building Information Modelling (BIM) technologies can exploit standard measurement methods to automate cost estimation process and improves accuracies. This requires standard measurement methods to be structured in ontologically and machine readable format; so that BIM software packages can easily read them. Most standard measurement methods are still text-based in textbooks and require manual editing into tables or Spreadsheet during cost estimation. The aim of this study is to explore the development of an ontology based on New Rules of Measurement (NRM) commonly used in the UK for cost estimation. The methodology adopted is Methontology, one of the most widely used ontology engineering methodologies. The challenges in this exploratory study are also reported and recommendations for future studies proposed.

Keywords: BIM, construction projects, cost estimation, NRM, ontology

Procedia PDF Downloads 513

27485 Critical Review of Web Content Mining Extraction Mechanisms

Authors: Rabia Bashir, Sajjad Akbar

Abstract:

There is an inevitable demand of web mining due to rapid increase of huge information on the Internet, but the striking variety of web structures has made required content retrieval a difficult task. To counter this issue, Web Content Mining (WCM) emerges as a potential candidate which extracts and integrates suitable resources of data to users. In past few years, research has been done on several extraction techniques for WCM i.e. agent-based, template-based, assumption-based, statistic-based, wrapper-based and machine learning. However, it is still unclear that either these approaches are efficiently tackling the significant challenges of WCM or not. To answer this question, this paper identifies these challenges such as language independency, structure flexibility, performance, automation, dynamicity, redundancy handling, intelligence, relevant content retrieval, and privacy. Further, mapping of these challenges is done with existing extraction mechanisms which helps to adopt the most suitable WCM approach, given some conditions and characteristics at hand.

Keywords: content mining challenges, web content mining, web content extraction approaches, web information retrieval

Procedia PDF Downloads 509

27484 Ontology as Knowledge Capture Tool in Organizations: A Literature Review

Authors: Maria Margaretha, Dana Indra Sensuse, Lukman

Abstract:

Knowledge capture is a step in knowledge life cycle to get knowledge in the organization. Tacit and explicit knowledge are needed to organize in a path, so the organization will be easy to choose which knowledge will be use. There are many challenges to capture knowledge in the organization, such as researcher must know which knowledge has been validated by an expert, how to get tacit knowledge from experts and make it explicit knowledge, and so on. Besides that, the technology will be a reliable tool to help the researcher to capture knowledge. Some paper wrote how ontology in knowledge management can be used for proposed framework to capture and reuse knowledge. Organization has to manage their knowledge, process capture and share will decide their position in the business area. This paper will describe further from literature review about the tool of ontology that will help the organization to capture its knowledge.

Keywords: knowledge capture, ontology, technology, organization

Procedia PDF Downloads 566

27483 Algorithm for Information Retrieval Optimization

Authors: Kehinde K. Agbele, Kehinde Daniel Aruleba, Eniafe F. Ayetiran

Abstract:

When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates optimization of IRS to individual information needs in order of relevance. The study addressed development of algorithms that optimize the ranking of documents retrieved from IRS. This study discusses and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated databases environment. Conversely, as the volume of information available online and in designated databases is growing continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (

Keywords: information retrieval, document relevance, performance measures, personalization

Procedia PDF Downloads 212

27482 Ontology for Cross-Site-Scripting (XSS) Attack in Cybersecurity

Authors: Jean Rosemond Dora, Karol Nemoga

Abstract:

In this work, we tackle a frequent problem that frequently occurs in the cybersecurity field which is the exploitation of websites by XSS attacks, which are nowadays considered a complicated attack. These types of attacks aim to execute malicious scripts in a web browser of the client by including code in a legitimate web page. A serious matter is when a website accepts the “user-input” option. Attackers can exploit the web application (if vulnerable), and then steal sensitive data (session cookies, passwords, credit cards, etc.) from the server and/or from the client. However, the difficulty of the exploitation varies from website to website. Our focus is on the usage of ontology in cybersecurity against XSS attacks, on the importance of the ontology, and its core meaning for cybersecurity. We explain how a vulnerable website can be exploited, and how different JavaScript payloads can be used to detect vulnerabilities. We also enumerate some tools to use for an efficient analysis. We present detailed reasoning on what can be done to improve the security of a website in order to resist attacks, and we provide supportive examples. Then, we apply an ontology model against XSS attacks to strengthen the protection of a web application. However, we note that the existence of ontology does not improve the security itself, but it has to be properly used and should require a maximum of security layers to be taken into account.

Keywords: cybersecurity, web application vulnerabilities, cyber threats, ontology model

Procedia PDF Downloads 137

27481 Mondoc: Informal Lightweight Ontology for Faceted Semantic Classification of Hypernymy

Authors: M. Regina Carreira-Lopez

Abstract:

Lightweight ontologies seek to concrete union relationships between a parent node, and a secondary node, also called "child node". This logic relation (L) can be formally defined as a triple ontological relation (LO) equivalent to LO in ⟨LN, LE, LC⟩, and where LN represents a finite set of nodes (N); LE is a set of entities (E), each of which represents a relationship between nodes to form a rooted tree of ⟨LN, LE⟩; and LC is a finite set of concepts (C), encoded in a formal language (FL). Mondoc enables more refined searches on semantic and classified facets for retrieving specialized knowledge about Atlantic migrations, from the Declaration of Independence of the United States of America (1776) and to the end of the Spanish Civil War (1939). The model looks forward to increasing documentary relevance by applying an inverse frequency of co-ocurrent hypernymy phenomena for a concrete dataset of textual corpora, with RMySQL package. Mondoc profiles archival utilities implementing SQL programming code, and allows data export to XML schemas, for achieving semantic and faceted analysis of speech by analyzing keywords in context (KWIC). The methodology applies random and unrestricted sampling techniques with RMySQL to verify the resonance phenomena of inverse documentary relevance between the number of co-occurrences of the same term (t) in more than two documents of a set of texts (D). Secondly, the research also evidences co-associations between (t) and their corresponding synonyms and antonyms (synsets) are also inverse. The results from grouping facets or polysemic words with synsets in more than two textual corpora within their syntagmatic context (nouns, verbs, adjectives, etc.) state how to proceed with semantic indexing of hypernymy phenomena for subject-heading lists and for authority lists for documentary and archival purposes. Mondoc contributes to the development of web directories and seems to achieve a proper and more selective search of e-documents (classification ontology). It can also foster on-line catalogs production for semantic authorities, or concepts, through XML schemas, because its applications could be used for implementing data models, by a prior adaptation of the based-ontology to structured meta-languages, such as OWL, RDF (descriptive ontology). Mondoc serves to the classification of concepts and applies a semantic indexing approach of facets. It enables information retrieval, as well as quantitative and qualitative data interpretation. The model reproduces a triple tuple ⟨LN, LE, LT, LCF L, BKF⟩ where LN is a set of entities that connect with other nodes to concrete a rooted tree in ⟨LN, LE⟩. LT specifies a set of terms, and LCF acts as a finite set of concepts, encoded in a formal language, L. Mondoc only resolves partial problems of linguistic ambiguity (in case of synonymy and antonymy), but neither the pragmatic dimension of natural language nor the cognitive perspective is addressed. To achieve this goal, forthcoming programming developments should target at oriented meta-languages with structured documents in XML.

Keywords: hypernymy, information retrieval, lightweight ontology, resonance

Procedia PDF Downloads 101