Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 11

information extraction Related Publications

11 Hybrid Knowledge Approach for Determining Health Care Provider Specialty from Patient Diagnoses

Authors: Erin Lynne Plettenberg, Jeremy Vickery

Abstract:

In an access-control situation, the role of a user determines whether a data request is appropriate. This paper combines vetted web mining and logic modeling to build a lightweight system for determining the role of a health care provider based only on their prior authorized requests. The model identifies provider roles with 100% recall from very little data. This shows the value of vetted web mining in AI systems, and suggests the impact of the ICD classification on medical practice.

Keywords: Ontology, information extraction, Electronic Medical Records, logic modeling, vetted web mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 507
10 Implementing a Database from a Requirement Specification

Authors: M. Omer, D. Wilson

Abstract:

Creating a database scheme is essentially a manual process. From a requirement specification the information contained within has to be analyzed and reduced into a set of tables, attributes and relationships. This is a time consuming process that has to go through several stages before an acceptable database schema is achieved. The purpose of this paper is to implement a Natural Language Processing (NLP) based tool to produce a relational database from a requirement specification. The Stanford CoreNLP version 3.3.1 and the Java programming were used to implement the proposed model. The outcome of this study indicates that a first draft of a relational database schema can be extracted from a requirement specification by using NLP tools and techniques with minimum user intervention. Therefore this method is a step forward in finding a solution that requires little or no user intervention.

Keywords: information extraction, relation extraction, Natural Language Processing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1779
9 Enhanced Conference Organization Based On Correlation of Web Information and Ontology Based Expertise Search

Authors: Iman Jarkass, Maria Sokhn, Elena Mugellini, Omar Abou Khaled, Hassan Noureddine

Abstract:

From the importance of the conference and its constructive role in the studies discussion, there must be a strong organization that allows the exploitation of the discussions in opening new horizons. The vast amount of information scattered across the web, make it difficult to find experts, who can play a prominent role in organizing conferences. In this paper we proposed a new approach of extracting researchers- information from various Web resources and correlating them in order to confirm their correctness. As a validator of this approach, we propose a service that will be useful to set up a conference. Its main objective is to find appropriate experts, as well as the social events for a conference. For this application we us Semantic Web technologies like RDF and ontology to represent the confirmed information, which are linked to another ontology (skills ontology) that are used to present and compute the expertise.

Keywords: Semantic Web, Ontologies, information extraction, Expert finding, Social events

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1264
8 A Proposed Information Extraction Technique in Engineering Drawing for Reuse Design

Authors: Mohd Fahmi Mohamad Amran, Riza Sulaiman, Saliyah Kahar, Suziyanti Marjudi, Muhammad FairuzAbd Rauf

Abstract:

The extensive number of engineering drawing will be referred for planning process and the changes will produce a good engineering design to meet the demand in producing a new model. The advantage in reuse of engineering designs is to allow continuous product development to further improve the quality of product development, thus reduce the development costs. However, to retrieve the existing engineering drawing, it is time consuming, a complex process and are expose to errors. Engineering drawing file searching system will be proposed to solve this problem. It is essential for engineer and designer to have some sort of medium to enable them to search for drawing in the most effective way. This paper lays out the proposed research project under the area of information extraction in engineering drawing.

Keywords: information extraction, Computer Aided Design, engineering drawing, reuse design

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1758
7 Automatic Building an Extensive Arabic FA Terms Dictionary

Authors: El-Sayed Atlam, Masao Fuketa, Kazuhiro Morita, Jun-ichi Aoe

Abstract:

Field Association (FA) terms are a limited set of discriminating terms that give us the knowledge to identify document fields which are effective in document classification, similar file retrieval and passage retrieval. But the problem lies in the lack of an effective method to extract automatically relevant Arabic FA Terms to build a comprehensive dictionary. Moreover, all previous studies are based on FA terms in English and Japanese, and the extension of FA terms to other language such Arabic could be definitely strengthen further researches. This paper presents a new method to extract, Arabic FA Terms from domain-specific corpora using part-of-speech (POS) pattern rules and corpora comparison. Experimental evaluation is carried out for 14 different fields using 251 MB of domain-specific corpora obtained from Arabic Wikipedia dumps and Alhyah news selected average of 2,825 FA Terms (single and compound) per field. From the experimental results, recall and precision are 84% and 79% respectively. Therefore, this method selects higher number of relevant Arabic FA Terms at high precision and recall.

Keywords: Information Retrieval, information extraction, Document Classification, Arabic Field Association Terms

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1340
6 Use of Bayesian Network in Information Extraction from Unstructured Data Sources

Authors: Quratulain N. Rajput, Sajjad Haider

Abstract:

This paper applies Bayesian Networks to support information extraction from unstructured, ungrammatical, and incoherent data sources for semantic annotation. A tool has been developed that combines ontologies, machine learning, and information extraction and probabilistic reasoning techniques to support the extraction process. Data acquisition is performed with the aid of knowledge specified in the form of ontology. Due to the variable size of information available on different data sources, it is often the case that the extracted data contains missing values for certain variables of interest. It is desirable in such situations to predict the missing values. The methodology, presented in this paper, first learns a Bayesian network from the training data and then uses it to predict missing data and to resolve conflicts. Experiments have been conducted to analyze the performance of the presented methodology. The results look promising as the methodology achieves high degree of precision and recall for information extraction and reasonably good accuracy for predicting missing values.

Keywords: Machine Learning, Ontology, information extraction, Bayesian network

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1804
5 Information Extraction from Unstructured and Ungrammatical Data Sources for Semantic Annotation

Authors: Quratulain N. Rajput, Sajjad Haider, Nasir Touheed

Abstract:

The internet has become an attractive avenue for global e-business, e-learning, knowledge sharing, etc. Due to continuous increase in the volume of web content, it is not practically possible for a user to extract information by browsing and integrating data from a huge amount of web sources retrieved by the existing search engines. The semantic web technology enables advancement in information extraction by providing a suite of tools to integrate data from different sources. To take full advantage of semantic web, it is necessary to annotate existing web pages into semantic web pages. This research develops a tool, named OWIE (Ontology-based Web Information Extraction), for semantic web annotation using domain specific ontologies. The tool automatically extracts information from html pages with the help of pre-defined ontologies and gives them semantic representation. Two case studies have been conducted to analyze the accuracy of OWIE.

Keywords: Ontology, information extraction, Semantic Annotation, Wrapper

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1770
4 Semi-Automatic Approach for Semantic Annotation

Authors: Mohammad Yasrebi, Mehran Mohsenzadeh

Abstract:

The third phase of web means semantic web requires many web pages which are annotated with metadata. Thus, a crucial question is where to acquire these metadata. In this paper we propose our approach, a semi-automatic method to annotate the texts of documents and web pages and employs with a quite comprehensive knowledge base to categorize instances with regard to ontology. The approach is evaluated against the manual annotations and one of the most popular annotation tools which works the same as our tool. The approach is implemented in .net framework and uses the WordNet for knowledge base, an annotation tool for the Semantic Web.

Keywords: Semantic Web, Metadata, information extraction, Semantic Annotation, knowledge base

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1523
3 Deep Web Content Mining

Authors: Shohreh Ajoudanian, Mohammad Davarpanah Jazi

Abstract:

The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased difficulty of extracting potentially useful knowledge. Web content mining confronts this problem gathering explicit information from different web sites for its access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting information with parsing approach, we use a new data mining algorithm to match a large number of schemas in databases at a time. Using this algorithm increases the speed of information matching. In addition, instead of simple 1:1 matching, they do complex (m:n) matching between query interfaces. In this paper we present a novel correlation mining algorithm that matches correlated attributes with smaller cost. This algorithm uses Jaccard measure to distinguish positive and negative correlated attributes. After that, system matches the user query with different query interfaces in special domain and finally chooses the nearest query interface with user query to answer to it.

Keywords: information extraction, Content mining, complex matching, correlation mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1929
2 Ontology Population via NLP Techniques in Risk Management

Authors: Jawad Makki, Anne-Marie Alquier, Violaine Prince

Abstract:

In this paper we propose an NLP-based method for Ontology Population from texts and apply it to semi automatic instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and uses a domain expert intervention for validation. The proposed approach relies on a set of Instances Recognition Rules based on syntactic structures, and on the predicative power of verbs in the instantiation process. It is not domain dependent since it heavily relies on linguistic knowledge. A description of an experiment performed on a part of the ontology of the PRIMA1 project (supported by the European community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from Environmental Protection Agency2. The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves the performance.

Keywords: Risk management, semantic analysis, information extraction, Ontology population, Instance Recognition Rules

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1230
1 Advanced Information Extraction with n-gram based LSI

Authors: Oya Kalıpsız, Ahmet Güven, Ö. Özgür Bozkurt

Abstract:

Number of documents being created increases at an increasing pace while most of them being in already known topics and little of them introducing new concepts. This fact has started a new era in information retrieval discipline where the requirements have their own specialties. That is digging into topics and concepts and finding out subtopics or relations between topics. Up to now IR researches were interested in retrieving documents about a general topic or clustering documents under generic subjects. However these conventional approaches can-t go deep into content of documents which makes it difficult for people to reach to right documents they were searching. So we need new ways of mining document sets where the critic point is to know much about the contents of the documents. As a solution we are proposing to enhance LSI, one of the proven IR techniques by supporting its vector space with n-gram forms of words. Positive results we have obtained are shown in two different application area of IR domain; querying a document database, clustering documents in the document database.

Keywords: Information Retrieval, information extraction, Document Clustering, LSI, n-gram

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1430