Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 12

Search results for: entity

12 Named Entity Recognition using Support Vector Machine: A Language Independent Approach

Authors: Asif Ekbal, Sivaji Bandyopadhyay

Abstract:

Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. Statistical analysis, ANOVA is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both the languages.

Keywords: Named Entity (NE), Named Entity Recognition (NER), Support Vector Machine (SVM), Bengali, Hindi.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
11 An Interference Reduction Strategy for TDD-OFDMA Cellular Systems

Authors: Koudjo M. Koumadi, Kester Quist-Aphetsi, Robert A. Sowah, Amevi Acakpovi

Abstract:

Downlink/Uplink (DL/UL) time slot allocation (TSA) in time division duplex (TDD) systems is generally uniform for all the cells. This TSA however is not efficient in case of different traffic asymmetry ratios in different cells. We first propose a new 3-coordinate architecture to identify cells in an orthogonal frequency division multiple access (OFDMA) system where each cell is divided into three sectors. Then, this coordinate system is used to derive a TSA for symmetric traffic. Mathematical analysis and simulations are used to show that the proposed TSA outperforms the traditional all uniform type of TSA in terms of total intercellular interference, even under uniform symmetrical traffic. Two adaptation strategies are further proposed to adjust the proposed TSA to asymmetrical traffic with different DL/UL traffic ratios in different cells. Further simulation results show that the adaptation strategies also yield higher signal-to-interference ratio (SIR).

Keywords: Crossed TSA, different-entity interference, same-entity interference, uniform TSA

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
10 Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines

Authors: Mona Soliman Habib

Abstract:

This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. A simple machine learning approach is used that eliminates prior language knowledge such as part-of-speech or noun phrase tagging thereby allowing for its applicability across languages. No domain-specific knowledge is included. The accuracy measures achieved are comparable to those obtained using more complex approaches, which constitutes a motivation to investigate ways to improve the scalability of multiclass SVM in order to make the solution more practical and useable. Improving training time of multi-class SVM would make support vector machines a more viable and practical machine learning solution for real-world problems with large datasets. An initial prototype results in great improvement of the training time at the expense of memory requirements.

Keywords: Named entity recognition, support vector machines, language independence, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
9 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.

Keywords: Cooccurrence graph, entity relation graph, unstructured text, weighted distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
8 A Neuron Model of Facial Recognition and Detection of an Authorized Entity Using Machine Learning System

Authors: J. K. Adedeji, M. O. Oyekanmi

Abstract:

This paper has critically examined the use of Machine Learning procedures in curbing unauthorized access into valuable areas of an organization. The use of passwords, pin codes, user’s identification in recent times has been partially successful in curbing crimes involving identities, hence the need for the design of a system which incorporates biometric characteristics such as DNA and pattern recognition of variations in facial expressions. The facial model used is the OpenCV library which is based on the use of certain physiological features, the Raspberry Pi 3 module is used to compile the OpenCV library, which extracts and stores the detected faces into the datasets directory through the use of camera. The model is trained with 50 epoch run in the database and recognized by the Local Binary Pattern Histogram (LBPH) recognizer contained in the OpenCV. The training algorithm used by the neural network is back propagation coded using python algorithmic language with 200 epoch runs to identify specific resemblance in the exclusive OR (XOR) output neurons. The research however confirmed that physiological parameters are better effective measures to curb crimes relating to identities.

Keywords: Biometric characters, facial recognition, neural network, OpenCV.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
7 Personalization of Web Search Using Web Page Clustering Technique

Authors: Amol Bapuso Rajmane, Pradeep M. Patil, Prakash J. Kulkarni

Abstract:

The Information Retrieval community is facing the problem of effective representation of Web search results. When we organize web search results into clusters it becomes easy to the users to quickly browse through search results. The traditional search engines organize search results into clusters for ambiguous queries, representing each cluster for each meaning of the query. The clusters are obtained according to the topical similarity of the retrieved search results, but it is possible for results to be totally dissimilar and still correspond to the same meaning of the query. People search is also one of the most common tasks on the Web nowadays, but when a particular person’s name is queried the search engines return web pages which are related to different persons who have the same queried name. By placing the burden on the user of disambiguating and collecting pages relevant to a particular person, in this paper, we have developed an approach that clusters web pages based on the association of the web pages to the different people and clusters that are based on generic entity search.

Keywords: Entity resolution, information retrieval, graph based disambiguation, web people search, clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
6 A Web-Based Self-Learning Grammar for Spoken Language Understanding

Authors: S. M. Biondi, V. Catania, R. Di Natale, A. R. Intilisano, D. Panno

Abstract:

One of the major goals of Spoken Dialog Systems (SDS) is to understand what the user utters. In the SDS domain, the Spoken Language Understanding (SLU) Module classifies user utterances by means of a pre-definite conceptual knowledge. The SLU module is able to recognize only the meaning previously included in its knowledge base. Due the vastity of that knowledge, the information storing is a very expensive process. Updating and managing the knowledge base are time-consuming and error-prone processes because of the rapidly growing number of entities like proper nouns and domain-specific nouns. This paper proposes a solution to the problem of Name Entity Recognition (NER) applied to a SDS domain. The proposed solution attempts to automatically recognize the meaning associated with an utterance by using the PANKOW (Pattern based Annotation through Knowledge On the Web) method at runtime. The method being proposed extracts information from the Web to increase the SLU knowledge module and reduces the development effort. In particular, the Google Search Engine is used to extract information from the Facebook social network.

Keywords: Spoken Dialog System, Spoken Language Understanding, Web Semantic, Name Entity Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
5 Opinion Mining Framework in the Education Domain

Authors: A. M. H. Elyasir, K. S. M. Anbananthen

Abstract:

The internet is growing larger and becoming the most popular platform for the people to share their opinion in different interests. We choose the education domain specifically comparing some Malaysian universities against each other. This comparison produces benchmark based on different criteria shared by the online users in various online resources including Twitter, Facebook and web pages. The comparison is accomplished using opinion mining framework to extract, process the unstructured text and classify the result to positive, negative or neutral (polarity). Hence, we divide our framework to three main stages; opinion collection (extraction), unstructured text processing and polarity classification. The extraction stage includes web crawling, HTML parsing, Sentence segmentation for punctuation classification, Part of Speech (POS) tagging, the second stage processes the unstructured text with stemming and stop words removal and finally prepare the raw text for classification using Named Entity Recognition (NER). Last phase is to classify the polarity and present overall result for the comparison among the Malaysian universities. The final result is useful for those who are interested to study in Malaysia, in which our final output declares clear winners based on the public opinions all over the web.

Keywords: Entity Recognition, Education Domain, Opinion Mining, Unstructured Text.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
4 Accounting for SMEs – How Important is Size in Choosing between Global and Local Standards?

Authors: Cătălin Nicolae Albu, Nadia Albu, Maria Mădălina Gîrbină

Abstract:

There is limited evidence from various countries about the possible impact of various criteria to be used to determine the scope of the IFRS for SMEs issued in 2009 and, research is needed in this area. We provide evidence from Romania, an emerging economy member of the European Union. The aim of this paper is to analyze in a local setting if size is a relevant factor for deciding between local and global standards for SMEs. Our results indicate that size is a moderate indicator of the existence of possible users interested in financial statements and that there is a difference between the scopes of the standard determined on various criteria.. Also, we suggest that the international exposure is quite reduced in the case of SMEs, but is sufficient to suggest that at least some SMEs would benefit from international comparability of financial statements

Keywords: SMEs, IFRS for SMEs, accounting regulation, entity's size.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
3 Modelling of Designing a Conceptual Schema for Multimodal Freight Transportation Information System

Authors: Gia Surguladze, Lily Petriashvili, Nino Topuria, Giorgi Surguladze

Abstract:

Modelling of building processes of a multimodal freight transportation support information system is discussed based on modern CASE technologies. Functional efficiencies of ports in the eastern part of the Black Sea are analyzed taking into account their ecological, seasonal, resource usage parameters. By resources, we mean capacities of berths, cranes, automotive transport, as well as work crews and neighbouring airports. For the purpose of designing database of computer support system for Managerial (Logistics) function, using Object-Role Modeling (ORM) tool (NORMA–Natural ORM Architecture) is proposed, after which Entity Relationship Model (ERM) is generated in automated process. Software is developed based on Process-Oriented and Service-Oriented architecture, in Visual Studio.NET environment.

Keywords: Seaport resources, business-processes, multimodal transportation, CASE technology, object-role model, entity relationship model, SOA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
2 Data-organization Before Learning Multi-Entity Bayesian Networks Structure

Authors: H. Bouhamed, A. Rebai, T. Lecroq, M. Jaoua

Abstract:

The objective of our work is to develop a new approach for discovering knowledge from a large mass of data, the result of applying this approach will be an expert system that will serve as diagnostic tools of a phenomenon related to a huge information system. We first recall the general problem of learning Bayesian network structure from data and suggest a solution for optimizing the complexity by using organizational and optimization methods of data. Afterward we proposed a new heuristic of learning a Multi-Entities Bayesian Networks structures. We have applied our approach to biological facts concerning hereditary complex illnesses where the literatures in biology identify the responsible variables for those diseases. Finally we conclude on the limits arched by this work.

Keywords: Data-organization, data-optimization, automatic knowledge discovery, Multi-Entities Bayesian networks, score merging.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF
1 A Methodology for Creating a Conceptual Model Under Uncertainty

Authors: Bogdan Walek, Jiri Bartos, Cyril Klimes

Abstract:

This article deals with the conceptual modeling under uncertainty. First, the division of information systems with their definition will be described, focusing on those where the construction of a conceptual model is suitable for the design of future information system database. Furthermore, the disadvantages of the traditional approach in creating a conceptual model and database design will be analyzed. A comprehensive methodology for the creation of a conceptual model based on analysis of client requirements and the selection of a suitable domain model is proposed here. This article presents the expert system used for the construction of a conceptual model and is a suitable tool for database designers to create a conceptual model.

Keywords: Conceptual model, conceptual modeling, database, methodology, uncertainty, information system, entity, attribute, relationship, conceptual domain model, fuzzy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF