Search results for: information retrieval
9573 Performance Evaluation of Content Based Image Retrieval Using Indexed Views
Authors: Tahir Iqbal, Mumtaz Ali, Syed Wajahat Kareem, Muhammad Harris
Abstract:Digital information is expanding in exponential order in our life. Information that is residing online and offline are stored in huge repositories relating to every aspect of our lives. Getting the required information is a task of retrieval systems. Content based image retrieval (CBIR) is a retrieval system that retrieves the required information from repositories on the basis of the contents of the image. Time is a critical factor in retrieval system and using indexed views with CBIR system improves the time efficiency of retrieved results.
Keywords: content based image retrieval (CBIR), indexed view, color, image retrieval, cross correlationProcedia PDF Downloads 408
9572 A Comparative Study of Approaches in User-Centred Health Information Retrieval
Authors: Harsh Thakkar, Ganesh Iyer
Abstract:In this paper, we survey various user-centered or context-based biomedical health information retrieval systems. We present and discuss the performance of systems submitted in CLEF eHealth 2014 Task 3 for this purpose. We classify and focus on comparing the two most prevalent retrieval models in biomedical information retrieval namely: Language Model (LM) and Vector Space Model (VSM). We also report on the effectiveness of using external medical resources and ontologies like MeSH, Metamap, UMLS, etc. We observed that the LM based retrieval systems outperform VSM based systems on various fronts. From the results we conclude that the state-of-art system scores for MAP was 0.4146, [email protected] was 0.7560 and [email protected] was 0.7445, respectively. All of these score were reported by systems built on language modeling approaches.
Keywords: clinical document retrieval, concept-based information retrieval, query expansion, language models, vector space modelsProcedia PDF Downloads 220
9571 Comparison of Crossover Types to Obtain Optimal Queries Using Adaptive Genetic Algorithm
Authors: Wafa’ Alma'Aitah, Khaled Almakadmeh
Abstract:this study presents an information retrieval system of using genetic algorithm to increase information retrieval efficiency. Using vector space model, information retrieval is based on the similarity measurement between query and documents. Documents with high similarity to query are judge more relevant to the query and should be retrieved first. Using genetic algorithms, each query is represented by a chromosome; these chromosomes are fed into genetic operator process: selection, crossover, and mutation until an optimized query chromosome is obtained for document retrieval. Results show that information retrieval with adaptive crossover probability and single point type crossover and roulette wheel as selection type give the highest recall. The proposed approach is verified using (242) proceedings abstracts collected from the Saudi Arabian national conference.
Keywords: genetic algorithm, information retrieval, optimal queries, crossoverProcedia PDF Downloads 231
9570 Retrieval-Induced Forgetting Effects in Retrospective and Prospective Memory in Normal Aging: An Experimental Study
Authors: Merve Akca
Abstract:Retrieval-induced forgetting (RIF) refers to the phenomenon that selective retrieval of some information impairs memory for related, but not previously retrieved information. Despite age differences in retrieval-induced forgetting regarding retrospective memory being documented, this research aimed to highlight age differences in RIF of the prospective memory tasks for the first time. By using retrieval-practice paradigm, this study comparatively examined RIF effects in retrospective memory and event-based prospective memory in young and old adults. In this experimental study, a mixed factorial design with age group (Young, Old) as a between-subject variable, and memory type (Prospective, Retrospective) and item type (Practiced, Non-practiced) as within-subject variables was employed. Retrieval-induced forgetting was observed in the retrospective but not in the prospective memory task. Therefore, the results indicated that selective retrieval of past events led to suppression of other related past events in both age groups but not the suppression of memory for future intentions.
Keywords: prospective memory, retrieval-induced forgetting, retrieval inhibition, retrospective memoryProcedia PDF Downloads 207
9569 Merging of Results in Distributed Information Retrieval Systems
Authors: Larbi Guezouli, Imane Azzouz
Abstract:This work is located in the domain of distributed information retrieval ‘DIR’. A simplified view of the DIR requires a multi-search in a set of collections, which forces the system to analyze results found in these collections, and merge results back before sending them to the user in a single list. Our work is to find a fusion method based on the relevance score of each result received from collections and the relevance of the local search engine of each collection.
Keywords: information retrieval, distributed IR systems, merging results, dataminingProcedia PDF Downloads 194
9568 Algorithm for Information Retrieval Optimization
Authors: Kehinde K. Agbele, Kehinde Daniel Aruleba, Eniafe F. Ayetiran
Abstract:When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates optimization of IRS to individual information needs in order of relevance. The study addressed development of algorithms that optimize the ranking of documents retrieved from IRS. This study discusses and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated databases environment. Conversely, as the volume of information available online and in designated databases is growing continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (
Keywords: information retrieval, document relevance, performance measures, personalizationProcedia PDF Downloads 173
9567 Biomedical Definition Extraction Using Machine Learning with Synonymous Feature
Authors: Jian Qu, Akira Shimazu
Abstract:OOV (Out Of Vocabulary) terms are terms that cannot be found in many dictionaries. Although it is possible to translate such OOV terms, the translations do not provide any real information for a user. We present an OOV term definition extraction method by using information available from the Internet. We use features such as occurrence of the synonyms and location distances. We apply machine learning method to find the correct definitions for OOV terms. We tested our method on both biomedical type and name type OOV terms, our work outperforms existing work with an accuracy of 86.5%.
Keywords: information retrieval, definition retrieval, OOV (out of vocabulary), biomedical information retrievalProcedia PDF Downloads 431
9566 Selection of Relevant Servers in Distributed Information Retrieval System
Authors: Benhamouda Sara, Guezouli Larbi
Abstract:Nowadays, the dissemination of information touches the distributed world, where selecting the relevant servers to a user request is an important problem in distributed information retrieval. During the last decade, several research studies on this issue have been launched to find optimal solutions and many approaches of collection selection have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contains terms of the query and the weights of those terms in these documents. We tested our method and our studies show that this technique can compete with other state-of-the-art algorithms that we choose to test the performance of our approach.
Keywords: distributed information retrieval, relevance, server selection, collection selectionProcedia PDF Downloads 215
9565 A Framework of Product Information Service System Using Mobile Image Retrieval and Text Mining Techniques
Authors: Mei-Yi Wu, Shang-Ming Huang
Abstract:The online shoppers nowadays often search the product information on the Internet using some keywords of products. To use this kind of information searching model, shoppers should have a preliminary understanding about their interesting products and choose the correct keywords. However, if the products are first contact (for example, the worn clothes or backpack of passengers which you do not have any idea about the brands), these products cannot be retrieved due to insufficient information. In this paper, we discuss and study the applications in E-commerce using image retrieval and text mining techniques. We design a reasonable E-commerce application system containing three layers in the architecture to provide users product information. The system can automatically search and retrieval similar images and corresponding web pages on Internet according to the target pictures which taken by users. Then text mining techniques are applied to extract important keywords from these retrieval web pages and search the prices on different online shopping stores with these keywords using a web crawler. Finally, the users can obtain the product information including photos and prices of their favorite products. The experiments shows the efficiency of proposed system.
Keywords: mobile image retrieval, text mining, product information service system, online marketingProcedia PDF Downloads 292
9564 Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities
Authors: Khaled M. Alhawiti
Abstract:This paper aims to analyze the role of natural language processing (NLP). The paper will discuss the role in the context of automated data retrieval, automated question answer, and text structuring. NLP techniques are gaining wider acceptance in real life applications and industrial concerns. There are various complexities involved in processing the text of natural language that could satisfy the need of decision makers. This paper begins with the description of the qualities of NLP practices. The paper then focuses on the challenges in natural language processing. The paper also discusses major techniques of NLP. The last section describes opportunities and challenges for future research.
Keywords: data retrieval, information retrieval, natural language processing, text structuringProcedia PDF Downloads 273
9563 Secure Image Retrieval Based on Orthogonal Decomposition under Cloud Environment
Authors: Y. Xu, L. Xiong, Z. Xu
Abstract:In order to protect data privacy, image with sensitive or private information needs to be encrypted before being outsourced to the cloud. However, this causes difficulties in image retrieval and data management. A secure image retrieval method based on orthogonal decomposition is proposed in the paper. The image is divided into two different components, for which encryption and feature extraction are executed separately. As a result, cloud server can extract features from an encrypted image directly and compare them with the features of the queried images, so that the user can thus obtain the image. Different from other methods, the proposed method has no special requirements to encryption algorithms. Experimental results prove that the proposed method can achieve better security and better retrieval precision.
Keywords: secure image retrieval, secure search, orthogonal decomposition, secure cloud computingProcedia PDF Downloads 273
9562 Leveraging Quality Metrics in Voting Model Based Thread Retrieval
Authors: Atefeh Heydari, Mohammadali Tavakoli, Zuriati Ismail, Naomie Salim
Abstract:Seeking and sharing knowledge on online forums have made them popular in recent years. Although online forums are valuable sources of information, due to variety of sources of messages, retrieving reliable threads with high quality content is an issue. Majority of the existing information retrieval systems ignore the quality of retrieved documents, particularly, in the field of thread retrieval. In this research, we present an approach that employs various quality features in order to investigate the quality of retrieved threads. Different aspects of content quality, including completeness, comprehensiveness, and politeness, are assessed using these features, which lead to finding not only textual, but also conceptual relevant threads for a user query within a forum. To analyse the influence of the features, we used an adopted version of voting model thread search as a retrieval system. We equipped it with each feature solely and also various combinations of features in turn during multiple runs. The results show that incorporating the quality features enhances the effectiveness of the utilised retrieval system significantly.
Keywords: content quality, forum search, thread retrieval, voting techniquesProcedia PDF Downloads 146
9561 Enhanced Arabic Semantic Information Retrieval System Based on Arabic Text Classification
Authors: A. Elsehemy, M. Abdeen , T. Nazmy
Abstract:Since the appearance of the Semantic web, many semantic search techniques and models were proposed to exploit the information in ontology to enhance the traditional keyword-based search. Many advances were made in languages such as English, German, French and Spanish. However, other languages such as Arabic are not fully supported yet. In this paper we present a framework for ontology based information retrieval for Arabic language. Our system consists of four main modules, namely query parser, indexer, search and a ranking module. Our approach includes building a semantic index by linking ontology concepts to documents, including an annotation weight for each link, to be used in ranking the results. We also augmented the framework with an automatic document categorizer, which enhances the overall document ranking. We have built three Arabic domain ontologies: Sports, Economic and Politics as example for the Arabic language. We built a knowledge base that consists of 79 classes and more than 1456 instances. The system is evaluated using the precision and recall metrics. We have done many retrieval operations on a sample of 40,316 documents with a size 320 MB of pure text. The results show that the semantic search enhanced with text classification gives better performance results than the system without classification.
Keywords: Arabic text classification, ontology based retrieval, Arabic semantic web, information retrieval, Arabic ontologyProcedia PDF Downloads 468
9560 Audio Information Retrieval in Mobile Environment with Fast Audio Classifier
Authors: Bruno T. Gomes, José A. Menezes, Giordano Cabral
Abstract:With the popularity of smartphones, mobile apps emerge to meet the diverse needs, however the resources at the disposal are limited, either by the hardware, due to the low computing power, or the software, that does not have the same robustness of desktop environment. For example, in automatic audio classification (AC) tasks, musical information retrieval (MIR) subarea, is required a fast processing and a good success rate. However the mobile platform has limited computing power and the best AC tools are only available for desktop. To solve these problems the fast classifier suits, to mobile environments, the most widespread MIR technologies, seeking a balance in terms of speed and robustness. At the end we found that it is possible to enjoy the best of MIR for mobile environments. This paper presents the results obtained and the difficulties encountered.
Keywords: audio classification, audio extraction, environment mobile, musical information retrievalProcedia PDF Downloads 478
9559 Graph Codes - 2D Projections of Multimedia Feature Graphs for Fast and Effective Retrieval
Authors: Stefan Wagenpfeil, Felix Engel, Paul McKevitt, Matthias Hemmje
Abstract:Multimedia Indexing and Retrieval is generally designed and implemented by employing feature graphs. These graphs typically contain a significant number of nodes and edges to reflect the level of detail in feature detection. A higher level of detail increases the effectiveness of the results but also leads to more complex graph structures. However, graph-traversal-based algorithms for similarity are quite inefficient and computation intensive, especially for large data structures. To deliver fast and effective retrieval, an efficient similarity algorithm, particularly for large graphs, is mandatory. Hence, in this paper, we define a graph-projection into a 2D space (Graph Code) as well as the corresponding algorithms for indexing and retrieval. We show that calculations in this space can be performed more efficiently than graph-traversals due to a simpler processing model and a high level of parallelization. In consequence, we prove that the effectiveness of retrieval also increases substantially, as Graph Codes facilitate more levels of detail in feature fusion. Thus, Graph Codes provide a significant increase in efficiency and effectiveness (especially for Multimedia indexing and retrieval) and can be applied to images, videos, audio, and text information.
Keywords: indexing, retrieval, multimedia, graph algorithm, graph codeProcedia PDF Downloads 85
9558 Personalization of Context Information Retrieval Model via User Search Behaviours for Ranking Document Relevance
Authors: Kehinde Agbele, Longe Olumide, Daniel Ekong, Dele Seluwa, Akintoye Onamade
Abstract:One major problem of most existing information retrieval systems (IRS) is that they provide even access and retrieval results to individual users specially based on the query terms user issued to the system. When using IRS, users often present search queries made of ad-hoc keywords. It is then up to IRS to obtain a precise representation of user’s information need, and the context of the information. In effect, the volume and range of the Internet documents is growing exponentially and consequently causes difficulties for a user to obtain information that precisely matches the user interest. Diverse combination techniques are used to achieve the specific goal. This is due, firstly, to the fact that users often do not present queries to IRS that optimally represent the information they want, and secondly, the measure of a document's relevance is highly subjective between diverse users. In this paper, we address the problem by investigating the optimization of IRS to individual information needs in order of relevance. The paper addressed the development of algorithms that optimize the ranking of documents retrieved from IRS. This paper addresses this problem with a two-fold approach in order to retrieve domain-specific documents. Firstly, the design of context of information. The context of a query determines retrieved information relevance using personalization and context-awareness. Thus, executing the same query in diverse contexts often leads to diverse result rankings based on the user preferences. Secondly, the relevant context aspects should be incorporated in a way that supports the knowledge domain representing users’ interests. In this paper, the use of evolutionary algorithms is incorporated to improve the effectiveness of IRS. A context-based information retrieval system that learns individual needs from user-provided relevance feedback is developed whose retrieval effectiveness is evaluated using precision and recall metrics. The results demonstrate how to use attributes from user interaction behavior to improve the IR effectiveness.
Keywords: context, document relevance, information retrieval, personalization, user search behaviorsProcedia PDF Downloads 405
9557 Local Texture and Global Color Descriptors for Content Based Image Retrieval
Authors: Tajinder Kaur, Anu Bala
Abstract:An image retrieval system is a computer system for browsing, searching, and retrieving images from a large database of digital images a new algorithm meant for content-based image retrieval (CBIR) is presented in this paper. The proposed method combines the color and texture features which are extracted the global and local information of the image. The local texture feature is extracted by using local binary patterns (LBP), which are evaluated by taking into consideration of local difference between the center pixel and its neighbors. For the global color feature, the color histogram (CH) is used which is calculated by RGB (red, green, and blue) spaces separately. In this paper, the combination of color and texture features are proposed for content-based image retrieval. The performance of the proposed method is tested on Corel 1000 database which is the natural database. The results after being investigated show a significant improvement in terms of their evaluation measures as compared to LBP and CH.
Keywords: color, texture, feature extraction, local binary patterns, image retrievalProcedia PDF Downloads 285
9556 Content Based Face Sketch Images Retrieval in WHT, DCT, and DWT Transform Domain
Authors: W. S. Besbas, M. A. Artemi, R. M. Salman
Abstract:Content based face sketch retrieval can be used to find images of criminals from their sketches for 'Crime Prevention'. This paper investigates the problem of CBIR of face sketch images in transform domain. Face sketch images that are similar to the query image are retrieved from the face sketch database. Features of the face sketch image are extracted in the spectrum domain of a selected transforms. These transforms are Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Walsh Hadamard Transform (WHT). For the performance analyses of features selection methods three face images databases are used. These are 'Sheffield face database', 'Olivetti Research Laboratory (ORL) face database', and 'Indian face database'. The City block distance measure is used to evaluate the performance of the retrieval process. The investigation concludes that, the retrieval rate is database dependent. But in general, the DCT is the best. On the other hand, the WHT is the best with respect to the speed of retrieving images.
Keywords: Content Based Image Retrieval (CBIR), face sketch image retrieval, features selection for CBIR, image retrieval in transform domainProcedia PDF Downloads 427
9555 A Graph-Based Retrieval Model for Passage Search
Authors: Junjie Zhong, Kai Hong, Lei Wang
Abstract:Passage Retrieval (PR) plays an important role in many Natural Language Processing (NLP) tasks. Traditional efficient retrieval models relying on exact term-matching, such as TF-IDF or BM25, have nowadays been exceeded by pre-trained language models which match by semantics. Though they gain effectiveness, deep language models often require large memory as well as time cost. To tackle the trade-off between efficiency and effectiveness in PR, this paper proposes Graph Passage Retriever (GraphPR), a graph-based model inspired by the development of graph learning techniques. Different from existing works, GraphPR is end-to-end and integrates both term-matching information and semantics. GraphPR constructs a passage-level graph from BM25 retrieval results and trains a GCN-like model on the graph with graph-based objectives. Passages were regarded as nodes in the constructed graph and were embedded in dense vectors. PR can then be implemented using embeddings and a fast vector-similarity search. Experiments on a variety of real-world retrieval datasets show that the proposed model outperforms related models in several evaluation metrics (e.g., mean reciprocal rank, accuracy, F1-scores) while maintaining a relatively low query latency and memory usage.
Keywords: efficiency, effectiveness, graph learning, language model, passage retrieval, term-matching modelProcedia PDF Downloads 11
9554 Natural Language Processing; the Future of Clinical Record Management
Authors: Khaled M. Alhawiti
Abstract:This paper investigates the future of medicine and the use of Natural language processing. The importance of having correct clinical information available online is remarkable; improving patient care at affordable costs could be achieved using automated applications to use the online clinical information. The major challenge towards the retrieval of such vital information is to have it appropriately coded. Majority of the online patient reports are not found to be coded and not accessible as its recorded in natural language text. The use of Natural Language processing provides a feasible solution by retrieving and organizing clinical information, available in text and transforming clinical data that is available for use. Systems used in NLP are rather complex to construct, as they entail considerable knowledge, however significant development has been made. Newly formed NLP systems have been tested and have established performance that is promising and considered as practical clinical applications.
Keywords: clinical information, information retrieval, natural language processing, automated applicationsProcedia PDF Downloads 328
9553 Human Action Retrieval System Using Features Weight Updating Based Relevance Feedback Approach
Authors: Munaf Rashid
Abstract:For content-based human action retrieval systems, search accuracy is often inferior because of the following two reasons 1) global information pertaining to videos is totally ignored, only low level motion descriptors are considered as a significant feature to match the similarity between query and database videos, and 2) the semantic gap between the high level user concept and low level visual features. Hence, in this paper, we propose a method that will address these two issues and in doing so, this paper contributes in two ways. Firstly, we introduce a method that uses both global and local information in one framework for an action retrieval task. Secondly, to minimize the semantic gap, a user concept is involved by incorporating features weight updating (FWU) Relevance Feedback (RF) approach. We use statistical characteristics to dynamically update weights of the feature descriptors so that after every RF iteration feature space is modified accordingly. For testing and validation purpose two human action recognition datasets have been utilized, namely Weizmann and UCF. Results show that even with a number of visual challenges the proposed approach performs well.
Keywords: relevance feedback (RF), action retrieval, semantic gap, feature descriptor, codebookProcedia PDF Downloads 398
9552 Critical Review of Web Content Mining Extraction Mechanisms
Authors: Rabia Bashir, Sajjad Akbar
Abstract:There is an inevitable demand of web mining due to rapid increase of huge information on the Internet, but the striking variety of web structures has made required content retrieval a difficult task. To counter this issue, Web Content Mining (WCM) emerges as a potential candidate which extracts and integrates suitable resources of data to users. In past few years, research has been done on several extraction techniques for WCM i.e. agent-based, template-based, assumption-based, statistic-based, wrapper-based and machine learning. However, it is still unclear that either these approaches are efficiently tackling the significant challenges of WCM or not. To answer this question, this paper identifies these challenges such as language independency, structure flexibility, performance, automation, dynamicity, redundancy handling, intelligence, relevant content retrieval, and privacy. Further, mapping of these challenges is done with existing extraction mechanisms which helps to adopt the most suitable WCM approach, given some conditions and characteristics at hand.
Keywords: content mining challenges, web content mining, web content extraction approaches, web information retrievalProcedia PDF Downloads 469
9551 Improving Research by the Integration of a Collaborative Dimension in an Information Retrieval (IR) System
Authors: Amel Hannech, Mehdi Adda, Hamid Mcheick
Abstract:In computer science, the purpose of finding useful information is still one of the most active and important research topics. The most popular application of information retrieval (IR) are Search Engines, they meet users' specific needs and aim to locate the effective information in the web. However, these search engines have some limitations related to the relevancy of the results and the ease to explore those results. In this context, we proposed in previous works a Multi-Space Search Engine model that is based on a multidimensional interpretation universe. In the present paper, we integrate an additional dimension that allows to offer users new research experiences. The added component is based on creating user profiles and calculating the similarity between them that then allow the use of collaborative filtering in retrieving search results. To evaluate the effectiveness of the proposed model, a prototype is developed. The experiments showed that the additional dimension has improved the relevancy of results by predicting the interesting items of users based on their experiences and the experiences of other similar users. The offered personalization service allows users to approve the pertinent items, which allows to enrich their profiles and further improve research.
Keywords: information retrieval, v-facets, user behavior analysis, user profiles, topical ontology, association rules, data personalizationProcedia PDF Downloads 198
9550 Study of Evaluation Model Based on Information System Success Model and Flow Theory Using Web-scale Discovery System
Authors: June-Jei Kuo, Yi-Chuan Hsieh
Abstract:Because of the rapid growth of information technology, more and more libraries introduce the new information retrieval systems to enhance the users’ experience, improve the retrieval efficiency, and increase the applicability of the library resources. Nevertheless, few of them are discussed the usability from the users’ aspect. The aims of this study are to understand that the scenario of the information retrieval system utilization, and to know why users are willing to continuously use the web-scale discovery system to improve the web-scale discovery system and promote their use of university libraries. Besides of questionnaires, observations and interviews, this study employs both Information System Success Model introduced by DeLone and McLean in 2003 and the flow theory to evaluate the system quality, information quality, service quality, use, user satisfaction, flow, and continuing to use web-scale discovery system of students from National Chung Hsing University. Then, the results are analyzed through descriptive statistics and structural equation modeling using AMOS. The results reveal that in web-scale discovery system, the user’s evaluation of system quality, information quality, and service quality is positively related to the use and satisfaction; however, the service quality only affects user satisfaction. User satisfaction and the flow show a significant impact on continuing to use. Moreover, user satisfaction has a significant impact on user flow. According to the results of this study, to maintain the stability of the information retrieval system, to improve the information content quality, and to enhance the relationship between subject librarians and students are recommended for the academic libraries. Meanwhile, to improve the system user interface, to minimize layer from system-level, to strengthen the data accuracy and relevance, to modify the sorting criteria of the data, and to support the auto-correct function are required for system provider. Finally, to establish better communication with librariana commended for all users.
Keywords: web-scale discovery system, discovery system, information system success model, flow theory, academic libraryProcedia PDF Downloads 36
9549 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity
Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang
Abstract:The word discovery is a key problem in text information retrieval technology. Methods in new word discovery tend to be closely related to words because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by the meaning of semantic replacement between sentences. The experiment verifies that the framework not only completes the rapid discovery of network words but also realizes the standard word meaning of the discovery of network words, which reflects the effectiveness of our work.
Keywords: text information retrieval, natural language processing, new word discovery, information extractionProcedia PDF Downloads 28
9548 Morphological Analysis of Manipuri Language: Wahei-Neinarol
Authors: Y. Bablu Singh, B. S. Purkayashtha, Chungkham Yashawanta Singh
Abstract:Morphological analysis forms the basic foundation in NLP applications including syntax parsing Machine Translation (MT), Information Retrieval (IR) and automatic indexing in all languages. It is the field of the linguistics; it can provide valuable information for computer based linguistics task such as lemmatization and studies of internal structure of the words. Computational Morphology is the application of morphological rules in the field of computational linguistics, and it is the emerging area in AI, which studies the structure of words, which are formed by combining smaller units of linguistics information, called morphemes: the building blocks of words. Morphological analysis provides about semantic and syntactic role in a sentence. It analyzes the Manipuri word forms and produces several grammatical information associated with the words. The Morphological Analyzer for Manipuri has been tested on 3500 Manipuri words in Shakti Standard format (SSF) using Meitei Mayek as source; thereby an accuracy of 80% has been obtained on a manual check.
Keywords: morphological analysis, machine translation, computational morphology, information retrieval, SSFProcedia PDF Downloads 272
9547 Content Based Video Retrieval System Using Principal Object Analysis
Authors: Van Thinh Bui, Anh Tuan Tran, Quoc Viet Ngo, The Bao Pham
Abstract:Video retrieval is a searching problem on videos or clips based on content in which they are relatively close to an input image or video. The application of this retrieval consists of selecting video in a folder or recognizing a human in security camera. However, some recent approaches have been in challenging problem due to the diversity of video types, frame transitions and camera positions. Besides, that an appropriate measures is selected for the problem is a question. In order to overcome all obstacles, we propose a content-based video retrieval system in some main steps resulting in a good performance. From a main video, we process extracting keyframes and principal objects using Segmentation of Aggregating Superpixels (SAS) algorithm. After that, Speeded Up Robust Features (SURF) are selected from those principal objects. Then, the model “Bag-of-words” in accompanied by SVM classification are applied to obtain the retrieval result. Our system is performed on over 300 videos in diversity from music, history, movie, sports, and natural scene to TV program show. The performance is evaluated in promising comparison to the other approaches.
Keywords: video retrieval, principal objects, keyframe, segmentation of aggregating superpixels, speeded up robust features, bag-of-words, SVMProcedia PDF Downloads 244
9546 Text Data Preprocessing Library: Bilingual Approach
Authors: Kabil Boukhari
Abstract:In the context of information retrieval, the selection of the most relevant words is a very important step. In fact, the text cleaning allows keeping only the most representative words for a better use. In this paper, we propose a library for the purpose text preprocessing within an implemented application to facilitate this task. This study has two purposes. The first, is to present the related work of the various steps involved in text preprocessing, presenting the segmentation, stemming and lemmatization algorithms that could be efficient in the rest of study. The second, is to implement a developed tool for text preprocessing in French and English. This library accepts unstructured text as input and provides the preprocessed text as output, based on a set of rules and on a base of stop words for both languages. The proposed library has been made on diﬀerent corpora and gave an interesting result.
Keywords: text preprocessing, segmentation, knowledge extraction, normalization, text generation, information retrievalProcedia PDF Downloads 35
9545 Utilization of CD-ROM Database as a Storage and Retrieval System by Students of Nasarawa State University Keffi
Authors: Suleiman Musa
Abstract:The utilization of CD-ROM as a storage and retrieval system by Nasarawa State University Keffi (NSUK) Library is crucial in preserving and dissemination of information to students and staff. This study investigated the utilization of CD-ROM Database storage and retrieval system by students of NUSK. Data was generated using structure questionnaire. One thousand and fifty two (1052) respondents were randomly selected among post-graduate and under-graduate students. Eight hundred and ten (810) questionnaires were returned, but only five hundred and ninety three (593) questionnaires were well completed and useful. The study found that post-graduate students use CD-ROM Databases more often than the under-graduate students in NSUK. The result of the study revealed that knowledge about CD-ROM Database 33.22% got it through library staff. 29.69% use CD-ROM once a month. Large number of users 45.70% purposely uses CD-ROM Databases for study and research. In fact, lack of users’ orientation amount to 58.35% of problems faced, while 31.20% lack of trained staff make it more difficult for utilization of CD-ROM Database. Major numbers of users 38.28% are neither satisfied nor dissatisfied, while a good number of them 27.99% are satisfied. Then 1.52% is highly dissatisfied but could not give reasons why. However, to ensure effective utilization of CD-ROM Database storage and retrieval system by students of NSUK, the following recommendations are made: effort should be made to encourage under-graduate in using CD-ROM Database. The institution should conduct orientation/induction course for students on CD-ROM Databases in the library. There is need for NSUK to produce in house databases on their CD-ROM for easy access by users.
Keywords: utilization, CD-ROM databases, storage, retrieval, studentsProcedia PDF Downloads 380
9544 MapReduce Algorithm for Geometric and Topological Information Extraction from 3D CAD Models
Authors: Ahmed Fradi
Abstract:In a digital world in perpetual evolution and acceleration, data more and more voluminous, rich and varied, the new software solutions emerged with the Big Data phenomenon offer new opportunities to the company enabling it not only to optimize its business and to evolve its production model, but also to reorganize itself to increase competitiveness and to identify new strategic axes. Design and manufacturing industrial companies, like the others, face these challenges, data represent a major asset, provided that they know how to capture, refine, combine and analyze them. The objective of our paper is to propose a solution allowing geometric and topological information extraction from 3D CAD model (precisely STEP files) databases, with specific algorithm based on the programming paradigm MapReduce. Our proposal is the first step of our future approach to 3D CAD object retrieval.
Keywords: Big Data, MapReduce, 3D object retrieval, CAD, STEP formatProcedia PDF Downloads 484