Search results for: lexical retrieval
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 537

Search results for: lexical retrieval

507 Examining the Development of Complexity, Accuracy and Fluency in L2 Learners' Writing after L2 Instruction

Authors: Khaled Barkaoui

Abstract:

Research on second-language (L2) learning tends to focus on comparing students with different levels of proficiency at one point in time. However, to understand L2 development, we need more longitudinal research. In this study, we adopt a longitudinal approach to examine changes in three indicators of L2 ability, complexity, accuracy, and fluency (CAF), as reflected in the writing of L2 learners when writing on different tasks before and after a period L2 instruction. Each of 85 Chinese learners of English at three levels of English language proficiency responded to two writing tasks (independent and integrated) before and after nine months of English-language study in China. Each essay (N= 276) was analyzed in terms of numerous CAF indices using both computer coding and human rating: number of words written, number of errors per 100 words, ratings of error severity, global syntactic complexity (MLS), complexity by coordination (T/S), complexity by subordination (C/T), clausal complexity (MLC), phrasal complexity (NP density), syntactic variety, lexical density, lexical variation, lexical sophistication, and lexical bundles. Results were then compared statistically across tasks, L2 proficiency levels, and time. Overall, task type had significant effects on fluency and some syntactic complexity indices (complexity by coordination, structural variety, clausal complexity, phrase complexity) and lexical density, sophistication, and bundles, but not accuracy. L2 proficiency had significant effects on fluency, accuracy, and lexical variation, but not syntactic complexity. Finally, fluency, frequency of errors, but not accuracy ratings, syntactic complexity indices (clausal complexity, global complexity, complexity by subordination, phrase complexity, structural variety) and lexical complexity (lexical density, variation, and sophistication) exhibited significant changes after instruction, particularly for the independent task. We discuss the findings and their implications for assessment, instruction, and research on CAF in the context of L2 writing.

Keywords: second language writing, Fluency, accuracy, complexity, longitudinal

Procedia PDF Downloads 120
506 Local Texture and Global Color Descriptors for Content Based Image Retrieval

Authors: Tajinder Kaur, Anu Bala

Abstract:

An image retrieval system is a computer system for browsing, searching, and retrieving images from a large database of digital images a new algorithm meant for content-based image retrieval (CBIR) is presented in this paper. The proposed method combines the color and texture features which are extracted the global and local information of the image. The local texture feature is extracted by using local binary patterns (LBP), which are evaluated by taking into consideration of local difference between the center pixel and its neighbors. For the global color feature, the color histogram (CH) is used which is calculated by RGB (red, green, and blue) spaces separately. In this paper, the combination of color and texture features are proposed for content-based image retrieval. The performance of the proposed method is tested on Corel 1000 database which is the natural database. The results after being investigated show a significant improvement in terms of their evaluation measures as compared to LBP and CH.

Keywords: color, texture, feature extraction, local binary patterns, image retrieval

Procedia PDF Downloads 329
505 A Graph-Based Retrieval Model for Passage Search

Authors: Junjie Zhong, Kai Hong, Lei Wang

Abstract:

Passage Retrieval (PR) plays an important role in many Natural Language Processing (NLP) tasks. Traditional efficient retrieval models relying on exact term-matching, such as TF-IDF or BM25, have nowadays been exceeded by pre-trained language models which match by semantics. Though they gain effectiveness, deep language models often require large memory as well as time cost. To tackle the trade-off between efficiency and effectiveness in PR, this paper proposes Graph Passage Retriever (GraphPR), a graph-based model inspired by the development of graph learning techniques. Different from existing works, GraphPR is end-to-end and integrates both term-matching information and semantics. GraphPR constructs a passage-level graph from BM25 retrieval results and trains a GCN-like model on the graph with graph-based objectives. Passages were regarded as nodes in the constructed graph and were embedded in dense vectors. PR can then be implemented using embeddings and a fast vector-similarity search. Experiments on a variety of real-world retrieval datasets show that the proposed model outperforms related models in several evaluation metrics (e.g., mean reciprocal rank, accuracy, F1-scores) while maintaining a relatively low query latency and memory usage.

Keywords: efficiency, effectiveness, graph learning, language model, passage retrieval, term-matching model

Procedia PDF Downloads 85
504 Biomedical Definition Extraction Using Machine Learning with Synonymous Feature

Authors: Jian Qu, Akira Shimazu

Abstract:

OOV (Out Of Vocabulary) terms are terms that cannot be found in many dictionaries. Although it is possible to translate such OOV terms, the translations do not provide any real information for a user. We present an OOV term definition extraction method by using information available from the Internet. We use features such as occurrence of the synonyms and location distances. We apply machine learning method to find the correct definitions for OOV terms. We tested our method on both biomedical type and name type OOV terms, our work outperforms existing work with an accuracy of 86.5%.

Keywords: information retrieval, definition retrieval, OOV (out of vocabulary), biomedical information retrieval

Procedia PDF Downloads 461
503 Cross-Language Variation and the ‘Fused’ Zone in Bilingual Mental Lexicon: An Experimental Research

Authors: Yuliya E. Leshchenko, Tatyana S. Ostapenko

Abstract:

Language variation is a widespread linguistic phenomenon which can affect different levels of a language system: phonological, morphological, lexical, syntactic, etc. It is obvious that the scope of possible standard alternations within a particular language is limited by a variety of its norms and regulations which set more or less clear boundaries for what is possible and what is not possible for the speakers. The possibility of lexical variation (alternate usage of lexical items within the same contexts) is based on the fact that the meanings of words are not clearly and rigidly defined in the consciousness of the speakers. Therefore, lexical variation is usually connected with unstable relationship between words and their referents: a case when a particular lexical item refers to different types of referents, or when a particular referent can be named by various lexical items. We assume that the scope of lexical variation in bilingual speech is generally wider than that observed in monolingual speech due to the fact that, besides ‘lexical item – referent’ relations it involves the possibility of cross-language variation of L1 and L2 lexical items. We use the term ‘cross-language variation’ to denote a case when two equivalent words of different languages are treated by a bilingual speaker as freely interchangeable within the common linguistic context. As distinct from code-switching which is traditionally defined as the conscious use of more than one language within one communicative act, in case of cross-language lexical variation the speaker does not perceive the alternate lexical items as belonging to different languages and, therefore, does not realize the change of language code. In the paper, the authors present research of lexical variation of adult Komi-Permyak – Russian bilingual speakers. The two languages co-exist on the territory of the Komi-Permyak District in Russia (Komi-Permyak as the ethnic language and Russian as the official state language), are usually acquired from birth in natural linguistic environment and, according to the data of sociolinguistic surveys, are both identified by the speakers as coordinate mother tongues. The experimental research demonstrated that alternation of Komi-Permyak and Russian words within one utterance/phrase is highly frequent both in speech perception and production. Moreover, our participants estimated cross-language word combinations like ‘маленькая /Russian/ нывка /Komi-Permyak/’ (‘a little girl’) or ‘мунны /Komi-Permyak/ домой /Russian/’ (‘go home’) as regular/habitual, containing no violation of any linguistic rules and being equally possible in speech as the equivalent intra-language word combinations (‘учöтик нывка’ /Komi-Permyak/ or ‘идти домой’ /Russian/). All the facts considered, we claim that constant concurrent use of the two languages results in the fact that a large number of their words tend to be intuitively interpreted by the speakers as lexical variants not only related to the same referent, but also referring to both languages or, more precisely, to none of them in particular. Consequently, we can suppose that bilingual mental lexicon includes an extensive ‘fused’ zone of lexical representations that provide the basis for cross-language variation in bilingual speech.

Keywords: bilingualism, bilingual mental lexicon, code-switching, lexical variation

Procedia PDF Downloads 113
502 Domain Adaptive Dense Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then, the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. We also explore contrastive learning as a method for training domain-adapted dense retrievers and show that it leads to strong performance in various retrieval settings. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.

Keywords: dense retrieval, query generation, contrastive learning, unsupervised training

Procedia PDF Downloads 59
501 Distinguishing Borrowings from Code Mixes: An Analysis of English Lexical Items Used in the Print Media in Sri Lanka

Authors: Chamindi Dilkushi Senaratne

Abstract:

Borrowing is the morphological, syntactic and (usually) phonological integration of lexical items from one language into the structure of another language. Borrowings show complete linguistic integration and due to the frequency of use become fossilized in the recipient language differentiating them from switches and mixes. Code mixes are different to borrowings. Code mixing takes place when speakers use lexical items in casual conversation to serve a variety of functions. This study presents an analysis of lexical items used in English newspapers in Sri Lanka in 2017 which reveal characteristics of borrowing or code mixes. Both phenomena arise due to language contact. The study will also use data from social media websites that comment on newspaper articles available on the web. The study reiterates that borrowings are distinguishable from code mixes and that they are two different phenomena that occur in language contact situations. The study also shows how existing morphological processes are used to create new vocabulary in language use. The study sheds light into how existing morphological processes are used by the bilingual to be creative, innovative and convey a bilingual identity.

Keywords: borrowing, code mixing, morphological processes

Procedia PDF Downloads 194
500 Working Memory and Phonological Short-Term Memory in the Acquisition of Academic Formulaic Language

Authors: Zhicheng Han

Abstract:

This study examines the correlation between knowledge of formulaic language, working memory (WM), and phonological short-term memory (PSTM) in Chinese L2 learners of English. This study investigates if WM and PSTM correlate differently to the acquisition of formulaic language, which may be relevant for the discourse around the conceptualization of formulas. Connectionist approaches have lead scholars to argue that formulas are form-meaning connections stored whole, making PSTM significant in the acquisitional process as it pertains to the storage and retrieval of chunk information. Generativist scholars, on the other hand, argued for active participation of interlanguage grammar in the acquisition and use of formulaic language, where formulas are represented in the mind but retain the internal structure built around a lexical core. This would make WM, especially the processing component of WM an important cognitive factor since it plays a role in processing and holding information for further analysis and manipulation. The current study asked L1 Chinese learners of English enrolled in graduate programs in China to complete a preference raking task where they rank their preference for formulas, grammatical non-formulaic expressions, and ungrammatical phrases with and without the lexical core in academic contexts. Participants were asked to rank the options in order of the likeliness of them encountering these phrases in the test sentences within academic contexts. Participants’ syntactic proficiency is controlled with a cloze test and grammar test. Regression analysis found a significant relationship between the processing component of WM and preference of formulaic expressions in the preference ranking task while no significant correlation is found for PSTM or syntactic proficiency. The correlational analysis found that WM, PSTM, and the two proficiency test scores have significant covariates. However, WM and PSTM have different predictor values for participants’ preference for formulaic language. Both storage and processing components of WM are significantly correlated with the preference for formulaic expressions while PSTM is not. These findings are in favor of the role of interlanguage grammar and syntactic knowledge in the acquisition of formulaic expressions. The differing effects of WM and PSTM suggest that selective attention to and processing of the input beyond simple retention play a key role in successfully acquiring formulaic language. Similar correlational patterns were found for preferring the ungrammatical phrase with the lexical core of the formula over the ones without the lexical core, attesting to learners’ awareness of the lexical core around which formulas are constructed. These findings support the view that formulaic phrases retain internal syntactic structures that are recognized and processed by the learners.

Keywords: formulaic language, working memory, phonological short-term memory, academic language

Procedia PDF Downloads 21
499 A Self-Built Corpus-Based Study of Four-Word Lexical Bundles in Native English Teachers’ EFL Classroom Discourse in Northeast China: The Significance of Stance

Authors: Fang Tan

Abstract:

This research focuses on the appropriate use of lexical bundles in spoken discourse, particularly in English as a Foreign Language (EFL) classrooms in Northeast China. While previous studies have mainly examined lexical bundles in written discourse, there is a need to investigate their usage in spoken discourse due to the limited availability of spoken discourse corpora. English teachers’ use of lexical bundles is crucial for effective teaching and communication in the EFL classroom. The aim of this study is to investigate the functions of four-word lexical bundles in native English teachers’ EFL oral English classes in Northeast China. Specifically, the research focuses on the usage of stance bundles, which were found to be the most significant type of bundle in the analyzed corpus. By comparing the self-built university spoken English classroom discourse corpus with the other self-built university English for General Purposes (EGP) corpus, the study aims to highlight the difference in bundle usage between native and non-native teachers in EFL classrooms. The research employs a corpus-based study. The observed corpus consists of more than 300,000 tokens, in which the data has been collected in the past five years. The reference corpus is composed of over 800,000 tokens, in which the data has been collected over 12 years. All the primary data collection involved transcribing and annotating spoken English classes taught by native English teachers. The analysis procedures included identifying and categorizing four-word lexical bundles, with specific emphasis on stance bundles. Frequency counts, and comparisons with the Chinese English teachers’ corpus were conducted to identify patterns and differences in bundle usage. The research addresses the following questions: 1) What are the functions of four-word lexical bundles in native English teachers’ EFL oral English classes? 2) How do stance bundles differ in usage between native and non-native English teachers’ classes? 3) What implications can be drawn for English teachers’ professional development based on the findings? In conclusion, this study provides valuable insights into the usage of four-word lexical bundles, particularly stance bundles, in native English teachers’ EFL oral English classes in Northeast China. The research highlights the difference in bundle usage between native and non-native English teachers’ classes and provides implications for English teachers’ professional development. The findings contribute to the understanding of lexical bundle usage in EFL classroom discourse and have theoretical importance for language teaching methodologies. The self-built university English classroom discourse corpus used in this research is a valuable resource for future studies in this field.

Keywords: EFL classroom discourse, four-word lexical bundles, stance, implication

Procedia PDF Downloads 32
498 Similarity Based Retrieval in Case Based Reasoning for Analysis of Medical Images

Authors: M. Dasgupta, S. Banerjee

Abstract:

Content Based Image Retrieval (CBIR) coupled with Case Based Reasoning (CBR) is a paradigm that is becoming increasingly popular in the diagnosis and therapy planning of medical ailments utilizing the digital content of medical images. This paper presents a survey of some of the promising approaches used in the detection of abnormalities in retina images as well in mammographic screening and detection of regions of interest in MRI scans of the brain. We also describe our proposed algorithm to detect hard exudates in fundus images of the retina of Diabetic Retinopathy patients.

Keywords: case based reasoning, exudates, retina image, similarity based retrieval

Procedia PDF Downloads 319
497 Contextual SenSe Model: Word Sense Disambiguation using Sense and Sense Value of Context Surrounding the Target

Authors: Vishal Raj, Noorhan Abbas

Abstract:

Ambiguity in NLP (Natural language processing) refers to the ability of a word, phrase, sentence, or text to have multiple meanings. This results in various kinds of ambiguities such as lexical, syntactic, semantic, anaphoric and referential am-biguities. This study is focused mainly on solving the issue of Lexical ambiguity. Word Sense Disambiguation (WSD) is an NLP technique that aims to resolve lexical ambiguity by determining the correct meaning of a word within a given context. Most WSD solutions rely on words for training and testing, but we have used lemma and Part of Speech (POS) tokens of words for training and testing. Lemma adds generality and POS adds properties of word into token. We have designed a novel method to create an affinity matrix to calculate the affinity be-tween any pair of lemma_POS (a token where lemma and POS of word are joined by underscore) of given training set. Additionally, we have devised an al-gorithm to create the sense clusters of tokens using affinity matrix under hierar-chy of POS of lemma. Furthermore, three different mechanisms to predict the sense of target word using the affinity/similarity value are devised. Each contex-tual token contributes to the sense of target word with some value and whichever sense gets higher value becomes the sense of target word. So, contextual tokens play a key role in creating sense clusters and predicting the sense of target word, hence, the model is named Contextual SenSe Model (CSM). CSM exhibits a noteworthy simplicity and explication lucidity in contrast to contemporary deep learning models characterized by intricacy, time-intensive processes, and chal-lenging explication. CSM is trained on SemCor training data and evaluated on SemEval test dataset. The results indicate that despite the naivety of the method, it achieves promising results when compared to the Most Frequent Sense (MFS) model.

Keywords: word sense disambiguation (wsd), contextual sense model (csm), most frequent sense (mfs), part of speech (pos), natural language processing (nlp), oov (out of vocabulary), lemma_pos (a token where lemma and pos of word are joined by underscore), information retrieval (ir), machine translation (mt)

Procedia PDF Downloads 63
496 A Framework of Product Information Service System Using Mobile Image Retrieval and Text Mining Techniques

Authors: Mei-Yi Wu, Shang-Ming Huang

Abstract:

The online shoppers nowadays often search the product information on the Internet using some keywords of products. To use this kind of information searching model, shoppers should have a preliminary understanding about their interesting products and choose the correct keywords. However, if the products are first contact (for example, the worn clothes or backpack of passengers which you do not have any idea about the brands), these products cannot be retrieved due to insufficient information. In this paper, we discuss and study the applications in E-commerce using image retrieval and text mining techniques. We design a reasonable E-commerce application system containing three layers in the architecture to provide users product information. The system can automatically search and retrieval similar images and corresponding web pages on Internet according to the target pictures which taken by users. Then text mining techniques are applied to extract important keywords from these retrieval web pages and search the prices on different online shopping stores with these keywords using a web crawler. Finally, the users can obtain the product information including photos and prices of their favorite products. The experiments shows the efficiency of proposed system.

Keywords: mobile image retrieval, text mining, product information service system, online marketing

Procedia PDF Downloads 327
495 Selection of Relevant Servers in Distributed Information Retrieval System

Authors: Benhamouda Sara, Guezouli Larbi

Abstract:

Nowadays, the dissemination of information touches the distributed world, where selecting the relevant servers to a user request is an important problem in distributed information retrieval. During the last decade, several research studies on this issue have been launched to find optimal solutions and many approaches of collection selection have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contains terms of the query and the weights of those terms in these documents. We tested our method and our studies show that this technique can compete with other state-of-the-art algorithms that we choose to test the performance of our approach.

Keywords: distributed information retrieval, relevance, server selection, collection selection

Procedia PDF Downloads 255
494 Algorithm for Information Retrieval Optimization

Authors: Kehinde K. Agbele, Kehinde Daniel Aruleba, Eniafe F. Ayetiran

Abstract:

When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates optimization of IRS to individual information needs in order of relevance. The study addressed development of algorithms that optimize the ranking of documents retrieved from IRS. This study discusses and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated databases environment. Conversely, as the volume of information available online and in designated databases is growing continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (

Keywords: information retrieval, document relevance, performance measures, personalization

Procedia PDF Downloads 210
493 Content-Based Image Retrieval Using HSV Color Space Features

Authors: Hamed Qazanfari, Hamid Hassanpour, Kazem Qazanfari

Abstract:

In this paper, a method is provided for content-based image retrieval. Content-based image retrieval system searches query an image based on its visual content in an image database to retrieve similar images. In this paper, with the aim of simulating the human visual system sensitivity to image's edges and color features, the concept of color difference histogram (CDH) is used. CDH includes the perceptually color difference between two neighboring pixels with regard to colors and edge orientations. Since the HSV color space is close to the human visual system, the CDH is calculated in this color space. In addition, to improve the color features, the color histogram in HSV color space is also used as a feature. Among the extracted features, efficient features are selected using entropy and correlation criteria. The final features extract the content of images most efficiently. The proposed method has been evaluated on three standard databases Corel 5k, Corel 10k and UKBench. Experimental results show that the accuracy of the proposed image retrieval method is significantly improved compared to the recently developed methods.

Keywords: content-based image retrieval, color difference histogram, efficient features selection, entropy, correlation

Procedia PDF Downloads 220
492 Shaping Lexical Concept of 'Mage' through Image Schemas in Dragon Age 'Origins'

Authors: Dean Raiyasmi, Elvi Citraresmana, Sutiono Mahdi

Abstract:

Language shapes the human mind and its concept toward things. Using image schemas, in nowadays technology, even AI (artificial intelligence) can concept things in response to their creator negativity or positivity. This is reflected inside one of the most selling game around the world in 2012 called Dragon Age Origins. The AI in form of NPC (Non-Playable Character) inside the game reflects on the creator of the game on negativity or positivity toward the lexical concept of mage. Through image schemas, shaping the lexical concept of mage deemed possible and proved the negativity or positivity creator of the game toward mage. This research analyses the cognitive-semantic process of image schema and shaping the concept of ‘mage’ by describing kinds of image schemas exist in the Dragon Age Origin Game. This research is also aimed to analyse kinds of image schemas and describing the image schemas which shaping the concept of ‘mage’ itself. The methodology used in this research is qualitative where participative observation is employed with five stages and documentation. The results shows that there are four image schemas exist in the game and those image schemas shaping the lexical concept of ‘mage’.

Keywords: cognitive semantic, image-schema, conceptual metaphor, video game

Procedia PDF Downloads 406
491 Urdu Text Extraction Method from Images

Authors: Samabia Tehsin, Sumaira Kausar

Abstract:

Due to the vast increase in the multimedia data in recent years, efficient and robust retrieval techniques are needed to retrieve and index images/ videos. Text embedded in the images can serve as the strong retrieval tool for images. This is the reason that text extraction is an area of research with increasing attention. English text extraction is the focus of many researchers but very less work has been done on other languages like Urdu. This paper is focusing on Urdu text extraction from video frames. This paper presents a text detection feature set, which has the ability to deal up with most of the problems connected with the text extraction process. To test the validity of the method, it is tested on Urdu news dataset, which gives promising results.

Keywords: caption text, content-based image retrieval, document analysis, text extraction

Procedia PDF Downloads 480
490 SIFT and Perceptual Zoning Applied to CBIR Systems

Authors: Simone B. K. Aires, Cinthia O. de A. Freitas, Luiz E. S. Oliveira

Abstract:

This paper contributes to the CBIR systems applied to trademark retrieval. The proposed model includes aspects from visual perception of the shapes, by means of feature extractor associated to a non-symmetrical perceptual zoning mechanism based on the Principles of Gestalt. Thus, the feature set were performed using Scale Invariant Feature Transform (SIFT). We carried out experiments using four different zonings strategies (Z = 4, 5H, 5V, 7) for matching and retrieval tasks. Our proposal method achieved the normalized recall (Rn) equal to 0.84. Experiments show that the non-symmetrical zoning could be considered as a tool to build more reliable trademark retrieval systems.

Keywords: CBIR, Gestalt, matching, non-symmetrical zoning, SIFT

Procedia PDF Downloads 281
489 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases

Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha

Abstract:

Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach, which is free from the effect of features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The performance of the proposed approach is confirmed with the significantly improved performance in comparison with the independently evaluated baseline of the previously proposed feature fusion approaches.

Keywords: feature fusion, image retrieval, membership function, normalization

Procedia PDF Downloads 320
488 Enhanced Arabic Semantic Information Retrieval System Based on Arabic Text Classification

Authors: A. Elsehemy, M. Abdeen , T. Nazmy

Abstract:

Since the appearance of the Semantic web, many semantic search techniques and models were proposed to exploit the information in ontology to enhance the traditional keyword-based search. Many advances were made in languages such as English, German, French and Spanish. However, other languages such as Arabic are not fully supported yet. In this paper we present a framework for ontology based information retrieval for Arabic language. Our system consists of four main modules, namely query parser, indexer, search and a ranking module. Our approach includes building a semantic index by linking ontology concepts to documents, including an annotation weight for each link, to be used in ranking the results. We also augmented the framework with an automatic document categorizer, which enhances the overall document ranking. We have built three Arabic domain ontologies: Sports, Economic and Politics as example for the Arabic language. We built a knowledge base that consists of 79 classes and more than 1456 instances. The system is evaluated using the precision and recall metrics. We have done many retrieval operations on a sample of 40,316 documents with a size 320 MB of pure text. The results show that the semantic search enhanced with text classification gives better performance results than the system without classification.

Keywords: Arabic text classification, ontology based retrieval, Arabic semantic web, information retrieval, Arabic ontology

Procedia PDF Downloads 497
487 Semantic Indexing Improvement for Textual Documents: Contribution of Classification by Fuzzy Association Rules

Authors: Mohsen Maraoui

Abstract:

In the aim of natural language processing applications improvement, such as information retrieval, machine translation, lexical disambiguation, we focus on statistical approach to semantic indexing for multilingual text documents based on conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weighting. These concepts represent the content of the document. Our contribution is based on two steps. In the first step, we propose the extraction of index terms using the multilingual lexical resource Euro WordNet (EWN). In the second step, we pass from the representation of index terms to the representation of index concepts through conceptual network formalism. This network is generated using the EWN resource and pass by a classification step based on association rules model (in attempt to discover the non-taxonomic relations or contextual relations between the concepts of a document). These relations are latent relations buried in the text and carried by the semantic context of the co-occurrence of concepts in the document. Our proposed indexing approach can be applied to text documents in various languages because it is based on a linguistic method adapted to the language through a multilingual thesaurus. Next, we apply the same statistical process regardless of the language in order to extract the significant concepts and their associated weights. We prove that the proposed indexing approach provides encouraging results.

Keywords: concept extraction, conceptual network formalism, fuzzy association rules, multilingual thesaurus, semantic indexing

Procedia PDF Downloads 117
486 EEG Correlates of Trait and Mathematical Anxiety during Lexical and Numerical Error-Recognition Tasks

Authors: Alexander N. Savostyanov, Tatiana A. Dolgorukova, Elena A. Esipenko, Mikhail S. Zaleshin, Margherita Malanchini, Anna V. Budakova, Alexander E. Saprygin, Tatiana A. Golovko, Yulia V. Kovas

Abstract:

EEG correlates of mathematical and trait anxiety level were studied in 52 healthy Russian-speakers during execution of error-recognition tasks with lexical, arithmetic and algebraic conditions. Event-related spectral perturbations were used as a measure of brain activity. The ERSP plots revealed alpha/beta desynchronizations within a 500-3000 ms interval after task onset and slow-wave synchronization within an interval of 150-350 ms. Amplitudes of these intervals reflected the accuracy of error recognition, and were differently associated with the three conditions. The correlates of anxiety were found in theta (4-8 Hz) and beta2 (16-20 Hz) frequency bands. In theta band the effects of mathematical anxiety were stronger expressed in lexical, than in arithmetic and algebraic condition. The mathematical anxiety effects in theta band were associated with differences between anterior and posterior cortical areas, whereas the effects of trait anxiety were associated with inter-hemispherical differences. In beta1 and beta2 bands effects of trait and mathematical anxiety were directed oppositely. The trait anxiety was associated with increase of amplitude of desynchronization, whereas the mathematical anxiety was associated with decrease of this amplitude. The effect of mathematical anxiety in beta2 band was insignificant for lexical condition but was the strongest in algebraic condition. EEG correlates of anxiety in theta band could be interpreted as indexes of task emotionality, whereas the reaction in beta2 band is related to tension of intellectual resources.

Keywords: EEG, brain activity, lexical and numerical error-recognition tasks, mathematical and trait anxiety

Procedia PDF Downloads 534
485 Incorporating Lexical-Semantic Knowledge into Convolutional Neural Network Framework for Pediatric Disease Diagnosis

Authors: Xiaocong Liu, Huazhen Wang, Ting He, Xiaozheng Li, Weihan Zhang, Jian Chen

Abstract:

The utilization of electronic medical record (EMR) data to establish the disease diagnosis model has become an important research content of biomedical informatics. Deep learning can automatically extract features from the massive data, which brings about breakthroughs in the study of EMR data. The challenge is that deep learning lacks semantic knowledge, which leads to impracticability in medical science. This research proposes a method of incorporating lexical-semantic knowledge from abundant entities into a convolutional neural network (CNN) framework for pediatric disease diagnosis. Firstly, medical terms are vectorized into Lexical Semantic Vectors (LSV), which are concatenated with the embedded word vectors of word2vec to enrich the feature representation. Secondly, the semantic distribution of medical terms serves as Semantic Decision Guide (SDG) for the optimization of deep learning models. The study evaluate the performance of LSV-SDG-CNN model on four kinds of Chinese EMR datasets. Additionally, CNN, LSV-CNN, and SDG-CNN are designed as baseline models for comparison. The experimental results show that LSV-SDG-CNN model outperforms baseline models on four kinds of Chinese EMR datasets. The best configuration of the model yielded an F1 score of 86.20%. The results clearly demonstrate that CNN has been effectively guided and optimized by lexical-semantic knowledge, and LSV-SDG-CNN model improves the disease classification accuracy with a clear margin.

Keywords: convolutional neural network, electronic medical record, feature representation, lexical semantics, semantic decision

Procedia PDF Downloads 105
484 Design an Algorithm for Software Development in CBSE Envrionment Using Feed Forward Neural Network

Authors: Amit Verma, Pardeep Kaur

Abstract:

In software development organizations, Component based Software engineering (CBSE) is emerging paradigm for software development and gained wide acceptance as it often results in increase quality of software product within development time and budget. In component reusability, main challenges are the right component identification from large repositories at right time. The major objective of this work is to provide efficient algorithm for storage and effective retrieval of components using neural network and parameters based on user choice through clustering. This research paper aims to propose an algorithm that provides error free and automatic process (for retrieval of the components) while reuse of the component. In this algorithm, keywords (or components) are extracted from software document, after by applying k mean clustering algorithm. Then weights assigned to those keywords based on their frequency and after assigning weights, ANN predicts whether correct weight is assigned to keywords (or components) or not, otherwise it back propagates in to initial step (re-assign the weights). In last, store those all keywords into repositories for effective retrieval. Proposed algorithm is very effective in the error correction and detection with user base choice while choice of component for reusability for efficient retrieval is there.

Keywords: component based development, clustering, back propagation algorithm, keyword based retrieval

Procedia PDF Downloads 356
483 Augmented Reality Technology for a User Interface in an Automated Storage and Retrieval System

Authors: Wen-Jye Shyr, Chun-Yuan Chang, Bo-Lin Wei, Chia-Ming Lin

Abstract:

The task of creating an augmented reality technology was described in this study to give operators a user interface that might be a part of an automated storage and retrieval system. Its objective was to give graduate engineering and technology students a system of tools with which to experiment with the creation of augmented reality technologies. To collect and analyze data for maintenance applications, the students used augmented reality technology. Our findings support the evolution of artificial intelligence towards Industry 4.0 practices and the planned Industry 4.0 research stream. Important first insights into the study's effects on student learning were presented.

Keywords: augmented reality, storage and retrieval system, user interface, programmable logic controller

Procedia PDF Downloads 46
482 In-Fun-Mation: Putting the Fun in Information Retrieval at the Linnaeus University, Sweden

Authors: Aagesson, Ekstrand, Persson, Sallander

Abstract:

A description of how a team of librarians at Linnaeus University Library in Sweden utilizes a pedagogical approach to deliver engaging digital workshops on information retrieval. The team consists of four librarians supporting three different faculties. The paper discusses the challenges faced in engaging students who may perceive information retrieval as a boring and difficult subject. The paper emphasizes the importance of motivation, inclusivity, constructive feedback, and collaborative learning in enhancing student engagement. By employing a two-librarian teaching model, maintaining a lighthearted approach, and relating information retrieval to everyday experiences, the team aimed to create an enjoyable and meaningful learning experience. The authors describe their approach to increase student engagement and learning outcomes through a three-phase workshop structure: before, during, and after the workshops. The "flipped classroom" method was used, where students were provided with pre-workshop materials, including a short film on information search and encouraged to reflect on the topic using a digital collaboration tool. During the workshops, interactive elements such as quizzes, live demonstrations, and practical training were incorporated, along with opportunities for students to ask questions and provide feedback. The paper concludes by highlighting the benefits of the flipped classroom approach and the extended learning opportunities provided by the before and after workshop phases. The authors believe that their approach offers a sustainable alternative for enhancing information retrieval knowledge among students at Linnaeus University.

Keywords: digital workshop, flipped classroom, information retrieval, interactivity, LIS practitioner, student engagement

Procedia PDF Downloads 35
481 Lexical Collocations in Medical Articles of Non-Native vs Native English-Speaking Researchers

Authors: Waleed Mandour

Abstract:

This study presents multidimensional scrutiny of Benson et al.’s seven-category taxonomy of lexical collocations used by Egyptian medical authors and their peers of native-English speakers. It investigates 212 medical papers, all published during a span of 6 years (from 2013 to 2018). The comparison is held to the medical research articles submitted by native speakers of English (25,238 articles in total with over 103 million words) as derived from the Directory of Open Access Journals (a 2.7 billion-word corpus). The non-native speakers compiled corpus was properly annotated and marked-up manually by the researcher according to the standards of Weisser. In terms of statistical comparisons, though, deployed were the conventional frequency-based analysis besides the relevant criteria, such as association measures (AMs) in which LogDice is deployed as per the recommendation of Kilgariff et al. when comparing large corpora. Despite the terminological convergence in the subject corpora, comparison results confirm the previous literature of which the non-native speakers’ compositions reveal limited ranges of lexical collocations in terms of their distribution. However, there is a ubiquitous tendency of overusing the NS-high-frequency multi-words in all lexical categories investigated. Furthermore, Egyptian authors, conversely to their English-speaking peers, tend to embrace more collocations denoting quantitative rather than qualitative analyses in their produced papers. This empirical work, per se, contributes to the English for Academic Purposes (EAP) and English as a Lingua Franca in Academic settings (ELFA). In addition, there are pedagogical implications that would promote a better quality of medical research papers published in Egyptian universities.

Keywords: corpus linguistics, EAP, ELFA, lexical collocations, medical discourse

Procedia PDF Downloads 101
480 The Impact of the Lexical Quality Hypothesis and the Self-Teaching Hypothesis on Reading Ability

Authors: Anastasios Ntousas

Abstract:

The purpose of the following paper is to analyze the relationship between the lexical quality and the self-teaching hypothesis and their impact on the reading ability. The following questions emerged, is there a correlation between the effective reading experience that the lexical quality hypothesis proposes and the self-teaching hypothesis, would the ability to read by analogy facilitate and create stable, synchronized four-word representational, and would word morphological knowledge be a possible extension of the self-teaching hypothesis. The lexical quality hypothesis speculates that words include four representational attributes, phonology, orthography, morpho-syntax, and meaning. Those four-word representations work together to make word reading an effective task. A possible lack of knowledge in one of the representations might disrupt reading comprehension. The degree that the four-word features connect together makes high and low lexical word quality representations. When the four-word representational attributes connect together effectively, readers have a high lexical quality of words; however, when they hardly have a strong connection with each other, readers have a low lexical quality of words. Furthermore, the self-teaching hypothesis proposes that phonological recoding enables printed word learning. Phonological knowledge and reading experience facilitate the acquisition and consolidation of specific-word orthographies. The reading experience is related to strong reading comprehension. The more readers have contact with texts, the better readers they become. Therefore, their phonological knowledge, as the self-teaching hypothesis suggests, might have a facilitative impact on the consolidation of the orthographical, morphological-syntax and meaning representations of unknown words. The phonology of known words might activate effectively the rest of the representational features of words. Readers use their existing phonological knowledge of similarly spelt words to pronounce unknown words; a possible transference of this ability to read by analogy will appear with readers’ morphological knowledge. Morphemes might facilitate readers’ ability to pronounce and spell new unknown words in which they do not have lexical access. Readers will encounter unknown words with similarly phonemes and morphemes but with different meanings. Knowledge of phonology and morphology might support and increase reading comprehension. There was a careful selection, discussion of theoretical material and comparison of the two existing theories. Evidence shows that morphological knowledge improves reading ability and comprehension, so morphological knowledge might be a possible extension of the self-teaching hypothesis, the fundamental skill to read by analogy can be implemented to the consolidation of word – specific orthographies via readers’ morphological knowledge, and there is a positive correlation between effective reading experience and self-teaching hypothesis.

Keywords: morphology, orthography, reading ability, reading comprehension

Procedia PDF Downloads 92
479 Lexical-Semantic Deficits in Sinhala Speaking Persons with Post Stroke Aphasia: Evidence from Single Word Auditory Comprehension Task

Authors: D. W. M. S. Samarathunga, Isuru Dharmarathne

Abstract:

In aphasia, various levels of symbolic language processing (semantics) are affected. It is shown that Persons with Aphasia (PWA) often experience more problems comprehending some categories of words than others. The study aimed to determine lexical semantic deficits seen in Auditory Comprehension (AC) and to describe lexical-semantic deficits across six selected word categories. Thirteen (n =13) persons diagnosed with post-stroke aphasia (PSA) were recruited to perform an AC task. Foods, objects, clothes, vehicles, body parts and animals were selected as the six categories. As the test stimuli, black and white line drawings were adapted from a picture set developed for semantic studies by Snodgrass and Vanderwart. A pilot study was conducted with five (n=5) healthy nonbrain damaged Sinhala speaking adults to decide familiarity and applicability of the test material. In the main study, participants were scored based on the accuracy and number of errors shown. The results indicate similar trends of lexical semantic deficits identified in the literature confirming ‘animals’ to be the easiest category to comprehend. Mann-Whitney U test was performed to determine the association between the selected variables and the participants’ performance on AC task. No statistical significance was found between the errors and the type of aphasia reflecting similar patterns described in aphasia literature in other languages. The current study indicates the presence of selectivity of lexical semantic deficits in AC and a hierarchy was developed based on the complexity of the categories to comprehend by Sinhala speaking PWA, which might be clinically beneficial when improving language skills of Sinhala speaking persons with post-stroke aphasia. However, further studies on aphasia should be conducted with larger samples for a longer period to study deficits in Sinhala and other Sri Lankan languages (Tamil and Malay).

Keywords: aphasia, auditory comprehension, selective lexical-semantic deficits, semantic categories

Procedia PDF Downloads 223
478 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of variation in the register in three major text domains, namely, social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. This present study uses a corpus of a moderate amount of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control their identity but also mark their uniqueness across the domains. At first, the subject domains of the texts are classified into two parameters namely, ‘Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties like both function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains are quite dicey in nature as their lexicological identity does not have any bearing in the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, lexical collocation. In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation with regard to the domains they should actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategy for language learners. The availability of huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational to specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 150