Search results for: lexical retrieval
528 Examining the Development of Complexity, Accuracy and Fluency in L2 Learners' Writing after L2 Instruction
Authors: Khaled Barkaoui
Abstract:
Research on second-language (L2) learning tends to focus on comparing students with different levels of proficiency at one point in time. However, to understand L2 development, we need more longitudinal research. In this study, we adopt a longitudinal approach to examine changes in three indicators of L2 ability: complexity, accuracy, and fluency (CAF), as reflected in the writing of L2 learners responding to different tasks before and after a period of L2 instruction. Each of 85 Chinese learners of English at three levels of English language proficiency responded to two writing tasks (independent and integrated) before and after nine months of English-language study in China. Each essay (N = 276) was analyzed in terms of numerous CAF indices using both computer coding and human rating: number of words written, number of errors per 100 words, ratings of error severity, global syntactic complexity (MLS), complexity by coordination (T/S), complexity by subordination (C/T), clausal complexity (MLC), phrasal complexity (NP density), syntactic variety, lexical density, lexical variation, lexical sophistication, and lexical bundles. Results were then compared statistically across tasks, L2 proficiency levels, and time. Overall, task type had significant effects on fluency, some syntactic complexity indices (complexity by coordination, structural variety, clausal complexity, phrasal complexity), and lexical density, sophistication, and bundles, but not on accuracy. L2 proficiency had significant effects on fluency, accuracy, and lexical variation, but not on syntactic complexity. Finally, fluency and frequency of errors, but not accuracy ratings, syntactic complexity indices (clausal complexity, global complexity, complexity by subordination, phrasal complexity, structural variety), or lexical complexity (lexical density, variation, and sophistication), exhibited significant changes after instruction, particularly for the independent task.
We discuss the findings and their implications for assessment, instruction, and research on CAF in the context of L2 writing.
Keywords: second language writing, fluency, accuracy, complexity, longitudinal
Procedia PDF Downloads 153
527 Local Texture and Global Color Descriptors for Content Based Image Retrieval
Authors: Tajinder Kaur, Anu Bala
Abstract:
An image retrieval system is a computer system for browsing, searching, and retrieving images from a large database of digital images. A new algorithm for content-based image retrieval (CBIR) is presented in this paper. The proposed method combines color and texture features extracted from the global and local information of the image. The local texture feature is extracted using local binary patterns (LBP), which are computed from the local differences between the center pixel and its neighbors. For the global color feature, the color histogram (CH) is used, calculated over the R, G, and B channels separately. The performance of the proposed combination of color and texture features is tested on the Corel 1000 database of natural images. The results show a significant improvement in the evaluation measures compared to LBP and CH used alone.
Keywords: color, texture, feature extraction, local binary patterns, image retrieval
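To make the local-texture side of this approach concrete, here is a minimal pure-Python sketch (ours, not the authors' code) of the basic 3x3 LBP operator and its histogram; the bit ordering and the treatment of ties as >= are common conventions, assumed here rather than taken from the paper:

```python
def lbp_code(img, r, c):
    """Basic 3x3 LBP: compare each of the 8 neighbors of pixel (r, c)
    against the center, clockwise from the top-left, and pack the
    comparison results into an 8-bit code."""
    center = img[r][c]
    # Clockwise neighbor offsets starting at the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

The histogram of these codes is what would then be concatenated with the color histogram to form the image signature.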
Procedia PDF Downloads 368
526 A Graph-Based Retrieval Model for Passage Search
Authors: Junjie Zhong, Kai Hong, Lei Wang
Abstract:
Passage Retrieval (PR) plays an important role in many Natural Language Processing (NLP) tasks. Traditional efficient retrieval models that rely on exact term matching, such as TF-IDF or BM25, have nowadays been surpassed by pre-trained language models that match by semantics. Though they gain effectiveness, deep language models often incur large memory and time costs. To tackle the trade-off between efficiency and effectiveness in PR, this paper proposes Graph Passage Retriever (GraphPR), a graph-based model inspired by developments in graph learning techniques. Unlike existing works, GraphPR is end-to-end and integrates both term-matching information and semantics. GraphPR constructs a passage-level graph from BM25 retrieval results and trains a GCN-like model on the graph with graph-based objectives. Passages are regarded as nodes in the constructed graph and are embedded as dense vectors, so PR can be implemented with the embeddings and a fast vector-similarity search. Experiments on a variety of real-world retrieval datasets show that the proposed model outperforms related models on several evaluation metrics (e.g., mean reciprocal rank, accuracy, F1-score) while maintaining relatively low query latency and memory usage.
Keywords: efficiency, effectiveness, graph learning, language model, passage retrieval, term-matching model
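The final retrieval step described above, ranking passage embeddings by vector similarity to a query embedding, can be sketched in a few lines of plain Python (a toy illustration with made-up vectors; real systems would use an approximate nearest-neighbor index):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query_vec, passage_vecs, k=2):
    """Rank passage ids by cosine similarity to the query embedding
    and return the k best."""
    scored = [(pid, cosine(query_vec, vec))
              for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return [pid for pid, _ in scored[:k]]
```

This brute-force scan is exact but linear in the corpus size, which is why fast vector-similarity search structures matter at scale.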
Procedia PDF Downloads 155
525 Biomedical Definition Extraction Using Machine Learning with Synonymous Feature
Authors: Jian Qu, Akira Shimazu
Abstract:
OOV (Out Of Vocabulary) terms are terms that cannot be found in many dictionaries. Although it is possible to translate such OOV terms, the translations do not provide any real information for a user. We present an OOV term definition extraction method that uses information available on the Internet, with features such as the occurrence of synonyms and location distances. We apply a machine learning method to find the correct definitions for OOV terms. We tested our method on both biomedical-type and name-type OOV terms; our work outperforms existing work with an accuracy of 86.5%.
Keywords: information retrieval, definition retrieval, OOV (out of vocabulary), biomedical information retrieval
Procedia PDF Downloads 496
524 Cross-Language Variation and the ‘Fused’ Zone in Bilingual Mental Lexicon: An Experimental Research
Authors: Yuliya E. Leshchenko, Tatyana S. Ostapenko
Abstract:
Language variation is a widespread linguistic phenomenon which can affect different levels of a language system: phonological, morphological, lexical, syntactic, etc. It is obvious that the scope of possible standard alternations within a particular language is limited by a variety of its norms and regulations which set more or less clear boundaries for what is possible and what is not possible for the speakers. The possibility of lexical variation (alternate usage of lexical items within the same contexts) is based on the fact that the meanings of words are not clearly and rigidly defined in the consciousness of the speakers. Therefore, lexical variation is usually connected with unstable relationship between words and their referents: a case when a particular lexical item refers to different types of referents, or when a particular referent can be named by various lexical items. We assume that the scope of lexical variation in bilingual speech is generally wider than that observed in monolingual speech due to the fact that, besides ‘lexical item – referent’ relations it involves the possibility of cross-language variation of L1 and L2 lexical items. We use the term ‘cross-language variation’ to denote a case when two equivalent words of different languages are treated by a bilingual speaker as freely interchangeable within the common linguistic context. As distinct from code-switching which is traditionally defined as the conscious use of more than one language within one communicative act, in case of cross-language lexical variation the speaker does not perceive the alternate lexical items as belonging to different languages and, therefore, does not realize the change of language code. In the paper, the authors present research of lexical variation of adult Komi-Permyak – Russian bilingual speakers. 
The two languages co-exist on the territory of the Komi-Permyak District in Russia (Komi-Permyak as the ethnic language and Russian as the official state language), are usually acquired from birth in a natural linguistic environment and, according to the data of sociolinguistic surveys, are both identified by the speakers as coordinate mother tongues. The experimental research demonstrated that alternation of Komi-Permyak and Russian words within one utterance/phrase is highly frequent both in speech perception and production. Moreover, our participants estimated cross-language word combinations like ‘маленькая /Russian/ нывка /Komi-Permyak/’ (‘a little girl’) or ‘мунны /Komi-Permyak/ домой /Russian/’ (‘go home’) as regular/habitual, containing no violation of any linguistic rules and being just as possible in speech as the equivalent intra-language word combinations (‘учöтик нывка’ /Komi-Permyak/ or ‘идти домой’ /Russian/). All the facts considered, we claim that constant concurrent use of the two languages results in a large number of their words tending to be intuitively interpreted by the speakers as lexical variants not only related to the same referent, but also referring to both languages or, more precisely, to none of them in particular. Consequently, we can suppose that the bilingual mental lexicon includes an extensive ‘fused’ zone of lexical representations that provide the basis for cross-language variation in bilingual speech.
Keywords: bilingualism, bilingual mental lexicon, code-switching, lexical variation
Procedia PDF Downloads 149
523 Domain Adaptive Dense Retrieval with Query Generation
Authors: Rui Yin, Haojie Wang, Xun Li
Abstract:
Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus; then, the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. We also explore contrastive learning as a method for training domain-adapted dense retrievers and show that it leads to strong performance in various retrieval settings. Experiments show that our approach is more robust than previous methods in target domains while requiring less unlabeled data.
Keywords: dense retrieval, query generation, contrastive learning, unsupervised training
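The negative-mining step in this pipeline can be illustrated with a toy sketch (ours, not the authors' implementation): rank the corpus against a generated query and keep the top-scoring passages that are not the query's source passage as hard negatives. The term-overlap scorer below is a deliberately simplistic stand-in for a BM25-style retriever:

```python
def overlap_score(query, passage):
    """Toy stand-in for a lexical retriever: count of shared terms."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(passage.lower().split()))

def mine_negatives(query, positive_id, corpus, n_neg=2):
    """Rank `corpus` (id -> text) against the generated query and keep
    the top-scoring passages that are NOT the positive as hard negatives."""
    ranked = sorted(corpus.items(),
                    key=lambda kv: overlap_score(query, kv[1]),
                    reverse=True)
    return [pid for pid, _ in ranked if pid != positive_id][:n_neg]
```

In the full method, these (query, positive, negatives) triples would then be scored by a cross-encoder before training the domain-adapted dense retriever.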
Procedia PDF Downloads 105
522 Distinguishing Borrowings from Code Mixes: An Analysis of English Lexical Items Used in the Print Media in Sri Lanka
Authors: Chamindi Dilkushi Senaratne
Abstract:
Borrowing is the morphological, syntactic and (usually) phonological integration of lexical items from one language into the structure of another language. Borrowings show complete linguistic integration and, due to their frequency of use, become fossilized in the recipient language, differentiating them from switches and mixes. Code mixes are different from borrowings: code mixing takes place when speakers use lexical items in casual conversation to serve a variety of functions. This study presents an analysis of lexical items used in English newspapers in Sri Lanka in 2017 that reveal characteristics of borrowings or code mixes. Both phenomena arise from language contact. The study also uses data from social media websites that comment on newspaper articles available on the web. The study reiterates that borrowings are distinguishable from code mixes and that they are two different phenomena occurring in language contact situations. It also shows how existing morphological processes are used to create new vocabulary, and how bilinguals use these processes to be creative, innovative and to convey a bilingual identity.
Keywords: borrowing, code mixing, morphological processes
Procedia PDF Downloads 221
521 Working Memory and Phonological Short-Term Memory in the Acquisition of Academic Formulaic Language
Authors: Zhicheng Han
Abstract:
This study examines the correlation between knowledge of formulaic language, working memory (WM), and phonological short-term memory (PSTM) in Chinese L2 learners of English. It investigates whether WM and PSTM correlate differently with the acquisition of formulaic language, which may be relevant to the discourse around the conceptualization of formulas. Connectionist approaches have led scholars to argue that formulas are form-meaning connections stored whole, making PSTM significant in the acquisition process as it pertains to the storage and retrieval of chunk information. Generativist scholars, on the other hand, have argued for the active participation of interlanguage grammar in the acquisition and use of formulaic language, where formulas are represented in the mind but retain an internal structure built around a lexical core. This would make WM, especially its processing component, an important cognitive factor, since it plays a role in processing and holding information for further analysis and manipulation. The current study asked L1 Chinese learners of English enrolled in graduate programs in China to complete a preference ranking task in which they ranked formulas, grammatical non-formulaic expressions, and ungrammatical phrases with and without the lexical core in academic contexts. Participants were asked to rank the options by how likely they would be to encounter each phrase in the test sentences within academic contexts. Participants’ syntactic proficiency was controlled with a cloze test and a grammar test. Regression analysis found a significant relationship between the processing component of WM and preference for formulaic expressions in the ranking task, while no significant correlation was found for PSTM or syntactic proficiency. Correlational analysis found that WM, PSTM, and the two proficiency test scores covary significantly.
However, WM and PSTM have different predictive values for participants’ preference for formulaic language. Both the storage and processing components of WM are significantly correlated with the preference for formulaic expressions, while PSTM is not. These findings favor a role for interlanguage grammar and syntactic knowledge in the acquisition of formulaic expressions. The differing effects of WM and PSTM suggest that selective attention to and processing of the input, beyond simple retention, play a key role in successfully acquiring formulaic language. Similar correlational patterns were found for preferring the ungrammatical phrase with the lexical core of the formula over the ones without the lexical core, attesting to learners’ awareness of the lexical core around which formulas are constructed. These findings support the view that formulaic phrases retain internal syntactic structures that are recognized and processed by the learners.
Keywords: formulaic language, working memory, phonological short-term memory, academic language
Procedia PDF Downloads 63
520 A Self-Built Corpus-Based Study of Four-Word Lexical Bundles in Native English Teachers’ EFL Classroom Discourse in Northeast China: The Significance of Stance
Authors: Fang Tan
Abstract:
This research focuses on the use of lexical bundles in spoken discourse, particularly in English as a Foreign Language (EFL) classrooms in Northeast China. While previous studies have mainly examined lexical bundles in written discourse, their usage in spoken discourse needs investigation, given the limited availability of spoken discourse corpora. English teachers’ use of lexical bundles is crucial for effective teaching and communication in the EFL classroom. The aim of this study is to investigate the functions of four-word lexical bundles in native English teachers’ EFL oral English classes in Northeast China. Specifically, the research focuses on the usage of stance bundles, which were found to be the most significant type of bundle in the analyzed corpus. By comparing a self-built university spoken English classroom discourse corpus with another self-built university English for General Purposes (EGP) corpus, the study highlights differences in bundle usage between native and non-native teachers in EFL classrooms. The research employs a corpus-based approach. The observed corpus consists of more than 300,000 tokens, collected over the past five years; the reference corpus consists of over 800,000 tokens, collected over 12 years. Primary data collection involved transcribing and annotating spoken English classes taught by native English teachers. The analysis procedures included identifying and categorizing four-word lexical bundles, with specific emphasis on stance bundles. Frequency counts and comparisons with the Chinese English teachers’ corpus were conducted to identify patterns and differences in bundle usage. The research addresses the following questions: 1) What are the functions of four-word lexical bundles in native English teachers’ EFL oral English classes?
2) How do stance bundles differ in usage between native and non-native English teachers’ classes? 3) What implications can be drawn for English teachers’ professional development based on the findings? In conclusion, this study provides valuable insights into the usage of four-word lexical bundles, particularly stance bundles, in native English teachers’ EFL oral English classes in Northeast China. The research highlights the differences in bundle usage between native and non-native English teachers’ classes and offers implications for English teachers’ professional development. The findings contribute to the understanding of lexical bundle usage in EFL classroom discourse and have theoretical importance for language teaching methodologies. The self-built university English classroom discourse corpus used in this research is a valuable resource for future studies in this field.
Keywords: EFL classroom discourse, four-word lexical bundles, stance, implication
Procedia PDF Downloads 66
519 Similarity Based Retrieval in Case Based Reasoning for Analysis of Medical Images
Authors: M. Dasgupta, S. Banerjee
Abstract:
Content-Based Image Retrieval (CBIR) coupled with Case-Based Reasoning (CBR) is a paradigm that is becoming increasingly popular in the diagnosis and therapy planning of medical ailments using the digital content of medical images. This paper presents a survey of some of the promising approaches used in the detection of abnormalities in retina images, as well as in mammographic screening and the detection of regions of interest in MRI scans of the brain. We also describe our proposed algorithm to detect hard exudates in fundus images of the retina of diabetic retinopathy patients.
Keywords: case based reasoning, exudates, retina image, similarity based retrieval
Procedia PDF Downloads 348
518 Contextual SenSe Model: Word Sense Disambiguation using Sense and Sense Value of Context Surrounding the Target
Authors: Vishal Raj, Noorhan Abbas
Abstract:
Ambiguity in NLP (Natural Language Processing) refers to the ability of a word, phrase, sentence, or text to have multiple meanings. This results in various kinds of ambiguities, such as lexical, syntactic, semantic, anaphoric and referential ambiguities. This study is focused mainly on solving the issue of lexical ambiguity. Word Sense Disambiguation (WSD) is an NLP technique that aims to resolve lexical ambiguity by determining the correct meaning of a word within a given context. Most WSD solutions rely on words for training and testing, but we have used lemma and Part of Speech (POS) tokens of words instead: the lemma adds generality, and the POS adds word properties, to the token. We have designed a novel method to create an affinity matrix that captures the affinity between any pair of lemma_POS tokens (a token where the lemma and POS of a word are joined by an underscore) in a given training set. Additionally, we have devised an algorithm to create sense clusters of tokens using the affinity matrix under a hierarchy of the POS of the lemma. Furthermore, three different mechanisms to predict the sense of a target word using the affinity/similarity values are devised. Each contextual token contributes to the sense of the target word with some value, and whichever sense gets the higher value becomes the sense of the target word. Contextual tokens thus play a key role in creating sense clusters and predicting the sense of the target word; hence, the model is named Contextual SenSe Model (CSM). CSM is notably simple and easy to explain, in contrast to contemporary deep learning models, which are intricate, time-intensive, and hard to interpret. CSM is trained on SemCor training data and evaluated on the SemEval test dataset.
The results indicate that despite the simplicity of the method, it achieves promising results when compared to the Most Frequent Sense (MFS) baseline.
Keywords: word sense disambiguation (WSD), contextual sense model (CSM), most frequent sense (MFS), part of speech (POS), natural language processing (NLP), OOV (out of vocabulary), lemma_POS (a token where lemma and POS of word are joined by underscore), information retrieval (IR), machine translation (MT)
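One plausible reading of the affinity matrix described above is a co-occurrence count between lemma_POS tokens, normalized so values fall in [0, 1]. The sketch below is an illustration under that assumption (the normalization by the rarer token's count is our choice, not necessarily the authors' formula):

```python
from collections import Counter
from itertools import combinations

def affinity_matrix(sentences):
    """Count how often each pair of lemma_POS tokens co-occurs in a
    sentence, then normalize by the less frequent token's count so
    the affinity lies in [0, 1]."""
    token_counts = Counter(t for s in sentences for t in set(s))
    pair_counts = Counter()
    for s in sentences:
        for a, b in combinations(sorted(set(s)), 2):
            pair_counts[(a, b)] += 1
    return {pair: cnt / min(token_counts[pair[0]], token_counts[pair[1]])
            for pair, cnt in pair_counts.items()}
```

Sense clustering and sense prediction would then operate on these pairwise affinity values.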
Procedia PDF Downloads 110
517 A Framework of Product Information Service System Using Mobile Image Retrieval and Text Mining Techniques
Authors: Mei-Yi Wu, Shang-Ming Huang
Abstract:
Online shoppers nowadays often search for product information on the Internet using product keywords. To use this kind of information-searching model, shoppers should have a preliminary understanding of the products they are interested in and choose the correct keywords. However, if a product is encountered for the first time (for example, the clothes or backpack of a passerby, whose brand is unknown), it cannot be retrieved due to insufficient information. In this paper, we discuss and study E-commerce applications using image retrieval and text mining techniques. We design an E-commerce application system with a three-layer architecture to provide users with product information. The system can automatically search for and retrieve similar images and corresponding web pages on the Internet based on target pictures taken by users. Text mining techniques are then applied to extract important keywords from the retrieved web pages, and a web crawler searches prices on different online shopping stores with these keywords. Finally, users obtain product information, including photos and prices, for their favorite products. The experiments show the efficiency of the proposed system.
Keywords: mobile image retrieval, text mining, product information service system, online marketing
Procedia PDF Downloads 360
516 Selection of Relevant Servers in Distributed Information Retrieval System
Authors: Benhamouda Sara, Guezouli Larbi
Abstract:
Nowadays, the dissemination of information extends across the distributed world, where selecting the servers relevant to a user request is an important problem in distributed information retrieval. During the last decade, several research studies on this issue have been launched to find optimal solutions, and many collection selection approaches have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contain terms of the query and the weights of those terms in these documents. We tested our method, and our studies show that this technique can compete with other state-of-the-art algorithms chosen to benchmark the performance of our approach.
Keywords: distributed information retrieval, relevance, server selection, collection selection
Procedia PDF Downloads 314
515 Algorithm for Information Retrieval Optimization
Authors: Kehinde K. Agbele, Kehinde Daniel Aruleba, Eniafe F. Ayetiran
Abstract:
When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates optimization of IRS to individual information needs in order of relevance. The study addresses the development of algorithms that optimize the ranking of documents retrieved from an IRS. It discusses and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated-database environment. As the volume of information available online and in designated databases grows continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors, based on calculating the weight (…).
Keywords: information retrieval, document relevance, performance measures, personalization
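The abstract is cut off at the weighting formula, so the exact weight used by DROPT is not given here. A common choice for keyword-by-document weights in this kind of ranking is tf-idf, sketched below as an assumption on our part rather than the authors' exact formula:

```python
import math

def tf_idf_weights(docs):
    """docs: list of token lists. Returns one dict per document mapping
    each term t to tf(t, d) * log(N / df(t)), where tf is the
    within-document relative frequency and df the document frequency."""
    n = len(docs)
    df = {}
    for d in docs:
        for t in set(d):
            df[t] = df.get(t, 0) + 1
    weights = []
    for d in docs:
        w = {}
        for t in set(d):
            tf = d.count(t) / len(d)
            w[t] = tf * math.log(n / df[t])
        weights.append(w)
    return weights
```

Terms that appear in every document get weight zero, which matches the intuition that they carry no ranking signal.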
Procedia PDF Downloads 242
514 Content-Based Image Retrieval Using HSV Color Space Features
Authors: Hamed Qazanfari, Hamid Hassanpour, Kazem Qazanfari
Abstract:
In this paper, a method is provided for content-based image retrieval. A content-based image retrieval system searches for a query image, based on its visual content, in an image database in order to retrieve similar images. With the aim of simulating the human visual system’s sensitivity to image edges and color features, the concept of the color difference histogram (CDH) is used. The CDH encodes the perceptual color difference between two neighboring pixels with regard to colors and edge orientations. Since the HSV color space is close to the human visual system, the CDH is calculated in this color space. In addition, to improve the color features, the color histogram in HSV color space is also used as a feature. Among the extracted features, efficient features are selected using entropy and correlation criteria, so that the final features capture the content of images most efficiently. The proposed method has been evaluated on three standard databases: Corel 5k, Corel 10k and UKBench. Experimental results show that the accuracy of the proposed image retrieval method is significantly improved compared to recently developed methods.
Keywords: content-based image retrieval, color difference histogram, efficient feature selection, entropy, correlation
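The HSV color histogram feature mentioned above can be sketched with the standard library's `colorsys` conversion; the bin counts here (8 hue, 2 saturation, 2 value) are illustrative choices of ours, not the paper's configuration:

```python
import colorsys

def hsv_histogram(pixels, h_bins=8, s_bins=2, v_bins=2):
    """Quantized HSV color histogram. `pixels` are (r, g, b) tuples
    in [0, 255]; each pixel is assigned to one (h, s, v) bin."""
    hist = [0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
    return hist
```

Histograms like this, together with the CDH, would form the feature vectors that the entropy and correlation criteria then prune.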
Procedia PDF Downloads 250
513 Shaping Lexical Concept of 'Mage' through Image Schemas in Dragon Age 'Origins'
Authors: Dean Raiyasmi, Elvi Citraresmana, Sutiono Mahdi
Abstract:
Language shapes the human mind and its concepts of things. Using image schemas, with today’s technology even an AI (artificial intelligence) can form concepts of things that reflect its creator’s negativity or positivity. This is reflected in one of the best-selling games around the world in 2012, Dragon Age: Origins. The AI, in the form of the NPCs (Non-Playable Characters) inside the game, reflects the game creators’ negativity or positivity toward the lexical concept of ‘mage’. Through image schemas, shaping the lexical concept of ‘mage’ is shown to be possible, revealing the creators’ stance toward mages. This research analyses the cognitive-semantic process by which image schemas shape the concept of ‘mage’, describing the kinds of image schemas that exist in the Dragon Age: Origins game. The methodology used in this research is qualitative, employing participant observation in five stages together with documentation. The results show that there are four image schemas in the game and that those image schemas shape the lexical concept of ‘mage’.
Keywords: cognitive semantics, image schema, conceptual metaphor, video game
Procedia PDF Downloads 438
512 Urdu Text Extraction Method from Images
Authors: Samabia Tehsin, Sumaira Kausar
Abstract:
Due to the vast increase in multimedia data in recent years, efficient and robust retrieval techniques are needed to retrieve and index images and videos. Text embedded in images can serve as a strong retrieval tool, which is why text extraction is an area of research receiving increasing attention. English text extraction has been the focus of many researchers, but much less work has been done on other languages like Urdu. This paper focuses on Urdu text extraction from video frames. It presents a text detection feature set that can deal with most of the problems connected with the text extraction process. To test the validity of the method, it is evaluated on an Urdu news dataset, which gives promising results.
Keywords: caption text, content-based image retrieval, document analysis, text extraction
Procedia PDF Downloads 517
511 SIFT and Perceptual Zoning Applied to CBIR Systems
Authors: Simone B. K. Aires, Cinthia O. de A. Freitas, Luiz E. S. Oliveira
Abstract:
This paper contributes to CBIR systems applied to trademark retrieval. The proposed model incorporates aspects of the visual perception of shapes by means of a feature extractor associated with a non-symmetrical perceptual zoning mechanism based on the principles of Gestalt. The features were computed using the Scale Invariant Feature Transform (SIFT). We carried out experiments using four different zoning strategies (Z = 4, 5H, 5V, 7) for matching and retrieval tasks. Our proposed method achieved a normalized recall (Rn) of 0.84. Experiments show that non-symmetrical zoning can be considered a tool for building more reliable trademark retrieval systems.
Keywords: CBIR, Gestalt, matching, non-symmetrical zoning, SIFT
Procedia PDF Downloads 314
510 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases
Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha
Abstract:
Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach that is free from the effects of feature dimensions, ranges, internal feature normalization and the distance measure, and it can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark image classification datasets (a subset of the Corel dataset, and Oliva and Torralba) and compared with existing approaches. Its performance is confirmed by a significant improvement over the independently evaluated baselines of previously proposed feature fusion approaches.
Keywords: feature fusion, image retrieval, membership function, normalization
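One simple way to realize the idea of fusing per-feature distances without being affected by their differing ranges is to scale each feature's distances by a statistic of its own distribution before taking a weighted sum. The sketch below uses the per-feature mean distance as the scale, which is our illustrative choice rather than the paper's exact scheme:

```python
def fuse_distances(per_feature_distances, weights):
    """per_feature_distances: {feature: {image_id: distance}}.
    Each feature's distances are scaled by that feature's mean distance,
    so features with very different ranges become comparable; the scaled
    distances are then combined by a weighted sum (lower = more similar)."""
    image_ids = next(iter(per_feature_distances.values())).keys()
    fused = {img: 0.0 for img in image_ids}
    for feat, dists in per_feature_distances.items():
        mean = sum(dists.values()) / len(dists)
        for img, d in dists.items():
            fused[img] += weights[feat] * (d / mean if mean else 0.0)
    return fused
```

Because each feature is normalized against its own query-time distance distribution, a feature whose raw distances happen to be numerically large cannot dominate the ranking.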
Procedia PDF Downloads 346
509 Enhanced Arabic Semantic Information Retrieval System Based on Arabic Text Classification
Authors: A. Elsehemy, M. Abdeen, T. Nazmy
Abstract:
Since the appearance of the Semantic Web, many semantic search techniques and models have been proposed to exploit the information in ontologies to enhance traditional keyword-based search. Many advances have been made for languages such as English, German, French and Spanish; however, other languages such as Arabic are not fully supported yet. In this paper we present a framework for ontology-based information retrieval for the Arabic language. Our system consists of four main modules, namely a query parser, an indexer, a search module and a ranking module. Our approach includes building a semantic index by linking ontology concepts to documents, with an annotation weight for each link, to be used in ranking the results. We also augmented the framework with an automatic document categorizer, which enhances the overall document ranking. We have built three Arabic domain ontologies (sports, economics and politics) as examples for the Arabic language, and a knowledge base that consists of 79 classes and more than 1456 instances. The system is evaluated using the precision and recall metrics. We have run many retrieval operations on a sample of 40,316 documents with a total size of 320 MB of pure text. The results show that semantic search enhanced with text classification gives better performance than the system without classification.
Keywords: Arabic text classification, ontology based retrieval, Arabic semantic web, information retrieval, Arabic ontology
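For reference, the precision and recall metrics used in the evaluation above are the standard set-based IR definitions, which can be computed as follows:

```python
def precision_recall(retrieved, relevant):
    """Set-based IR metrics: precision is the fraction of retrieved
    documents that are relevant; recall is the fraction of relevant
    documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

A system that retrieves more of the truly relevant documents without flooding the result list improves both numbers at once, which is the comparison the classification-enhanced search is reported to win.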
Procedia PDF Downloads 526
508 Semantic Indexing Improvement for Textual Documents: Contribution of Classification by Fuzzy Association Rules
Authors: Mohsen Maraoui
Abstract:
With the aim of improving natural language processing applications such as information retrieval, machine translation, and lexical disambiguation, we focus on a statistical approach to semantic indexing of multilingual text documents based on conceptual network formalism. We propose to use this formalism as an indexing language to represent descriptive concepts and their weights, where the concepts represent the content of the document. Our contribution consists of two steps. In the first step, we extract index terms using the multilingual lexical resource EuroWordNet (EWN). In the second step, we pass from the representation of index terms to the representation of index concepts through the conceptual network formalism. This network is generated using the EWN resource and passes through a classification step based on an association-rules model, in an attempt to discover the non-taxonomic, contextual relations between the concepts of a document. These relations are latent relations buried in the text and carried by the semantic context of the co-occurrence of concepts in the document. Our proposed indexing approach can be applied to text documents in various languages because it is based on a linguistic method adapted to each language through a multilingual thesaurus; the same statistical process is then applied regardless of the language in order to extract the significant concepts and their associated weights. We show that the proposed indexing approach provides encouraging results.
Keywords: concept extraction, conceptual network formalism, fuzzy association rules, multilingual thesaurus, semantic indexing
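The association-rule step — discovering non-taxonomic relations between concepts that co-occur in documents — can be illustrated with a crisp (non-fuzzy) simplification based on classic support and confidence; the thresholds and the omission of fuzzy memberships are our simplifications:

```python
from itertools import permutations

def mine_concept_rules(doc_concepts, min_support=0.3, min_confidence=0.6):
    """Mine directed rules a -> b between concepts from their
    co-occurrence across documents. `doc_concepts` is a list of
    concept sets, one per document. A crisp simplification of a
    fuzzy association-rules step."""
    n = len(doc_concepts)

    def support(itemset):
        # fraction of documents containing every concept in itemset
        return sum(1 for doc in doc_concepts if itemset <= doc) / n

    concepts = set().union(*doc_concepts)
    rules = []
    for a, b in permutations(concepts, 2):
        sup = support({a, b})
        if sup >= min_support:
            conf = sup / support({a})
            if conf >= min_confidence:
                rules.append((a, b, round(sup, 2), round(conf, 2)))
    return rules
```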
Procedia PDF Downloads 141
507 Design an Algorithm for Software Development in CBSE Environment Using Feed Forward Neural Network
Authors: Amit Verma, Pardeep Kaur
Abstract:
In software development organizations, component-based software engineering (CBSE) is an emerging paradigm for software development that has gained wide acceptance, as it often results in increased quality of the software product within development time and budget. In component reusability, the main challenge is identifying the right component from large repositories at the right time. The major objective of this work is to provide an efficient algorithm for the storage and effective retrieval of components, using a neural network and clustering on parameters of the user's choice. This paper proposes an algorithm that provides an error-free and automatic process for retrieving components for reuse. In this algorithm, keywords (or components) are extracted from the software documents, and a k-means clustering algorithm is then applied. Weights are assigned to these keywords based on their frequency, after which an artificial neural network (ANN) predicts whether the correct weight has been assigned to each keyword (or component); if not, the algorithm back-propagates to the initial step and re-assigns the weights. Finally, all keywords are stored in repositories for effective retrieval. The proposed algorithm is effective in error detection and correction based on the user's choice, while supporting the selection of components for reuse through efficient retrieval.
Keywords: component based development, clustering, back propagation algorithm, keyword based retrieval
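The storage side of such a pipeline — frequency-based keyword weighting followed by k-means clustering of components — can be sketched as follows. This is a simplified stand-in that omits the ANN weight-verification step; the vocabulary and deterministic initialization are illustrative choices:

```python
import numpy as np

def keyword_vectors(components, vocabulary):
    """Frequency-based weight of each vocabulary keyword per component."""
    vecs = []
    for text in components:
        tokens = text.lower().split()
        vecs.append([tokens.count(w) / len(tokens) for w in vocabulary])
    return np.array(vecs)

def kmeans(X, k, iters=20):
    """Plain k-means with deterministic (evenly spaced) initialization,
    standing in for the clustering step of the repository."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each component to its nearest cluster center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

Components falling in the same cluster can then be stored together, narrowing the search space at retrieval time.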
Procedia PDF Downloads 379
506 EEG Correlates of Trait and Mathematical Anxiety during Lexical and Numerical Error-Recognition Tasks
Authors: Alexander N. Savostyanov, Tatiana A. Dolgorukova, Elena A. Esipenko, Mikhail S. Zaleshin, Margherita Malanchini, Anna V. Budakova, Alexander E. Saprygin, Tatiana A. Golovko, Yulia V. Kovas
Abstract:
EEG correlates of mathematical and trait anxiety were studied in 52 healthy Russian speakers during the execution of error-recognition tasks with lexical, arithmetic, and algebraic conditions. Event-related spectral perturbations (ERSPs) were used as a measure of brain activity. The ERSP plots revealed alpha/beta desynchronization within a 500-3000 ms interval after task onset and slow-wave synchronization within an interval of 150-350 ms. The amplitudes in these intervals reflected the accuracy of error recognition and were differently associated with the three conditions. Correlates of anxiety were found in the theta (4-8 Hz) and beta2 (16-20 Hz) frequency bands. In the theta band, the effects of mathematical anxiety were expressed more strongly in the lexical condition than in the arithmetic and algebraic conditions. The mathematical anxiety effects in the theta band were associated with differences between anterior and posterior cortical areas, whereas the effects of trait anxiety were associated with inter-hemispheric differences. In the beta1 and beta2 bands, the effects of trait and mathematical anxiety were directed oppositely: trait anxiety was associated with an increase in the amplitude of desynchronization, whereas mathematical anxiety was associated with a decrease in this amplitude. The effect of mathematical anxiety in the beta2 band was insignificant in the lexical condition but was strongest in the algebraic condition. EEG correlates of anxiety in the theta band can be interpreted as indices of task emotionality, whereas the reaction in the beta2 band is related to the tension of intellectual resources.
Keywords: EEG, brain activity, lexical and numerical error-recognition tasks, mathematical and trait anxiety
Procedia PDF Downloads 561
505 Augmented Reality Technology for a User Interface in an Automated Storage and Retrieval System
Authors: Wen-Jye Shyr, Chun-Yuan Chang, Bo-Lin Wei, Chia-Ming Lin
Abstract:
This study describes the task of creating an augmented reality user interface for operators that might form part of an automated storage and retrieval system. Its objective was to give graduate engineering and technology students a set of tools with which to experiment with the creation of augmented reality technologies. The students used augmented reality technology to collect and analyze data for maintenance applications. Our findings support the evolution of artificial intelligence towards Industry 4.0 practices and the planned Industry 4.0 research stream. Important first insights into the study's effects on student learning are presented.
Keywords: augmented reality, storage and retrieval system, user interface, programmable logic controller
Procedia PDF Downloads 89
504 Incorporating Lexical-Semantic Knowledge into Convolutional Neural Network Framework for Pediatric Disease Diagnosis
Authors: Xiaocong Liu, Huazhen Wang, Ting He, Xiaozheng Li, Weihan Zhang, Jian Chen
Abstract:
The utilization of electronic medical record (EMR) data to establish disease diagnosis models has become an important research topic in biomedical informatics. Deep learning can automatically extract features from massive data, which has brought about breakthroughs in the study of EMR data. The challenge is that deep learning lacks semantic knowledge, which limits its practicality in medical science. This research proposes a method for incorporating lexical-semantic knowledge from abundant entities into a convolutional neural network (CNN) framework for pediatric disease diagnosis. Firstly, medical terms are vectorized into Lexical Semantic Vectors (LSVs), which are concatenated with the embedded word vectors of word2vec to enrich the feature representation. Secondly, the semantic distribution of medical terms serves as a Semantic Decision Guide (SDG) for the optimization of the deep learning models. The study evaluates the performance of the LSV-SDG-CNN model on four Chinese EMR datasets, with CNN, LSV-CNN, and SDG-CNN designed as baseline models for comparison. The experimental results show that the LSV-SDG-CNN model outperforms the baseline models on all four datasets; the best configuration of the model yielded an F1 score of 86.20%. The results clearly demonstrate that the CNN has been effectively guided and optimized by lexical-semantic knowledge, and that the LSV-SDG-CNN model improves disease classification accuracy by a clear margin.
Keywords: convolutional neural network, electronic medical record, feature representation, lexical semantics, semantic decision
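The LSV concatenation step can be illustrated with a toy sketch: each token's distributional embedding is joined with its lexical semantic vector before the stack of token vectors is fed to the CNN. The lookup tables, dimensions, and zero fallback below are hypothetical, not the paper's actual vectorization:

```python
import numpy as np

def enrich_embedding(word, w2v, lsv_table, emb_dim=4, lsv_dim=3):
    """Concatenate a word's word2vec-style embedding with its
    Lexical Semantic Vector (LSV); unknown words fall back to
    zeros. Tables and dimensions are hypothetical."""
    emb = w2v.get(word, np.zeros(emb_dim))
    lsv = lsv_table.get(word, np.zeros(lsv_dim))
    return np.concatenate([emb, lsv])   # shape: (emb_dim + lsv_dim,)

def sentence_matrix(tokens, w2v, lsv_table):
    """Stack enriched token vectors into a CNN input matrix."""
    return np.stack([enrich_embedding(t, w2v, lsv_table) for t in tokens])
```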
Procedia PDF Downloads 126
503 Dynamic Log Parsing and Intelligent Anomaly Detection Method Combining Retrieval Augmented Generation and Prompt Engineering
Authors: Liu Linxin
Abstract:
As system complexity increases, log parsing and anomaly detection become increasingly important for ensuring system stability. However, traditional methods often suffer from insufficient adaptability and decreasing accuracy when dealing with rapidly changing log contents and unknown domains. To this end, this paper proposes LogRAG, an approach that combines Retrieval-Augmented Generation (RAG) with prompt engineering for large language models, applied to log analysis tasks to achieve dynamic parsing of logs and intelligent anomaly detection. By combining real-time information retrieval and prompt optimisation, this study significantly improves the adaptive capability of log analysis and the interpretability of its results. Experimental results show that the method performs well on several public datasets, especially in the absence of training data, and significantly outperforms traditional methods. This paper provides a technical path for log parsing and anomaly detection, demonstrating significant theoretical value and application potential.
Keywords: log parsing, anomaly detection, retrieval-augmented generation, prompt engineering, LLMs
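The retrieval-augmented prompting pattern can be sketched as follows: similar historical log lines are retrieved and spliced into the prompt given to the model. The token-overlap retriever and the prompt wording are illustrative stand-ins, not the LogRAG implementation:

```python
def retrieve_similar(query_log, history, k=2):
    """Rank historical log lines by shared-token count -- a minimal
    stand-in for a real retrieval component."""
    q = set(query_log.lower().split())
    ranked = sorted(history,
                    key=lambda line: len(q & set(line.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query_log, history):
    """Assemble a retrieval-augmented anomaly-detection prompt for an
    LLM. The template wording is our own illustration."""
    context = "\n".join(retrieve_similar(query_log, history))
    return (f"Known log lines:\n{context}\n\n"
            f"New log line:\n{query_log}\n\n"
            "Is the new line anomalous? Answer yes or no, with a reason.")
```

In a full pipeline, the prompt would be sent to an LLM, and the retrieved context is what lets the model judge unfamiliar logs without task-specific training data.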
Procedia PDF Downloads 31
502 In-Fun-Mation: Putting the Fun in Information Retrieval at the Linnaeus University, Sweden
Authors: Aagesson, Ekstrand, Persson, Sallander
Abstract:
This paper describes how a team of librarians at Linnaeus University Library in Sweden uses a pedagogical approach to deliver engaging digital workshops on information retrieval. The team consists of four librarians supporting three different faculties. The paper discusses the challenges of engaging students who may perceive information retrieval as a boring and difficult subject, and emphasizes the importance of motivation, inclusivity, constructive feedback, and collaborative learning in enhancing student engagement. By employing a two-librarian teaching model, maintaining a lighthearted approach, and relating information retrieval to everyday experiences, the team aimed to create an enjoyable and meaningful learning experience. The authors describe their approach to increasing student engagement and learning outcomes through a three-phase workshop structure: before, during, and after the workshops. The "flipped classroom" method was used: students were provided with pre-workshop materials, including a short film on information searching, and were encouraged to reflect on the topic using a digital collaboration tool. During the workshops, interactive elements such as quizzes, live demonstrations, and practical training were incorporated, along with opportunities for students to ask questions and provide feedback. The paper concludes by highlighting the benefits of the flipped-classroom approach and the extended learning opportunities provided by the before- and after-workshop phases. The authors believe that their approach offers a sustainable alternative for enhancing information retrieval knowledge among students at Linnaeus University.
Keywords: digital workshop, flipped classroom, information retrieval, interactivity, LIS practitioner, student engagement
Procedia PDF Downloads 67
501 Lexical Collocations in Medical Articles of Non-Native vs Native English-Speaking Researchers
Authors: Waleed Mandour
Abstract:
This study presents a multidimensional examination of Benson et al.'s seven-category taxonomy of lexical collocations as used by Egyptian medical authors and their native-English-speaking peers. It investigates 212 medical papers, all published during a span of six years (from 2013 to 2018). The comparison is made against medical research articles by native speakers of English (25,238 articles in total, with over 103 million words) derived from the Directory of Open Access Journals (a 2.7-billion-word corpus). The non-native speakers' corpus was annotated and marked up manually by the researcher according to the standards of Weisser. For the statistical comparisons, conventional frequency-based analysis was deployed alongside the relevant association measures (AMs), with LogDice used as recommended by Kilgarriff et al. for comparing large corpora. Despite the terminological convergence in the subject corpora, the comparison results confirm the previous literature: the non-native speakers' compositions reveal limited ranges of lexical collocations in terms of their distribution. However, there is a ubiquitous tendency to overuse the high-frequency native-speaker multi-words in all lexical categories investigated. Furthermore, Egyptian authors, conversely to their English-speaking peers, tend to embrace more collocations denoting quantitative rather than qualitative analyses in their papers. This empirical work contributes to English for Academic Purposes (EAP) and English as a Lingua Franca in Academic settings (ELFA). In addition, there are pedagogical implications that would promote a better quality of medical research papers published in Egyptian universities.
Keywords: corpus linguistics, EAP, ELFA, lexical collocations, medical discourse
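The LogDice association measure mentioned above has a standard definition: 14 + log2(2·f(x,y) / (f(x) + f(y))), where f(x,y) is the co-occurrence frequency and f(x), f(y) are the individual frequencies. Its theoretical maximum of 14 and rough independence of corpus size are what make it suitable for comparing corpora of very different sizes, as here:

```python
import math

def log_dice(f_xy, f_x, f_y):
    """LogDice association score: 14 + log2(2*f(x,y) / (f(x) + f(y))).

    The maximum is 14 (every occurrence of x and y is a
    co-occurrence); each point lower indicates roughly half the
    collocational strength, and scores are comparable across
    corpora of different sizes.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))
```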
Procedia PDF Downloads 132
500 A Comparative Analysis of Lexical Bundles in Academic Writing: Insights from Persian and Native English Writers in Applied Linguistics
Authors: Elham Shahrjooi Haghighi
Abstract:
This research explores how lexical bundles are utilized in writing in the field of applied linguistics by comparing professional Persian writers with native English writers, using corpus-based methods and advanced computational techniques to examine the occurrence and characteristics of lexical bundles in academic writing. The literature review emphasizes how important lexical bundles are in organizing discussions and conveying opinions in both spoken and written language, across genres, proficiency levels, and fields of study. Previous research has indicated that native English writers tend to employ a wider array and greater diversity of bundles than non-native writers, and that these bundles are essential elements of academic writing. In the methodology, the research uses a corpus-based approach to analyze a collection of writings such as research papers and advanced theses at the doctoral and master's levels. The examination uncovers differences in the use of bundles between native Persian writers and native English writers, with the latter group displaying a greater frequency and variety of bundle types. Furthermore, the research examines how these bundles function, classifying them into categories based on their relevance to the research, the text structure, and the participants, as outlined in Hyland's framework. The results show that Persian authors' use of bundles demonstrates distinct structural and functional tendencies in comparison with native English writers. This variation is linked to differing language proficiency levels, disciplinary norms, and cultural factors. The study also highlights the pedagogical implications of these findings, suggesting that targeted instruction on the use of lexical bundles could enhance the academic writing skills of non-native speakers.
In conclusion, this research contributes to the understanding of lexical bundles in academic writing by providing a detailed comparative analysis of their use by Persian and native English writers. The insights from this study have important implications for language education and the development of effective writing strategies for non-native English speakers in academic contexts.
Keywords: lexical bundles, academic writing, comparative analysis, computational techniques
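Lexical-bundle extraction of the kind used in such corpus-based comparisons amounts to counting recurrent n-word sequences with frequency and dispersion cut-offs (a bundle must recur across several texts, not just within one). The thresholds below are illustrative, not those of this study:

```python
from collections import Counter

def extract_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Extract recurrent n-word sequences (lexical bundles).

    `min_freq` is the total frequency cut-off; `min_texts` is the
    dispersion cut-off (number of distinct texts a bundle must
    appear in). Both thresholds are illustrative.
    """
    freq, dispersion = Counter(), Counter()
    for text in texts:
        tokens = text.lower().split()
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        dispersion.update(set(grams))   # count each bundle once per text
    return {g: freq[g] for g in freq
            if freq[g] >= min_freq and dispersion[g] >= min_texts}
```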
Procedia PDF Downloads 23
499 The Impact of the Lexical Quality Hypothesis and the Self-Teaching Hypothesis on Reading Ability
Authors: Anastasios Ntousas
Abstract:
The purpose of the following paper is to analyze the relationship between the lexical quality hypothesis and the self-teaching hypothesis and their impact on reading ability. The following questions emerged: is there a correlation between the effective reading experience that the lexical quality hypothesis proposes and the self-teaching hypothesis; would the ability to read by analogy facilitate and create stable, synchronized representations of the four word attributes; and would morphological word knowledge be a possible extension of the self-teaching hypothesis? The lexical quality hypothesis speculates that words comprise four representational attributes: phonology, orthography, morpho-syntax, and meaning. These four word representations work together to make word reading an effective task, and a lack of knowledge in any one of them might disrupt reading comprehension. The degree to which the four features connect together distinguishes high from low lexical quality word representations: when the four attributes connect effectively, readers have high lexical quality representations of words; when the attributes lack a strong connection with each other, readers have low lexical quality representations. Furthermore, the self-teaching hypothesis proposes that phonological recoding enables printed word learning. Phonological knowledge and reading experience facilitate the acquisition and consolidation of word-specific orthographies. Reading experience is related to strong reading comprehension: the more contact readers have with texts, the better readers they become. Therefore, their phonological knowledge, as the self-teaching hypothesis suggests, might have a facilitative impact on the consolidation of the orthographic, morpho-syntactic, and meaning representations of unknown words; the phonology of known words might effectively activate the rest of the representational features of words.
Readers use their existing phonological knowledge of similarly spelt words to pronounce unknown words; a possible transfer of this ability to read by analogy appears with readers' morphological knowledge. Morphemes might facilitate readers' ability to pronounce and spell new, unknown words to which they do not have lexical access, since readers encounter unknown words with similar phonemes and morphemes but with different meanings. Knowledge of phonology and morphology might thus support and increase reading comprehension. Following a careful selection, discussion, and comparison of the theoretical material of the two existing theories, the evidence shows that morphological knowledge improves reading ability and comprehension, so morphological knowledge might be a possible extension of the self-teaching hypothesis; the fundamental skill of reading by analogy can be applied to the consolidation of word-specific orthographies via readers' morphological knowledge; and there is a positive correlation between effective reading experience and the self-teaching hypothesis.
Keywords: morphology, orthography, reading ability, reading comprehension
Procedia PDF Downloads 129