Search results for: semantic textual similarity binary task
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3903

3903 Evaluation and Compression of Different Language Transformer Models for Semantic Textual Similarity Binary Task Using Minority Language Resources

Authors: Ma. Gracia Corazon Cayanan, Kai Yuen Cheong, Li Sha

Abstract:

Training a language model for a minority language remains a challenging task. The lack of available corpora to train and fine-tune state-of-the-art language models is still a challenge in Natural Language Processing (NLP). Moreover, the need for high computational resources and bulk data limits the attainment of this task. In this paper, we present the following contributions: (1) we introduce and use a translation pair set of Tagalog and English (TL-EN) to pre-train a language model on a minority language resource; (2) we fine-tune and evaluate top-ranking pre-trained semantic textual similarity binary task (STSB) models on both TL-EN and STS dataset pairs; and (3) we reduce the size of the model to offset the need for high computational resources. Based on our results, the models that were pre-trained on translation pairs and STS pairs perform well on the STSB task. Also, reducing the model to a smaller dimension has no negative effect on performance and instead yields a notable increase in the similarity scores. Moreover, pre-training on a similar dataset has a tremendous effect on the model’s performance scores.

Keywords: semantic matching, semantic textual similarity binary task, low resource minority language, fine-tuning, dimension reduction, transformer models

Procedia PDF Downloads 177
3902 Semantic Textual Similarity on Contracts: Exploring Multiple Negative Ranking Losses for Sentence Transformers

Authors: Yogendra Sisodia

Abstract:

Researchers are becoming more interested in extracting useful information from legal documents thanks to the development of large-scale language models in natural language processing (NLP), and deep learning has accelerated the creation of powerful text mining models. Legal fields like contracts benefit greatly from semantic text search since it makes it quick and easy to find related clauses. After collecting sentence embeddings, it is relatively simple to locate sentences with a comparable meaning throughout the entire legal corpus. The author investigated two pre-trained language models for this task, MiniLM and RoBERTa, and further fine-tuned them on legal contracts. The author used Multiple Negative Ranking Loss for the creation of sentence transformers. The fine-tuned language models and sentence transformers showed promising results.

Keywords: legal contracts, multiple negative ranking loss, natural language inference, sentence transformers, semantic textual similarity

Procedia PDF Downloads 70
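
The ranking loss named in the abstract above can be illustrated with a minimal sketch. It assumes the common in-batch formulation: each anchor sentence is paired with one positive, and the other positives in the batch serve as negatives. The function names, the toy 2-D embeddings, and the scale factor are assumptions for illustration, not the author's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positives[j] in the batch acts as a negative.
    The scale value is an assumed hyperparameter."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        peak = max(scores)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_z - scores[i]
    return total / len(anchors)


# Toy embeddings (hypothetical): aligned pairs should give a lower loss.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
```

The loss is small when each anchor is closest to its own positive and large when the pairing is wrong, which is what pushes related clauses together in embedding space.
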
3901 Quick Similarity Measurement of Binary Images via Probabilistic Pixel Mapping

Authors: Adnan A. Y. Mustafa

Abstract:

In this paper we present a quick technique to measure the similarity between binary images. The technique is based on a probabilistic mapping approach and is fast because only a minute percentage of the image pixels need to be compared to measure the similarity, and not the whole image. We exploit the power of the Probabilistic Matching Model for Binary Images (PMMBI) to arrive at an estimate of the similarity. We show that the estimate is a good approximation of the actual value, and the quality of the estimate can be improved further with increased image mappings. Furthermore, the technique is image size invariant; the similarity between big images can be measured as fast as that for small images. Examples of trials conducted on real images are presented.

Keywords: big images, binary images, image matching, image similarity

Procedia PDF Downloads 164
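
The core idea in the abstract above, estimating similarity from a small random sample of pixels instead of the whole image, can be sketched as follows. This is a simplified illustration and not the PMMBI model itself. The function names, image contents, and sample size are assumptions.

```python
import random


def exact_similarity(img_a, img_b):
    """Fraction of pixel positions where the two binary images agree."""
    total = len(img_a) * len(img_a[0])
    same = sum(
        1
        for row_a, row_b in zip(img_a, img_b)
        for pa, pb in zip(row_a, row_b)
        if pa == pb
    )
    return same / total


def estimate_similarity(img_a, img_b, n_samples, rng):
    """Estimate the agreement fraction from n_samples random pixel positions."""
    rows, cols = len(img_a), len(img_a[0])
    matches = 0
    for _ in range(n_samples):
        r = rng.randrange(rows)
        c = rng.randrange(cols)
        if img_a[r][c] == img_b[r][c]:
            matches += 1
    return matches / n_samples


# Hypothetical 200x200 checkerboard; the second image inverts the top 50 rows.
img_a = [[(r + c) % 2 for c in range(200)] for r in range(200)]
img_b = [[1 - p if r < 50 else p for p in row] for r, row in enumerate(img_a)]

exact = exact_similarity(img_a, img_b)
estimate = estimate_similarity(img_a, img_b, 2000, random.Random(0))
```

Here 2,000 samples stand in for 40,000 pixels. The cost of the estimate depends on the sample count rather than the image size, which matches the size invariance described in the abstract.
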
3900 Text Similarity in Vector Space Models: A Comparative Study

Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Abstract:

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. In other cases, TFIDF performs surprisingly well, in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.

Keywords: big data, patent, text embedding, text similarity, vector space model

Procedia PDF Downloads 142
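
The TFIDF baseline discussed in the abstract above can be sketched in a few lines. The smoothed IDF formula, whitespace tokenization, and toy documents below are assumptions for illustration, not the authors' setup.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical patent-like snippets.
docs = [
    "rotor blade assembly for a wind turbine",
    "wind turbine blade with reinforced rotor hub",
    "method for brewing coffee with a paper filter",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

The two turbine snippets share four content terms and score higher than the unrelated pair, which overlaps only on function words.
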
3899 Measuring Text-Based Semantics Relatedness Using WordNet

Authors: Madiha Khan, Sidrah Ramzan, Seemab Khan, Shahzad Hassan, Kamran Saeed

Abstract:

Measuring semantic similarity between texts means calculating the semantic relatedness between them using various techniques. Our web application (Measuring Relatedness of Concepts, MRC) allows the user to input two text corpora and obtain a semantic similarity percentage between them using WordNet. Our application goes through five stages to compute semantic relatedness: Preprocessing (extracts keywords from content), Feature Extraction (classifies words into parts of speech), Synonyms Extraction (retrieves synonyms for each keyword), Measuring Similarity (measures similarity using keywords and synonyms), and Visualization (graphical representation of the similarity measure). Hence the user can also measure similarity on the basis of features. The end result is a percentage score and the word(s) which form the basis of similarity between both texts, obtained with different tools on the same platform. In future work, we look forward to a Web-as-live-corpus application that provides a simpler, user-friendly tool to compare documents and extract useful information.

Keywords: Graphviz representation, semantic relatedness, similarity measurement, WordNet similarity

Procedia PDF Downloads 207
3898 Approximately Similarity Measurement of Web Sites Using Genetic Algorithms and Binary Trees

Authors: Doru Anastasiu Popescu, Dan Rădulescu

Abstract:

In this paper, we determine the similarity of two HTML web applications. We are going to use a genetic algorithm in order to determine the most significant web pages of each application (we are not going to use every web page of a site). Using these significant web pages, we will find the similarity value between the two applications. The algorithm is going to be efficient because we are going to use a reduced number of web pages for comparisons but it will return an approximate value of the similarity. The binary trees are used to keep the tags from the significant pages. The algorithm was implemented in Java language.

Keywords: Tag, HTML, web page, genetic algorithm, similarity value, binary tree

Procedia PDF Downloads 332
3897 Using Textual Pre-Processing and Text Mining to Create Semantic Links

Authors: Ricardo Avila, Gabriel Lopes, Vania Vidal, Jose Macedo

Abstract:

This article offers an approach to the automatic discovery of semantic concepts and links in the domain of Oil Exploration and Production (E&P). Machine learning methods combined with textual pre-processing techniques were used to detect local patterns in texts and, thus, generate new concepts and new semantic links. Even using more specific vocabularies within the oil domain, our approach has achieved satisfactory results, suggesting that the proposal can be applied in other domains and languages, requiring only minor adjustments.

Keywords: semantic links, data mining, linked data, SKOS

Procedia PDF Downloads 144
3896 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi

Abstract:

In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.

Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering

Procedia PDF Downloads 445
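
A short sketch can make a parameterized family of binary similarity measures concrete. It assumes the Tθ family takes the form T_θ = a / (a + θ(b + c)), where a counts attributes present in both objects and b and c count attributes present in only one. This form is an assumption here. The abstract does not give the definition, so it should be checked against the paper.

```python
def t_theta(x, y, theta):
    """Assumed form T_theta = a / (a + theta * (b + c)) for binary vectors.

    a: positions where both are 1; b: only x is 1; c: only y is 1.
    Joint absences (both 0) are ignored.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    denom = a + theta * (b + c)
    # Convention chosen for this sketch: two all-zero vectors count as identical.
    return a / denom if denom else 1.0


x = [1, 1, 0, 1, 0, 0]
y = [1, 0, 0, 1, 1, 0]

jaccard = t_theta(x, y, 1.0)       # theta = 1 gives the Jaccard index
dice = t_theta(x, y, 0.5)          # theta = 1/2 gives the Dice coefficient
sokal_sneath = t_theta(x, y, 2.0)  # theta = 2 penalizes mismatches more
```

Under this assumed form, a larger θ gives mismatches more weight, so different choices of θ can change which clusters an agglomerative method merges first.
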
3895 Graph Planning Based Composition for Adaptable Semantic Web Services

Authors: Rihab Ben Lamine, Raoudha Ben Jemaa, Ikram Amous Ben Amor

Abstract:

This paper proposes a graph planning technique for semantic adaptable Web Services composition. First, we use an ontology-based context model to extend Web Service descriptions with information about the most suitable context for their use. Then, we transform the composition problem into a semantic context-aware graph planning problem to build the optimal service composition based on the user's context. The construction of the planning graph is based on semantic context-aware Web Service discovery, which makes it possible at each step to add the most suitable Web Services in terms of semantic compatibility between the services' parameters and their context similarity with the user's context. In the backward search step, semantic and contextual similarity scores are used to find the best list of composed Web Services. Finally, in the ranking step, a score is calculated for each best solution and a set of ranked solutions is returned to the user.

Keywords: semantic web service, web service composition, adaptation, context, graph planning

Procedia PDF Downloads 489
3894 Comparative Analysis of Dissimilarity Detection between Binary Images Based on Equivalency and Non-Equivalency of Image Inversion

Authors: Adnan A. Y. Mustafa

Abstract:

Image matching is a fundamental problem that arises frequently in many aspects of robot and computer vision. It can become a time-consuming process when matching images to a database consisting of hundreds of images, especially if the images are big. One approach to reducing the time complexity of the matching process is to reduce the search space in a pre-matching stage, by simply removing dissimilar images quickly. The Probabilistic Matching Model for Binary Images (PMMBI) showed that dissimilarity detection between binary images can be accomplished quickly by random pixel mapping and is size invariant. The model is based on the gamma binary similarity distance that recognizes an image and its inverse as containing the same scene and hence considers them to be the same image. However, in many applications, an image and its inverse are not treated as being the same but rather dissimilar. In this paper, we present a comparative analysis of dissimilarity detection between PMMBI based on the gamma binary similarity distance and a modified PMMBI model based on a similarity distance that does distinguish between an image and its inverse as being dissimilar.

Keywords: binary image, dissimilarity detection, probabilistic matching model for binary images, image mapping

Procedia PDF Downloads 118
3893 Hybrid Approximate Structural-Semantic Frequent Subgraph Mining

Authors: Montaceur Zaghdoud, Mohamed Moussaoui, Jalel Akaichi

Abstract:

Frequent subgraph mining usually refers to graph matching, and it is widely used when analyzing big data with large graphs. Much research has dealt with exact or inexact structural graph matching, but little attention has been paid to semantic matching when graph vertices and/or edges are attributed and typed. Therefore, it seems very interesting to integrate background knowledge into the analysis, so that extracted frequent subgraphs are pruned further by applying a new semantic filter instead of using only structural similarity in the graph matching process. Consequently, this paper focuses on developing a new hybrid approximate structural-semantic graph matching to discover a set of frequent subgraphs. It simultaneously uses an approximate structural similarity function based on a graph edit distance function and a possibilistic vertex similarity function based on an affinity function. The structural and semantic filters together prune the extracted frequent set. The new hybrid structural-semantic frequent subgraph mining approach is suitable for several applications, such as community detection in social networks.

Keywords: approximate graph matching, hybrid frequent subgraph mining, graph mining, possibility theory

Procedia PDF Downloads 370
3892 Learning to Translate by Learning to Communicate to an Entailment Classifier

Authors: Szymon Rutkowski, Tomasz Korbak

Abstract:

We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, its learning procedure lacks psychological plausibility: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for the classifier’s performance on determining entailment between sentences translated by the translator to disciple’s native language. The translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation were proposed before, we introduce a number of improvements. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality.
It has been shown that models trained on recognizing textual entailment produce high-quality general-purpose sentence embeddings transferable to other tasks. We use the Stanford Natural Language Inference (SNLI) dataset as well as its analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of the translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning-based zero-shot machine translation.

Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning

Procedia PDF Downloads 99
3891 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

Word discovery is a key problem in text information retrieval technology. Methods for new word discovery tend to be closely tied to words because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How to detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by means of semantic replacement between sentences. The experiment verifies that the framework not only discovers network words rapidly but also recovers the standard word meaning of the discovered network words, which reflects the effectiveness of our work.

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

Procedia PDF Downloads 63
3890 Selecting Answers for Questions with Multiple Answer Choices in Arabic Question Answering Based on Textual Entailment Recognition

Authors: Anes Enakoa, Yawei Liang

Abstract:

Question Answering (QA) is one of the most important and demanding tasks in the field of Natural Language Processing (NLP). In QA systems, the answer generation task generates a list of candidate answers to the user's question, of which only one is correct. Answer selection is one of the main components of QA and is concerned with selecting the best answer choice from the candidate answers suggested by the system. However, the selection process can be very challenging, especially in Arabic, due to its particularities. To address this challenge, an approach is proposed to answer questions with multiple answer choices for Arabic QA systems based on Textual Entailment (TE) recognition. The developed approach employs a Support Vector Machine that considers lexical, semantic and syntactic features in order to recognize the entailment between the generated hypotheses (H) and the text (T). A set of experiments has been conducted for performance evaluation, and the overall performance of the proposed method reached an accuracy of 67.5% with a C@1 score of 80.46%. The obtained results are promising and demonstrate that the proposed method is effective for the TE recognition task.

Keywords: information retrieval, machine learning, natural language processing, question answering, textual entailment

Procedia PDF Downloads 120
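
The abstract above reports both accuracy and C@1. The sketch below shows how the two can differ, assuming the definition of c@1 commonly used in QA evaluation: c@1 = (n_R + n_U * n_R / n) / n, where n_R is the number of correct answers, n_U the number of unanswered questions, and n the total. The counts are made up for illustration and are not the paper's data.

```python
def accuracy(n_correct, n_total):
    """Plain accuracy: unanswered questions simply count as not correct."""
    return n_correct / n_total


def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 under the assumed definition: unanswered questions are credited
    at the accuracy rate observed on the whole question set."""
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


# Illustrative counts only (hypothetical).
n_total, n_correct, n_unanswered = 100, 60, 20
acc = accuracy(n_correct, n_total)
c1 = c_at_1(n_correct, n_unanswered, n_total)
```

With no unanswered questions the two measures coincide. When a system declines to answer some questions, c@1 is higher than accuracy.
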
3889 Hit-Or-Miss Transform as a Tool for Similar Shape Detection

Authors: Osama Mohamed Elrajubi, Idris El-Feghi, Mohamed Abu Baker Saghayer

Abstract:

This paper describes the identification of specific shapes within binary images using the morphological Hit-or-Miss Transform (HMT). The Hit-or-Miss transform is a general binary morphological operation that can be used to search for particular patterns of foreground and background pixels in an image. It is a basic operation of binary morphology, since almost all other binary morphological operators are derived from it. The input of this method is a binary image and a structuring element (a template to be searched for in the binary image), while the output is another binary image. In this paper, a modification of the Hit-or-Miss transform is proposed. The accuracy of the algorithm is adjusted according to the similarity of the template and the sought template. The method was implemented in C. The algorithm has been tested on several images, and the results show that this new method can be used for similar shape detection.

Keywords: hit-or-miss operator transform, HMT, binary morphological operation, shape detection, binary images processing

Procedia PDF Downloads 298
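
The standard Hit-or-Miss transform described in the abstract above can be sketched as follows. This shows the unmodified operation, not the paper's proposed variant. The structuring-element encoding (1 = foreground required, 0 = background required, None = don't care) and the treatment of pixels outside the image as background are choices made for this sketch.

```python
def hit_or_miss(image, selem):
    """Return a binary image with 1 wherever selem, centered there, fits exactly."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(selem), len(selem[0])
    off_r, off_c = k_rows // 2, k_cols // 2

    def pixel(r, c):
        # Pixels outside the image are treated as background.
        if 0 <= r < rows and 0 <= c < cols:
            return image[r][c]
        return 0

    def fits(r, c):
        # Every non-None template cell must equal the image pixel under it.
        return all(
            selem[i][j] is None
            or pixel(r + i - off_r, c + j - off_c) == selem[i][j]
            for i in range(k_rows)
            for j in range(k_cols)
        )

    return [[1 if fits(r, c) else 0 for c in range(cols)] for r in range(rows)]


# Template for an isolated foreground pixel: 1 in the center, 0 all around.
isolated = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
result = hit_or_miss(image, isolated)
```

Only the pixel at row 1, column 1 is marked. The two adjacent pixels in row 2 each violate the background requirement of the template.
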
3888 A Study on Bilingual Semantic Processing: Category Effects and Age Effects

Authors: Lai Yi-Hsiu

Abstract:

The present study addressed the nature of bilingual semantic processing in Mandarin Chinese and Southern Min and examined category effects and age effects. Nineteen bilingual adults of Mandarin Chinese and Southern Min, nine monolingual seniors of Mandarin Chinese, and ten monolingual seniors of Southern Min in Taiwan individually completed two semantic tasks: Picture naming and category fluency tasks. The instruments for the naming task were sixty black-and-white pictures, including thirty-five object pictures and twenty-five action pictures. The category fluency task also consisted of two semantic categories – objects (or nouns) and actions (or verbs). The reaction time for each picture/question was additionally calculated and analyzed. Oral productions in Mandarin Chinese and in Southern Min were compared and discussed to examine the category effects and age effects. The results of the category fluency task indicated that the content of information of these seniors was comparatively deteriorated, and thus they produced a smaller number of semantic-lexical items. Significant group differences were also found in the reaction time results. Category effects were significant for both adults and seniors in the semantic fluency task. The findings of the present study will help characterize the nature of the bilingual semantic processing of adults and seniors, and contribute to the fields of contrastive and corpus linguistics.

Keywords: bilingual semantic processing, aging, Mandarin Chinese, Southern Min

Procedia PDF Downloads 543
3887 Resume Ranking Using Custom Word2vec and Rule-Based Natural Language Processing Techniques

Authors: Subodh Chandra Shakya, Rajendra Sapkota, Aakash Tamang, Shushant Pudasaini, Sujan Adhikari, Sajjan Adhikari

Abstract:

Much effort has been made to measure the semantic similarity between text corpora in documents, and techniques have evolved to measure the similarity of two documents. One such state-of-the-art technique in the field of Natural Language Processing (NLP) is word-to-vector models, which convert words into word embeddings and measure the similarity between the vectors. We found this to be quite useful for the task of resume ranking. This research paper therefore implements the word2vec model along with other Natural Language Processing techniques to rank resumes for a particular job description so as to automate the hiring process. The paper presents the proposed system and the findings made while building it.

Keywords: chunking, document similarity, information extraction, natural language processing, word2vec, word embedding

Procedia PDF Downloads 129
3886 Neural Graph Matching for Modification Similarity Applied to Electronic Document Comparison

Authors: Po-Fang Hsu, Chiching Wei

Abstract:

In this paper, we present a novel neural graph matching approach applied to document comparison. Document comparison is a common task in the legal and financial industries. In some cases, the most important differences may be the addition or omission of words, sentences, clauses, or paragraphs. However, it is a challenging task without recording or tracing the whole editing process. Under many temporal uncertainties, we explore the potential of our approach to approximate an accurate comparison and determine which element blocks are related to others by an edit. First, we apply a document layout analysis that combines traditional and modern techniques to segment layouts into blocks of various types appropriately. Then we transform this issue into a problem of layout graph matching with textual awareness. Graph matching is a long-studied problem with a broad range of applications. However, unlike previous works focusing on visual images or structural layout, we also bring textual features into our model to adapt it to this domain. Specifically, based on the electronic document, we introduce an encoder to deal with the visual presentation decoded from PDF. Additionally, because modifications can cause inconsistent document layout analysis between modified documents, and blocks can be merged and split, Sinkhorn divergence is adopted in our neural graph approach, which tries to overcome both these issues with many-to-many block matching. We demonstrate this on two categories of layouts, namely legal agreements and scientific articles, collected from our real-case datasets.

Keywords: document comparison, graph matching, graph neural network, modification similarity, multi-modal

Procedia PDF Downloads 151
3885 A Context-Sensitive Algorithm for Media Similarity Search

Authors: Guang-Ho Cha

Abstract:

This paper presents a context-sensitive media similarity search algorithm. One of the central problems in media search is the semantic gap between the low-level features computed automatically from media data and the human interpretation of them. This is because the notion of similarity is usually based on high-level abstraction, but the low-level features sometimes do not reflect human perception. Many media search algorithms have used the Minkowski metric to measure similarity between image pairs. However, those functions cannot adequately capture the characteristics of the human visual system or the nonlinear relationships in the contextual information given by images in a collection. Our search algorithm tackles this problem by employing a similarity measure and a ranking strategy that reflect the nonlinearity of human perception and contextual information in a dataset. Similarity search in an image database based on this contextual information shows encouraging experimental results.

Keywords: context-sensitive search, image search, similarity ranking, similarity search

Procedia PDF Downloads 338
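
For reference, the Minkowski metric that the abstract above names as the conventional baseline can be written as a one-line function. The feature vectors are toy values. The paper's own context-sensitive measure is not specified in the abstract and is not reproduced here.

```python
def minkowski(u, v, p):
    """Minkowski distance of order p between two feature vectors.

    p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.
    """
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


# Toy feature vectors (hypothetical).
u = [0.0, 0.0]
v = [3.0, 4.0]
manhattan = minkowski(u, v, 1)
euclidean = minkowski(u, v, 2)
```

The distance depends only on the two vectors being compared. It takes no account of the other images in the collection, which is the limitation the abstract points to.
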
3884 Semantic Search Engine Based on Query Expansion with Google Ranking and Similarity Measures

Authors: Ahmad Shahin, Fadi Chakik, Walid Moudani

Abstract:

Our study elaborates a potential solution for a search engine that uses semantic technology to retrieve information and display it meaningfully. Semantic search engines are not widely used on the web, as the majority are still in beta or under construction. Current semantic search applications face many problems; the major one is analyzing and calculating the meaning of a query in order to retrieve relevant information. Another problem is the ontology-based index and its updates. Ranking results according to concept meaning and its relation to the query is a further challenge. In this paper, we offer a light meta-engine (QESM) which uses Google search, and therefore Google’s index, and adapts its returned results by adding multi-query expansion. The goal was to find a reliable ranking algorithm that involves semantics and uses concepts and meanings to rank results. First, the engine finds synonyms of each query term entered by the user based on a lexical database. Then, query expansion is applied to generate different semantically analogous sentences. These are generated randomly by combining the found synonyms and the original query terms. Our model suggests the use of semantic similarity measures between two sentences. In practice, we used this method to calculate the semantic similarity between each query and the description of each page’s content generated by Google. The generated sentences are sent to the Google engine one by one, and the results are ranked again all together with the adapted ranking method (QESM). Finally, our system places Google pages with higher similarities at the top of the results. We conducted experiments with 6 different queries. We observed that most results ranked with QESM were altered relative to Google’s originally returned pages. With our experimental queries, QESM frequently achieves better accuracy than Google. In the worst cases, it behaves like Google.

Keywords: semantic search engine, Google indexing, query expansion, similarity measures

Procedia PDF Downloads 401
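
The query expansion step described in the abstract above can be sketched as combining each query term with its synonyms. The synonym table is a hypothetical stand-in for a lexical database. This sketch also enumerates every combination, whereas the abstract says the sentences are generated randomly.

```python
from itertools import product

# Hypothetical synonym table standing in for a lexical database lookup.
SYNONYMS = {
    "cheap": ["inexpensive", "affordable"],
    "car": ["automobile"],
}


def expand_query(query, synonyms):
    """Generate query variants by swapping each term for itself or a synonym."""
    options = [[term] + synonyms.get(term, []) for term in query.lower().split()]
    return [" ".join(combo) for combo in product(*options)]


expanded = expand_query("cheap car rental", SYNONYMS)
```

The original query comes first in the list, followed by variants such as "inexpensive automobile rental". Each variant would then be sent to the underlying engine, and the pooled results re-ranked by sentence similarity.
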
3883 A Method of the Semantic on Image Auto-Annotation

Authors: Lin Huo, Xianwei Liu, Jingxiong Zhou

Abstract:

Recently, due to the semantic gap between image visual features and human concepts, semantic image auto-annotation has become an important topic. Image auto-annotation by search is a popular approach, and we use it to design and implement a method of semantic image auto-annotation. First, we extract low-level visual features of the image and, using a corresponding hash method, map the features into hash codes, which are then transformed into a group of binary strings and stored. Finally, tests on the Corel image set show that this method is effective.

Keywords: image auto-annotation, color correlograms, Hash code, image retrieval

Procedia PDF Downloads 460
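
The annotation-by-search idea in the abstract above can be sketched as hashing feature vectors into binary strings and copying labels from the nearest stored code. The thresholding hash, the feature values, and the labels are all assumptions for illustration. The paper's actual hash method is not described in the abstract.

```python
def binary_code(features, thresholds):
    """Map a feature vector to a binary string: one bit per feature,
    set when the feature exceeds its threshold (an assumed hash)."""
    return "".join("1" if f > t else "0" for f, t in zip(features, thresholds))


def hamming(code_a, code_b):
    """Number of bit positions where two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def annotate(query_code, index):
    """Return the labels of the stored image whose code is nearest."""
    nearest = min(index, key=lambda item: hamming(query_code, item[0]))
    return nearest[1]


# Hypothetical annotated index of (code, labels) pairs.
thresholds = [0.5, 0.5, 0.5, 0.5]
index = [
    (binary_code([0.9, 0.8, 0.1, 0.2], thresholds), ["sunset", "sky"]),
    (binary_code([0.1, 0.2, 0.9, 0.7], thresholds), ["forest", "tree"]),
]
query = binary_code([0.8, 0.6, 0.3, 0.1], thresholds)
labels = annotate(query, index)
```

Comparing short binary strings by Hamming distance is cheap, which is why hash codes suit search over large image sets.
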
3882 Semantic Processing in Chinese: Category Effects, Task Effects and Age Effects

Authors: Yi-Hsiu Lai

Abstract:

The present study aimed to elucidate the nature of semantic processing in Chinese. Language and cognition related to the issue of aging are examined from the perspective of picture naming and category fluency tasks. Twenty Chinese-speaking adults (ranging from 25 to 45 years old) and twenty Chinese-speaking seniors (ranging from 65 to 75 years old) in Taiwan participated in this study. Each of them individually completed two tasks: a picture naming task and a category fluency task. Instruments for the naming task were sixty black-and-white pictures: thirty-five object and twenty-five action pictures. The category fluency task also consisted of two semantic categories – objects (or nouns) and actions (or verbs). Participants were asked to report as many items within a category as possible in one minute. Scores of action fluency and of object fluency were a summation of correct responses in these two categories. Category effects (actions vs. objects) and age effects were examined in these tasks. Objects were further divided into two major types: living objects and non-living objects. Actions were also categorized into two major types: action verbs and process verbs. Reaction time to each picture/question was additionally calculated and analyzed. Results of the category fluency task indicated that the content of information in Chinese seniors was comparatively deteriorated, thus producing a smaller number of semantic-lexical items. A significant group difference was also found in the results of reaction time. The category effect was significant for both Chinese adults and seniors in the semantic fluency task. Findings in the present study helped characterize the nature of semantic processing in Chinese-speaking adults and seniors and contributed to the issue of language and aging.

Keywords: semantic processing, aging, Chinese, category effects

Procedia PDF Downloads 335
3881 Lexical-Semantic Processing by Chinese as a Second Language Learners

Authors: Yi-Hsiu Lai

Abstract:

The present study aimed to elucidate lexical-semantic processing in learners of Chinese as a second language (CSL). Twenty L1 speakers of Chinese and twenty CSL learners in Taiwan participated in a picture naming task and a category fluency task. Based on their Chinese proficiency levels, these CSL learners were further divided into two sub-groups: ten CSL learners of elementary Chinese proficiency level and ten CSL learners of intermediate Chinese proficiency level. Instruments for the naming task were sixty black-and-white pictures: thirty-five object pictures and twenty-five action pictures. Object pictures were divided into two categories: living objects and non-living objects. Action pictures were composed of two categories: action verbs and process verbs. As in the naming task, the category fluency task consisted of two semantic categories – objects (i.e., living and non-living objects) and actions (i.e., action and process verbs). Participants were asked to report as many items within a category as possible in one minute. Oral productions were tape-recorded and transcribed for further analysis. Both error types and error frequency were calculated. Statistical analysis was further conducted to examine the error types and frequencies of CSL learners. Additionally, category effects, pictorial effects and L2 proficiency were discussed. Findings in the present study helped characterize the lexical-semantic process of Chinese naming in CSL learners of different Chinese proficiency levels and made contributions to Chinese vocabulary teaching and learning in the future.

Keywords: lexical-semantic processing, Mandarin Chinese, naming, category effects

Procedia PDF Downloads 435
3880 An AI-generated Semantic Communication Platform in HCI Course

Authors: Yi Yang, Jiasong Sun

Abstract:

Almost every aspect of our daily lives is now intertwined with some degree of human-computer interaction (HCI). HCI courses draw on knowledge from disciplines as diverse as computer science, psychology, design principles, anthropology, and more. Our HCI course, named the Media and Cognition course, is constantly updated to reflect state-of-the-art technological advancements such as virtual reality, augmented reality, and artificial intelligence-based interactions. For more than a decade, our course has used an interest-based approach to teaching, in which students proactively propose research-based questions and collaborate with teachers, using course knowledge to explore potential solutions. Semantic communication plays a key role in facilitating understanding and interaction between users and computer systems, ultimately enhancing system usability and user experience. The advancements in AI-generated technology, which have gained significant attention from both academia and industry in recent years, are exemplified by language models like GPT-3 that generate human-like dialogues from given prompts. The latest version of our Human-Computer Interaction course puts into practice a semantic communication platform based on AI-generated techniques. The purpose of this semantic communication is twofold: to extract and transmit task-specific information while ensuring efficient end-to-end communication with minimal latency. The AI-generated semantic communication platform evaluates the retainability of signal sources and converts low-retainability visual signals into textual prompts. These data are transmitted through AI-generated techniques and reconstructed at the receiving end; visual signals with a high retainability rate, on the other hand, are compressed and transmitted according to their respective regions. The platform and associated research are a testament to our students' growing ability to independently investigate state-of-the-art technologies.

Keywords: human-computer interaction, media and cognition course, semantic communication, retainability, prompts

Procedia PDF Downloads 76
3879 Lexico-Semantic and Contextual Analysis of the Concept of Joy in Modern English Fiction

Authors: Zarine Avetisyan

Abstract:

Concepts are part and parcel of everyday text and talk. Their ubiquity determines the topicality of the given research, which aims at the semantic decomposition of concepts in general and the concept of joy in particular, as well as the study of lexico-semantic variants as means of realizing a certain concept in different “semantic settings”, namely in a certain context. To achieve the stated aim, the given research draws on the methods of componential and contextual analysis, studying lexico-semantic variants (LSVs) of the concept of joy and the semantic signs embedded in those LSVs, such as the semantic sign of intensity, supporting emotions, etc., in the context of Modern English fiction.

Keywords: concept, context, lexico-semantic variant, semantic sign

Procedia PDF Downloads 329
3878 Static vs. Stream Mining Trajectories Similarity Measures

Authors: Musaab Riyadh, Norwati Mustapha, Dina Riyadh

Abstract:

Trajectory similarity can be defined as the cost of transforming one trajectory into another based on a certain similarity method. It is the core of numerous mining tasks such as clustering, classification, and indexing. Various approaches have been suggested to measure similarity based on the geometric and dynamic properties of a trajectory, the overlap between trajectory segments, and the confined area between entire trajectories. In this article, these approaches are evaluated on computational cost, memory usage, accuracy, and the amount of data needed in advance, in order to determine their suitability for stream mining applications. The evaluation results show that, because of the high-speed generation of data, stream mining applications are suited to similarity methods that have low computational and memory cost, require a single scan of the data, and are free of mathematical complexity.
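
As an illustration of the "cost of transforming one trajectory into another" definition above, here is a minimal sketch of an edit-distance style measure on 2-D trajectories. The abstract does not name a specific measure, so the function name, the matching tolerance eps, and the unit cost per operation are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edit_distance_trajectories(a: List[Point], b: List[Point], eps: float) -> int:
    """Minimum number of insert/delete/substitute operations turning a into b.

    Two points count as matching (cost 0) when both coordinates differ by
    at most eps. eps is an illustrative tolerance, not taken from the paper.
    """
    def match(p: Point, q: Point) -> bool:
        return abs(p[0] - q[0]) <= eps and abs(p[1] - q[1]) <= eps

    # prev[j] is the cost of transforming the first i-1 points of a
    # into the first j points of b; only two rows are kept in memory.
    prev = list(range(len(b) + 1))
    for i, p in enumerate(a, start=1):
        curr = [i]
        for j, q in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if match(p, q) else 1)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]
```

Keeping only two rows of the dynamic-programming table bounds memory by the length of one trajectory, but the quadratic time cost is the kind of expense the evaluation weighs against streaming constraints.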

Keywords: global distance measure, local distance measure, semantic trajectory, spatial dimension, stream data mining

Procedia PDF Downloads 372
3877 Human Action Retrieval System Using Features Weight Updating Based Relevance Feedback Approach

Authors: Munaf Rashid

Abstract:

For content-based human action retrieval systems, search accuracy is often inferior for two reasons: 1) global information pertaining to videos is totally ignored, and only low-level motion descriptors are considered as significant features for matching the similarity between query and database videos, and 2) there is a semantic gap between the high-level user concept and low-level visual features. Hence, in this paper, we propose a method that addresses these two issues, and in doing so, this paper contributes in two ways. Firstly, we introduce a method that uses both global and local information in one framework for an action retrieval task. Secondly, to minimize the semantic gap, the user concept is involved by incorporating a feature weight updating (FWU) relevance feedback (RF) approach. We use statistical characteristics to dynamically update the weights of the feature descriptors so that after every RF iteration the feature space is modified accordingly. For testing and validation, two human action recognition datasets have been utilized, namely Weizmann and UCF. Results show that even with a number of visual challenges the proposed approach performs well.
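
The weight-updating step described above can be sketched as follows. The abstract says only that "statistical characteristics" drive the update; the inverse-standard-deviation rule used here is a common relevance-feedback heuristic and is an assumption, as are the function names and the eps smoothing term.

```python
from statistics import pstdev
from typing import List


def update_feature_weights(relevant: List[List[float]], eps: float = 1e-6) -> List[float]:
    """Return one weight per feature, normalized to sum to 1.

    relevant holds the feature vectors of results the user marked as
    relevant. A feature whose values are consistent across those results
    (small standard deviation) receives a larger weight.
    """
    n_features = len(relevant[0])
    raw = [
        1.0 / (pstdev([vec[k] for vec in relevant]) + eps)
        for k in range(n_features)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(query: List[float], candidate: List[float], weights: List[float]) -> float:
    """Weighted L1 distance used to re-rank candidates after feedback."""
    return sum(w * abs(q - c) for w, q, c in zip(weights, query, candidate))
```

After each feedback round, the new weights would be passed to weighted_distance to re-rank the database videos, so features the user implicitly agrees on dominate the next retrieval.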

Keywords: relevance feedback (RF), action retrieval, semantic gap, feature descriptor, codebook

Procedia PDF Downloads 438
3876 Performance Comparison of Non-Binary RA and QC-LDPC Codes

Authors: Ni Wenli, He Jing

Abstract:

Repeat–Accumulate (RA) codes are a subclass of LDPC codes with fast encoder structures. In this paper, we consider a non-binary extension of binary LDPC codes over GF(q) and construct a non-binary RA code and a non-binary QC-LDPC code over GF(2^4). The non-binary RA codes are constructed with a linear encoding method and the non-binary QC-LDPC codes with an algebraic construction method. The BER performance of the RA and QC-LDPC codes over GF(q) is compared under BP decoding by simulation over Additive White Gaussian Noise (AWGN) channels.
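
To show why RA encoders are fast, here is a minimal sketch of the standard repeat, interleave, accumulate pipeline. It is deliberately restricted to the binary case over GF(2); the paper itself works over GF(2^4), where the XOR below would be replaced by arithmetic in GF(q). The function name and the caller-supplied interleaver perm are assumptions for illustration.

```python
from typing import List, Sequence


def ra_encode(bits: Sequence[int], q: int, perm: Sequence[int]) -> List[int]:
    """Encode bits with a rate-1/q repeat-accumulate code over GF(2).

    bits: information bits (0 or 1).
    q:    repetition factor.
    perm: interleaver, a permutation of range(len(bits) * q).
    """
    # Step 1: repeat every information bit q times.
    repeated = [b for b in bits for _ in range(q)]
    if sorted(perm) != list(range(len(repeated))):
        raise ValueError("perm must be a permutation of range(len(bits) * q)")
    # Step 2: interleave the repeated bits.
    interleaved = [repeated[i] for i in perm]
    # Step 3: accumulate, i.e. output the running XOR of the stream.
    out, acc = [], 0
    for b in interleaved:
        acc ^= b
        out.append(acc)
    return out
```

Each step is a single pass over the data, so encoding time grows linearly with the codeword length.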

Keywords: non-binary RA codes, QC-LDPC codes, performance comparison, BP algorithm

Procedia PDF Downloads 349
3875 Semantic-Based Collaborative Filtering to Improve Visitor Cold Start in Recommender Systems

Authors: Baba Mbaye

Abstract:

In collaborative filtering recommendation systems, a user receives suggested items based on the opinions and evaluations of a community of users. This type of recommendation system uses only the information (ratings as numerical values) contained in a usage matrix as input data. This matrix can be constructed based on users' behaviors or by inviting users to declare their opinions on the items they know. The cold start problem leads to very poor performance for new users. It is a phenomenon that occurs at the beginning of use, in the situation where the system lacks data to make recommendations. There are three types of cold start problems: cold start for a new item, a new system, and a new user. In this article, we are interested in the cold start for a new user. When the system welcomes a new user, the profile exists but does not have enough data, and the communities it shares with other user profiles are still unknown. This leads to recommendations not adapted to the profile of the new user. In this paper, we propose an approach that improves cold start by using the notions of similarity and semantic proximity between user profiles during cold start. We use the available cold metadata (metadata extracted from the new user's data), which is useful for positioning the new user within a community. The aim is to look for similarities and semantic proximities with the old and current user profiles of the system. Proximity is represented by close concepts considered to belong to the same group, while similarity groups together elements that appear similar. Similarity and proximity are two close but not identical concepts. This leads us to a construction of similarity based on: a) the concepts (properties, terms, instances) independent of ontology structure and, b) the simultaneous representation of the two concepts (relations, presence of terms in a document, simultaneous presence of the authorities).
We propose an ontology, OIVCSRS (Ontology of Improvement Visitor Cold Start in Recommender Systems), in order to structure the terms and concepts representing the meaning of an information field, whether by the metadata of a namespace, or the elements of a knowledge domain. This approach allows us to automatically attach the new user to a user community, partially compensate for the data that was not initially provided and ultimately to associate a better first profile with the cold start. Thus, the aim of this paper is to propose an approach to improving cold start using semantic technologies.
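
The idea of attaching a new user to a community from sparse metadata can be sketched with a plain set-overlap measure. This is a simplified stand-in: the paper relies on the OIVCSRS ontology for semantic proximity, whereas the Jaccard similarity, the function names, and the threshold parameter below are assumptions chosen for illustration.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of metadata terms common to both sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_community(
    new_user: Set[str],
    communities: Dict[str, Set[str]],
    threshold: float = 0.0,
) -> Optional[str]:
    """Return the community whose terms best overlap the new user's metadata.

    Returns None when no community scores strictly above threshold, in
    which case the user stays unassigned until more data is available.
    """
    best_name, best_score = None, threshold
    for name, terms in communities.items():
        score = jaccard(new_user, terms)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

For example, a visitor whose metadata is {"painting", "paris", "louvre"} would be attached to a community described by {"museum", "painting", "paris"} rather than one described by {"football", "stadium", "paris"}, because the first shares two terms and the second only one.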

Keywords: visitor cold start, recommender systems, collaborative filtering, semantic filtering

Procedia PDF Downloads 195
3874 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and finding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis for binary data together with interpretations drawn from Item Response Theory. More precisely, we propose first reducing dimensionality through a specific use of Principal Component Analysis on the linguistic data and then producing axes of groups obtained from a clustering analysis of the semantic data. This approach allows the user to give meaning to the resulting clusters and to find the real latent structure present in the data. The methodology is applied to a set of real semantic data, with impressive results for coherence, speed, and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 414