Search results for: linguistic similarity
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1515

1515 Cross-Dialect Sentence Transformation: A Comparative Analysis of Language Models for Adapting Sentences to British English

Authors: Shashwat Mookherjee, Shruti Dutta

Abstract:

This study explores linguistic distinctions among American, Indian, and Irish English dialects and assesses the ability of various large language models (LLMs) to generate British English translations from these dialects. Using cosine similarity analysis, the study measures the linguistic proximity between original British English translations and those produced by LLMs for each dialect. The findings reveal that Indian and Irish English translations maintain notably high similarity scores, suggesting strong linguistic alignment with British English. In contrast, American English exhibits slightly lower similarity, reflecting its distinct linguistic traits. Additionally, the choice of LLM significantly impacts translation quality, with Llama-2-70b consistently demonstrating superior performance. The study underscores the importance of selecting the right model for dialect translation, emphasizing the role of linguistic expertise and contextual understanding in achieving accurate translations.

Keywords: cross-dialect translation, language models, linguistic similarity, multilingual NLP

Procedia PDF Downloads 18
1514 The Linguistic Fingerprint in Western and Arab Judicial Applications

Authors: Asem Bani Amer

Abstract:

This study examines the linguistic fingerprint in judicial applications, a legal technique that is recent and still developing. It can be used to identify criminals through their way of speaking and their characteristic linguistic expressions. The study first clarifies the expression "linguistic fingerprint," its concept, and its extended domain, then presents some of the linguistic fingerprint tools used in Western judicial applications, and finally derives a technical outline for a linguistic fingerprint in the Arabic language, which still lacks such judicial applications, drawing on dictionaries, language rhythm, and language structure.

Keywords: linguistic fingerprint, judicial, application, dictionary, picture, rhythm, structure

Procedia PDF Downloads 53
1513 Integration of Fuzzy Logic in the Representation of Knowledge: Application in the Building Domain

Authors: Hafida Bouarfa, Mohamed Abed

Abstract:

The main objective of our work is the development and validation of a system called Fuzzy Vulnerability. Fuzzy Vulnerability uses a fuzzy representation in order to tolerate imprecision in the description of a construction. In the second phase, we evaluate the similarity between the vulnerability of a new construction and those of all the historical cases. This similarity is evaluated on two levels: 1) individual similarity, based on fuzzy aggregation techniques; 2) global similarity, which uses regular increasing monotone (RIM) linguistic quantifiers to combine the various individual similarities between two constructions. The third phase of the Fuzzy Vulnerability process consists of using the vulnerabilities of historical constructions closely similar to the current construction to deduce its estimated vulnerability. We validated our system using 50 cases. We evaluated the performance of Fuzzy Vulnerability on the basis of two criteria: the precision of the estimates and the tolerance of imprecision throughout the estimation process. The comparison was made with estimates produced by long and tedious models. The results are satisfactory.
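To make the global-similarity step more concrete, the sketch below combines individual similarities with an ordered weighted average whose weights come from a RIM quantifier. The abstract does not state which quantifier or aggregation operator the authors use, so the power quantifier Q(r) = r^alpha, the OWA operator, and all function names here are assumptions chosen for illustration.

```python
from typing import Callable, Sequence


def rim_quantifier(alpha: float) -> Callable[[float], float]:
    """Return Q(r) = r**alpha (assumed form): Q(0) = 0, Q(1) = 1, non-decreasing."""
    def q(r: float) -> float:
        return r ** alpha
    return q


def owa_weights(n: int, q: Callable[[float], float]) -> list[float]:
    """Derive n OWA weights from a RIM quantifier: w_i = Q(i/n) - Q((i-1)/n)."""
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def global_similarity(individual: Sequence[float],
                      q: Callable[[float], float]) -> float:
    """Aggregate individual similarities (each in [0, 1]) into one score."""
    ordered = sorted(individual, reverse=True)  # OWA weights apply by rank
    weights = owa_weights(len(ordered), q)
    return sum(w * s for w, s in zip(weights, ordered))


if __name__ == "__main__":
    sims = [0.9, 0.5, 0.7]  # hypothetical per-attribute similarities
    print(global_similarity(sims, rim_quantifier(1.0)))  # plain average
    print(global_similarity(sims, rim_quantifier(2.0)))  # stresses low values
```

With alpha = 1 the aggregation reduces to the arithmetic mean. Larger values of alpha shift weight toward the smallest similarities, so two constructions only score high when most of their attributes agree.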

Keywords: case based reasoning, fuzzy logic, fuzzy case based reasoning, seismic vulnerability

Procedia PDF Downloads 251
1512 Approximately Similarity Measurement of Web Sites Using Genetic Algorithms and Binary Trees

Authors: Doru Anastasiu Popescu, Dan Rădulescu

Abstract:

In this paper, we determine the similarity of two HTML web applications. We use a genetic algorithm to determine the most significant web pages of each application (we do not use every web page of a site). Using these significant web pages, we find the similarity value between the two applications. The algorithm is efficient because it uses a reduced number of web pages for comparisons, but it returns only an approximate value of the similarity. Binary trees are used to store the tags from the significant pages. The algorithm was implemented in Java.

Keywords: Tag, HTML, web page, genetic algorithm, similarity value, binary tree

Procedia PDF Downloads 329
1511 Measuring Text-Based Semantics Relatedness Using WordNet

Authors: Madiha Khan, Sidrah Ramzan, Seemab Khan, Shahzad Hassan, Kamran Saeed

Abstract:

Measuring semantic similarity between texts means calculating the semantic relatedness between them using various techniques. Our web application (Measuring Relatedness of Concepts, MRC) allows the user to input two text corpora and obtain the semantic similarity percentage between them using WordNet. The application goes through five stages to compute semantic relatedness: Preprocessing (extracts keywords from content), Feature Extraction (classification of words into parts of speech), Synonyms Extraction (retrieves synonyms for each keyword), Measuring Similarity (similarity is measured using keywords and synonyms), and Visualization (graphical representation of the similarity measure). Hence the user can also measure similarity on the basis of features. The end result is a percentage score and the word(s) that form the basis of similarity between the two texts, obtained with different tools on the same platform. In future work, we look forward to a Web-as-a-live-corpus application that provides a simpler, user-friendly tool to compare documents and extract useful information.

Keywords: Graphviz representation, semantic relatedness, similarity measurement, WordNet similarity

Procedia PDF Downloads 204
1510 Quick Similarity Measurement of Binary Images via Probabilistic Pixel Mapping

Authors: Adnan A. Y. Mustafa

Abstract:

In this paper we present a quick technique to measure the similarity between binary images. The technique is based on a probabilistic mapping approach and is fast because only a minute percentage of the image pixels need to be compared to measure the similarity, and not the whole image. We exploit the power of the Probabilistic Matching Model for Binary Images (PMMBI) to arrive at an estimate of the similarity. We show that the estimate is a good approximation of the actual value, and the quality of the estimate can be improved further with increased image mappings. Furthermore, the technique is image size invariant; the similarity between big images can be measured as fast as that for small images. Examples of trials conducted on real images are presented.
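The idea of estimating similarity from a small sample of pixels can be illustrated with a short sketch. It is not the PMMBI model itself, whose details are not given in the abstract. It assumes plain uniform random sampling of same-position pixels in two equally sized images, and the function name and parameters are hypothetical.

```python
import random
from typing import Sequence

BinaryImage = Sequence[Sequence[int]]  # rows of 0/1 pixel values


def estimate_similarity(img_a: BinaryImage, img_b: BinaryImage,
                        samples: int = 200, seed: int = 0) -> float:
    """Estimate the fraction of matching pixels from `samples` random positions."""
    rows, cols = len(img_a), len(img_a[0])
    if rows != len(img_b) or cols != len(img_b[0]):
        raise ValueError("images must have the same dimensions")
    rng = random.Random(seed)  # seeded so that runs are repeatable
    matches = 0
    for _ in range(samples):
        r, c = rng.randrange(rows), rng.randrange(cols)
        if img_a[r][c] == img_b[r][c]:
            matches += 1
    return matches / samples


if __name__ == "__main__":
    a = [[(r + c) % 2 for c in range(64)] for r in range(64)]  # checkerboard
    b = [[1 - px for px in row] for row in a]                  # its inverse
    print(estimate_similarity(a, a), estimate_similarity(a, b))
```

The cost depends on the number of samples rather than on the image dimensions, which is the sense in which a sampling approach can be independent of image size.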

Keywords: big images, binary images, image matching, image similarity

Procedia PDF Downloads 159
1509 Linguistic Trend in the Qur'anic Tafsir of 'Al Tahreer Wa Al Tanveer' by Sheikh Tahir Bin A'shur

Authors: Numan Hasan

Abstract:

We have tried to highlight the linguistic trend in the Qur’anic Tafsir of ‘Al Tahreer wa Al Tanveer’ by Sheikh Tahir Bin A’shur, the brightest linguistic commentator of the modern era. We began by studying the life of Bin A’shur and his contributions to the field of Qur’anic knowledge. We then focused on the linguistic approach of ‘Al Tahreer wa Al Tanveer’ and emphasized the importance of linguistic interpretations. We have tried to gain a clear understanding of the features and characteristics of his Tafsir. We have also reflected on the methodological approach and linguistic references of his interpretation. In the conclusion, we present the main results of the research.

Keywords: Sheikh Tahir Bin A’shur, tafsir, linguistics, interpretation, Islamic studies

Procedia PDF Downloads 340
1508 A Context-Sensitive Algorithm for Media Similarity Search

Authors: Guang-Ho Cha

Abstract:

This paper presents a context-sensitive media similarity search algorithm. One of the central problems in media search is the semantic gap between the low-level features computed automatically from media data and the human interpretation of them. This is because the notion of similarity is usually based on high-level abstraction, whereas the low-level features sometimes do not reflect human perception. Many media search algorithms have used the Minkowski metric to measure similarity between image pairs. However, such functions cannot adequately capture the characteristics of the human visual system or the nonlinear relationships in the contextual information given by images in a collection. Our search algorithm tackles this problem by employing a similarity measure and a ranking strategy that reflect the nonlinearity of human perception and the contextual information in a dataset. Similarity search in an image database based on this contextual information shows encouraging experimental results.
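For reference, the Minkowski metric mentioned above is the distance (sum over i of |x_i - y_i|^p)^(1/p) between two feature vectors. The sketch below is a minimal illustration of that baseline only. The feature vectors are hypothetical, and the context-sensitive measure proposed in the paper is not reproduced here.

```python
from typing import Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float],
                       p: float = 2.0) -> float:
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


if __name__ == "__main__":
    query = [0.0, 0.0]       # hypothetical low-level feature vectors
    candidate = [3.0, 4.0]
    print(minkowski_distance(query, candidate, p=1))
    print(minkowski_distance(query, candidate, p=2))
```

Because the distance treats every feature dimension independently and identically for all image pairs, it cannot adapt to the other images in the collection, which is the limitation the abstract points to.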

Keywords: context-sensitive search, image search, similarity ranking, similarity search

Procedia PDF Downloads 335
1507 Review and Suggestions of the Similarity between Employee and Its Workplace

Authors: Gi Ryung Song, Kyoung Seok Kim

Abstract:

This study reviewed the literature on the similarity of various characteristics, such as values, personality, or demographics, between an employee and other elements of the organization, for example the employee's leader, job, and organization. We divided the body of this study into two parts. In the first part, we organized and presented recent studies. Three issues emerged in this part: statistical ways of measuring similarity, supervisor-subordinate similarity, and person-organization fit together with person-job fit. In the second part, based on these three issues, we suggested three propositions about points that the recent studies missed or did not address. The first proposition concerns the direction of similarity, which can also be interpreted as a causal relation between employees and their workplace environments. Second, we suggested considering the elimination of common variance buried in one’s characteristics or profiles. The third proposition concerns the similarity of extra-role behavior between individual and organization; we treated the organization’s level of extra-role behavior as a kind of organizational culture. In this view, similarity between an individual’s extra-role behavior and the organization’s indicates the individual’s congruence with the organizational culture.

Keywords: similarity, person-organization fit, supervisor-subordinate similarity, literature review

Procedia PDF Downloads 247
1506 2D Fingerprint Performance for PubChem Chemical Database

Authors: Fatimah Zawani Abdullah, Shereena Mohd Arif, Nurul Malim

Abstract:

The study of molecular similarity search in chemical databases is increasingly widespread, especially in the area of drug discovery. Similarity search is an application in the field of chemoinformatics that measures the similarity between a molecular structure, known as the query, and the structures of chemical compounds in the database. Similarity search is also one of the approaches in virtual screening, which involves computational techniques and scoring the probabilities of activity. The main objective of this work is to determine the best fingerprint among the six fingerprints selected in this study, using the PubChem chemical dataset. This paper discusses the similarity searching process conducted using six types of descriptors (ECFP4, ECFC4, FCFP4, FCFC4, SRECFC4 and SRFCFC4) on 15 activity classes of the PubChem dataset, using the Tanimoto coefficient to calculate the similarity between the query structures and each database structure. The results suggest that ECFP4 performs best when used with the Tanimoto coefficient on the PubChem dataset.
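As an illustration of the scoring step, the Tanimoto coefficient between two binary fingerprints is the number of shared "on" bits divided by the number of bits set in either. The sketch below assumes fingerprints are represented as sets of bit indices. The compound names and bit values are made up, and no real fingerprint (such as ECFP4) is computed.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for sets of 'on' bit indices."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    # Convention chosen here: two empty fingerprints count as dissimilar.
    return shared / union if union else 0.0


def rank_database(query: set[int],
                  database: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Return (compound id, score) pairs sorted by decreasing similarity."""
    scores = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = {1, 2, 3}                      # hypothetical query fingerprint
    database = {                           # hypothetical database entries
        "cmpd_a": {2, 3, 4},
        "cmpd_b": {1, 2, 3, 5},
        "cmpd_c": {7, 8},
    }
    for name, score in rank_database(query, database):
        print(name, round(score, 3))
```

In a similarity search, the top of this ranking is what gets compared across fingerprint types, for example by counting how many known actives appear among the highest-scoring compounds.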

Keywords: 2D fingerprints, Tanimoto, PubChem, similarity searching, chemoinformatics

Procedia PDF Downloads 260
1505 Similarity Based Membership of Elements to Uncertain Concept in Information System

Authors: M. Kamel El-Sayed

Abstract:

The degree of membership of an element in an uncertain concept has been determined in many ways, using equivalence and symmetry relations in information systems. In the case of similarity, these methods did not take into account the degree of symmetry between elements. In this paper, we use a new definition for finding the membership based on the degree of symmetry. We provide an example to clarify the suggested method and compare it with previous methods. This method opens the door to more accurate decisions in information systems.

Keywords: information system, uncertain concept, membership function, similarity relation, degree of similarity

Procedia PDF Downloads 186
1504 The Latent Model of Linguistic Features in Korean College Students’ L2 Argumentative Writings: Syntactic Complexity, Lexical Complexity, and Fluency

Authors: Jiyoung Bae, Gyoomi Kim

Abstract:

This study explores a range of linguistic features used in Korean college students’ argumentative writings for the purpose of developing a model that identifies variables which predict writing proficiency. The study investigated the latent variable structure of L2 linguistic features, including syntactic complexity, lexical complexity, and fluency. One hundred forty-six university students in Korea participated in this study. The results of the confirmatory factor analysis (CFA) showed that the indicators of linguistic features from this study provided a foundation for re-categorizing indicators found in extant research on L2 Korean writers according to each latent variable of linguistic features. The CFA models indicated one measurement model of L2 syntactic complexity and L2 learners’ writing proficiency; these two latent factors were correlated with each other. Based on the overall findings of the study, the integrated linguistic features of L2 writings suggest some pedagogical implications for L2 writing instruction.

Keywords: linguistic features, syntactic complexity, lexical complexity, fluency

Procedia PDF Downloads 139
1503 Correlation between Funding and Publications: A Pre-Step towards Future Research Prediction

Authors: Ning Kang, Marius Doornenbal

Abstract:

Funding is a very important, if not crucial, resource for research projects. Usually, funding organizations publish a description of the funded research to describe the scope of the funding award. Logically, we would expect research outcomes to align with this funding award. For that reason, we might be able to predict future research topics based on present funding award data. That said, it remains to be shown if and how future research topics can be predicted by using the funding information. In this paper, we extract funded project information and the abstracts of the resulting papers from the Gateway to Research database as one group, and use papers from the same domains and publication years in the Scopus database as a baseline comparison group. We annotate both the project awards and the papers resulting from the funded projects with linguistic features (noun phrases), and then calculate tf-idf and cosine similarity between these two sets of features. We show that the cosine similarity for the project-generated papers group is higher than for the project-baseline group, and that these two groups of similarities are significantly different. Based on this result, we conclude that the funding information does correlate with the content of future research output for the funded project at the topical level. How funding really changes the course of science or of scientific careers remains an elusive question.
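A minimal sketch of the tf-idf and cosine similarity step is given below. It assumes each document has already been reduced to a list of noun phrases, and it uses a smoothed idf, log((1 + N) / (1 + df)) + 1, which is one common variant. The abstract does not specify the weighting scheme, and the example phrases are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build sparse tf-idf vectors (term -> weight) for tokenised documents."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: count * idf[t] for t, count in Counter(doc).items()}
            for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    award = ["gene therapy", "clinical trial"]                    # hypothetical
    funded_paper = ["gene therapy", "clinical trial", "viral vector"]
    baseline_paper = ["graph theory", "random walk"]
    vecs = tfidf_vectors([award, funded_paper, baseline_paper])
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Comparing the award against papers it funded and against baseline papers in this way yields two groups of similarity scores, which can then be tested for a significant difference.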

Keywords: natural language processing, noun phrase, tf-idf, cosine similarity

Procedia PDF Downloads 216
1502 Translation and Sociolinguistics of Classical Books

Authors: Laura de Almeida

Abstract:

This paper presents research on the translation of classical books originally written in English and translated into Portuguese. The objective is to analyze the linguistic varieties evident in the original and how they appear in the language into which the work was translated. We based our study on sociolinguistic theory, more specifically the study of the Black English Vernacular. Our methodology is built on collecting data from the speech of characters who use the Black English Vernacular in books such as The Adventures of Huckleberry Finn by Mark Twain. In doing so, we compare the two versions of a book and how they reflect the linguistic variety. Our purpose is to show that some translators pay little attention to linguistic variety. In other words, they just translate the story without taking into account important linguistic aspects that need attention, such as language variation.

Keywords: classical books, linguistic variation, sociolinguistics, translation

Procedia PDF Downloads 367
1501 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi

Abstract:

In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.

Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering

Procedia PDF Downloads 443
1500 Empirical Study of Partitions Similarity Measures

Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd

Abstract:

This paper investigates and compares the performance of four existing distance and similarity measures between partitions. The partition measures considered are the Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and the independence from the dataset scale. The results show that the Adjusted Rand Index performed well overall with regard to these three intuitions.
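Two of the measures named above can be sketched with pair counting, using the standard definitions of the Rand Index and the Adjusted Rand Index. Partitions are assumed to be given as per-item cluster labels. The helper names are illustrative, and the paper's experimental setup is not reproduced.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence

Labels = Sequence[Hashable]


def _pair_counts(a: Labels, b: Labels) -> tuple[int, int, int, int]:
    """Return (all pairs, pairs together in both, together in a, together in b)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two labelings of the same length (>= 2 items)")
    total = comb(len(a), 2)
    both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    in_a = sum(comb(c, 2) for c in Counter(a).values())
    in_b = sum(comb(c, 2) for c in Counter(b).values())
    return total, both, in_a, in_b


def rand_index(a: Labels, b: Labels) -> float:
    """Fraction of item pairs on which the two partitions agree."""
    total, both, in_a, in_b = _pair_counts(a, b)
    # Agreements = pairs together in both + pairs separated in both.
    return (total + 2 * both - in_a - in_b) / total


def adjusted_rand_index(a: Labels, b: Labels) -> float:
    """Rand Index corrected for chance agreement (1.0 = identical partitions)."""
    total, both, in_a, in_b = _pair_counts(a, b)
    expected = in_a * in_b / total
    maximum = (in_a + in_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (both - expected) / (maximum - expected)


if __name__ == "__main__":
    print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Both functions ignore the label names themselves, so relabelling the clusters of one partition leaves the scores unchanged.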

Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison

Procedia PDF Downloads 146
1499 Functions and Pragmatic Aspects of English Nonsense

Authors: Natalia V. Ursul

Abstract:

In linguistic studies, the question of nonsense is attracting increasing interest. Nonsense is usually defined as spoken or written words that have no meaning. However, this definition is likely to be outdated as any speech act is generated due to the speaker’s pragmatic reasons, thus it cannot be purely illogical or meaningless. In the current paper a new working definition of nonsense as a linguistic medium will be formulated; moreover, the pragmatic peculiarities of newly coined linguistic patterns and possible ways of their interpretation will be discussed.

Keywords: nonsense, nonsense verse, pragmatics, speech act

Procedia PDF Downloads 486
1498 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data play an increasingly important role in economic growth, innovation, technical advantage, business strategy, and even competition between countries. Analyzing patent data is crucial, since patents cover a large part of the world's technological information. In this paper, we use the linguistic summarization technique to test the validity of hypotheses about patent data stated in the literature.

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 244
1497 A Relational Approach to Adverb Use in Interactions

Authors: Guillaume P. Fernandez

Abstract:

Individual language use is a matter of choice in particular interactions. The paper proposes a conceptual and theoretical framework, with methodological considerations, for how language produced in dyadic relations should be considered and situated in the larger social configuration in which the interaction is embedded. An integrated and comprehensive view is taken: social interactions are expected to be ruled by a normative context, defined by the chain of interdependences that structures the personal network. In this approach, the determinants of discursive practices are not confined to the moment of production and isolated from broader influences. Instead, the position that the individual and the dyad have in the personal network influences discursive practices in a twofold manner: on the one hand, the network limits access to the linguistic resources available within it, and, on the other hand, the structure of the network influences the agency of the individual through the social control inherent in particular network characteristics. Concretely, we investigate how and to what extent ego is consistent from one interaction to another in his or her use of adverbs. To do so, social network analysis (SNA) methods are mobilized. Participants (N=130) are college students recruited in the French-speaking part of Switzerland. The personal network of significant others of each individual is created using name generators and edge interpreters, with a focus on social support and conflict. For the linguistic part, respondents were asked to record themselves with five of their close relations. From the recordings, we computed an average similarity score based on the adverbs used across interactions.
Two analyses are envisaged. First, OLS regressions including network-level measures, such as density and reciprocity, and individual-level measures, such as centralities, are performed to understand the determinants of linguistic similarity from one interaction to another. The second analysis considers each social tie as nested within ego networks. Multilevel models are performed to investigate how the different types of ties may influence the likelihood of using adverbs, controlling for structural properties of the personal network. Preliminary results suggest that the more cohesive the network, the less likely the individual is to change his or her manner of speaking, and that social support increases the use of adverbs in interactions. While promising results emerge, further research should consider a longitudinal approach to enable claims of causality.

Keywords: personal network, adverbs, interactions, social influence

Procedia PDF Downloads 28
1496 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data mining, and web search. Measuring the similarity between documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (1) the feature appears in both documents; (2) the feature appears in only one document; and (3) the feature appears in neither document. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in the banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 479
1495 Comparing Deep Architectures for Selecting Optimal Machine Translation

Authors: Despoina Mouratidis, Katia Lida Kermanidis

Abstract:

Machine translation (MT) is a very important task in Natural Language Processing (NLP). MT evaluation is crucial in MT development, as it constitutes the means to assess the success of an MT system and also helps improve its performance. Several methods have been proposed for the evaluation of MT systems. Some of the most popular ones in automatic MT evaluation are score-based, such as the BLEU score, and others are based on lexical or syntactic similarity between the MT outputs and the reference, involving higher-level information such as part-of-speech (POS) tagging. This paper presents a language-independent machine learning framework for classifying pairwise translations. The framework uses vector representations of two machine-produced translations, one from a statistical machine translation (SMT) model and one from a neural machine translation (NMT) model. The vector representations consist of automatically extracted word embeddings and string-like language-independent features. These vector representations are used as input to a multi-layer neural network (NN) that models the similarity between each MT output and the reference, as well as between the two MT outputs. To evaluate the proposed approach, a professional translation and a "ground-truth" annotation are used. The parallel corpora used are English-Greek (EN-GR) and English-Italian (EN-IT), in the educational domain and of informal genres (video lecture subtitles, course forum text, etc.) that are difficult to translate reliably. We tested three basic deep learning (DL) architectures in this schema: (i) fully connected dense, (ii) Convolutional Neural Network (CNN), and (iii) Long Short-Term Memory (LSTM). Experiments show that all tested architectures achieved better results than some of the well-known basic approaches, such as Random Forest (RF) and Support Vector Machine (SVM).
Better accuracy results are obtained when LSTM layers are used in our schema. In terms of balance between the results, better accuracy is obtained when dense layers are used, because the model then correctly classifies more sentences of the minority class (SMT). For a more integrated analysis of the accuracy results, a qualitative linguistic analysis is carried out. In this context, problems have been identified with some figures of speech, such as metaphors, and with certain linguistic phenomena, such as etymological paronyms. It is quite interesting to ask why all the classifiers led to worse accuracy results in Italian than in Greek, given that the linguistic features employed are language independent.

Keywords: machine learning, machine translation evaluation, neural network architecture, pairwise classification

Procedia PDF Downloads 102
1494 Tool for Determining the Similarity between Two Web Applications

Authors: Doru Anastasiu Popescu, Raducanu Dragos Ionut

Abstract:

This paper presents a tool that measures the similarity between two websites. The websites are composed only of web pages created with HTML. The tool offers three ways of calculating similarity, based on previously published results. The first way compares all the web pages of one website with those of the other, the second compares a single web page with all the pages of the second website, and the third compares two web pages. The tool was implemented in the Java programming language, using technologies such as Spring, Jsoup, and log4j.

Keywords: Java, Jsoup, HTML, Spring

Procedia PDF Downloads 348
1493 Improving Similarity Search Using Clustered Data

Authors: Deokho Kim, Wonwoo Lee, Jaewoong Lee, Teresa Ng, Gun-Ill Lee, Jiwon Jeong

Abstract:

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation in providing accurate similarity with deep learning is the requirement for a huge amount of data to train pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset from a different domain, which limits the accuracy of the similarity measurement. For this reason, this paper proposes a deep learning model that can be trained with a significantly smaller amount of data: clustered data in which each cluster contains a set of visually similar images. To measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores, which are defined as zero for images in the same cluster. The proposed method outperforms state-of-the-art object similarity scoring techniques when evaluated on finding exact items: it achieves an accuracy of 86.5%, compared with 59.9% for the state-of-the-art technique. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the remaining retrieved images may well be similar products. Therefore, the proposed method can reduce the amount of training data by an order of magnitude while providing a reliable similarity metric.

Keywords: visual search, deep learning, convolutional neural network, machine learning

Procedia PDF Downloads 185
1492 Literature, Culture, and Shakespeare's Dramatization of Linguistic Scenes

Authors: Cheang Wai Fong

Abstract:

This paper takes language and its interconnection with power as a point of departure to analyze some linguistic scenes played up by William Shakespeare. By placing language into the big picture of literature and culture, and by reexamining the etymological relations between the three terms, language, literature and culture, the paper attempts to formulate an understanding of their more expansive meanings. It compares their respective traditional notions with their modern concepts brought up by literary critics, anthropologists and sociolinguists. Then it uses these expansive meanings to reinterpret Shakespeare’s linguistic scenes featuring language contentions, and to discuss Shakespeare’s success as a signification of literature’s role within the linguistic and cultural context of Elizabethan England.

Keywords: culture, language, literature, Shakespeare

Procedia PDF Downloads 505
1491 Impact of Similarity Ratings on Human Judgement

Authors: Ian A. McCulloh, Madelaine Zinser, Jesse Patsolic, Michael Ramos

Abstract:

Recommender systems are a common artificial intelligence (AI) application. For any given input, a search system will return a rank-ordered list of similar items. As users review returned items, they must decide when to halt the search and either revise search terms or conclude their requirement is novel with no similar items in the database. We present a statistically designed experiment that investigates the impact of similarity ratings on human judgement to conclude a search item is novel and halt the search. 450 participants were recruited from Amazon Mechanical Turk to render judgement across 12 decision tasks. We find the inclusion of ratings increases the human perception that items are novel. Percent similarity increases novelty discernment when compared with star-rated similarity or the absence of a rating. Ratings reduce the time to decide and improve decision confidence. This suggests the inclusion of similarity ratings can aid human decision-makers in knowledge search tasks.

Keywords: ratings, rankings, crowdsourcing, empirical studies, user studies, similarity measures, human-centered computing, novelty in information retrieval

Procedia PDF Downloads 90
1490 Text Similarity in Vector Space Models: A Comparative Study

Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Abstract:

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. Otherwise, TFIDF performs surprisingly well, in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.
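
As an illustration of the TFIDF baseline mentioned in this abstract, the following is a minimal sketch of TF-IDF weighting with cosine similarity, written with the Python standard library only. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, and the names `tfidf_vectors` and `cosine` are assumptions chosen for brevity, and the sample texts are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: naive lowercase whitespace tokenization; a real pipeline
    would normally use a proper tokenizer and stop-word handling.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps weights positive even for
        # terms that occur in every document.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented sample texts standing in for patent abstracts.
docs = [
    "a rotor blade for a wind turbine",
    "a turbine blade with a cooled rotor",
    "a method for brewing coffee",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In this toy example the two turbine texts score higher against each other than either does against the coffee text, which is the kind of nearest-neighbor ranking the abstract evaluates at scale.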

Keywords: big data, patent, text embedding, text similarity, vector space model

Procedia PDF Downloads 138
1489 Static vs. Stream Mining Trajectories Similarity Measures

Authors: Musaab Riyadh, Norwati Mustapha, Dina Riyadh

Abstract:

Trajectory similarity can be defined as the cost of transforming one trajectory into another based on a given similarity method. It is the core of numerous mining tasks such as clustering, classification, and indexing. Various approaches have been suggested to measure similarity based on the geometric and dynamic properties of trajectories, the overlap between trajectory segments, and the confined area between entire trajectories. In this article, these approaches are evaluated in terms of computational cost, memory usage, accuracy, and the amount of data needed in advance, in order to determine their suitability for stream mining applications. The evaluation results show that, because data are generated at high speed, the similarity methods suited to stream mining applications are those that have low computational and memory cost, require a single scan of the data, and are free of mathematical complexity.
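
To make the idea of a cost-based trajectory measure concrete, here is a minimal sketch of dynamic time warping (DTW), one widely used alignment-cost measure. The abstract does not state which measures were evaluated, so DTW is an assumption chosen for illustration, as are the function name `dtw_distance` and the sample coordinates.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping cost between two trajectories.

    Each trajectory is a sequence of coordinate tuples of equal dimension.
    The result is the minimum summed point-to-point Euclidean distance over
    all monotonic alignments of the two sequences.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of the first i points of a
    # with the first j points of b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Invented sample trajectories.
straight = [(0, 0), (1, 0), (2, 0)]
resampled = [(0, 0), (1, 0), (1, 0), (2, 0)]  # same path, one repeated point
shifted = [(0, 1), (1, 1), (2, 1)]            # same shape, offset by 1 in y
print(dtw_distance(straight, resampled), dtw_distance(straight, shifted))
```

As written, this sketch needs both complete trajectories up front and uses time and memory proportional to the product of their lengths, which is the kind of requirement the abstract identifies as a poor fit for stream mining.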

Keywords: global distance measure, local distance measure, semantic trajectory, spatial dimension, stream data mining

Procedia PDF Downloads 368
1488 Emerging Virtual Linguistic Landscape Created by Members of Language Community in TikTok

Authors: Kai Zhu, Shanhua He, Yujiao Chang

Abstract:

This paper explores the virtual linguistic landscape of an emerging virtual language community in TikTok, a language community realizing immediate and non-immediate communication without a precise spatio-temporal domain or a specific socio-cultural boundary or interpersonal network. This kind of language community generates virtual linguistic landscapes in large numbers and various forms, with which we conducted a virtual ethnographic survey together with telephone interviews to collect data from coping. We have been following two language communities in TikTok for several months so that we can first illustrate the composition of the two language communities and some typical virtual linguistic landscapes in both. Then we explore why and how they are formed through the organization, transcription, and analysis of the interviews. Our analysis reveals the richness and diversity of the virtual linguistic landscape, and finally, we summarize some of the characteristics of this language community.

Keywords: virtual linguistic landscape, virtual language community, virtual ethnographic survey, TikTok

Procedia PDF Downloads 72
1487 A Linguistic Relativity Appraisal of an African Drama: The Lion and The Jewel

Authors: T. O. Adekunle, R. L. Makhubu, C. N. Ngwane

Abstract:

This research was designed to assess the validity of the Sapir-Whorf hypothesis in relation to the linguistic and cultural notions of Yoruba and Zulu language speakers via the evaluation of the culture-enriched dramatic text The Lion and The Jewel by Wole Soyinka. The study queried both the hypothesis's strong version (language governs thought: linguistic classifications restrain and influence mental classifications) and its weak version (linguistic classifications and their use influence thought as well as some other classes of non-linguistic activities), and their possible reliability. Participants were purposively selected and ranged in age from 16 to 46 years. The 38 participants (18 Yoruba and 20 Zulu) were students of DUT; the Zulu participants all speak both English and Zulu, and the Yoruba participants all speak both English and Yoruba. A mixed methods approach was used. The research questions were answered through a questionnaire and interviews, and the findings provided support for the validity of the linguistic relativity hypothesis: languages indeed influence thought. The findings also revealed that linguistic influence on cognition is not limited to users of different languages, but also extends to speakers of the same language, depending on their level of exposure to other languages and concepts.

Keywords: culture, cognition, DUT, language, linguistic relativity hypothesis, Sapir-Whorf hypothesis, The Lion and The Jewel, thought, Wole Soyinka, Yoruba, Zulu

Procedia PDF Downloads 422
1486 A Critical Discourse Analysis of the Impact of the Linguistic Behavior of the Soccer Moroccan Coach in Light of Motivation Theory and Discursive Psychology

Authors: Abdelaadim Bidaoui

Abstract:

As one of the most important linguistic inquiries, the topic of the intertwined relationship between language, the mind, and the world has attracted many scholars. In the fifties, Sapir and Whorf advocated the hypothesis that language shapes our cultural realities as an early attempt to provide answers to this linguistic inquiry. Later, discursive psychology came to view linguistic behavior as “a dynamic form of social practice which constructs the social world, individual selves and identity” (Jorgensen & Phillips 2002, 118). Discursive psychology also considers discourse as a trigger of social action and change. Building on discursive psychology and motivation theory, this paper examines the impact of the linguistic behavior of the Moroccan coach Walid Reggragui on the Moroccan team’s exceptional performance in the Qatar 2022 Soccer World Cup. The data used in the research is based on interviews conducted by the Moroccan coach prior to and during the World Cup. Using a discourse analysis of Reggragui's linguistic behavior, this paper shows how that behavior provided support for the three psychological needs: sense of belonging, competence, and autonomy. As with any CDA research, this paper uses a triangulated theoretical framework that includes language, cognition and society.

Keywords: critical discourse analysis, motivation theory, discursive psychology, linguistic behavior

Procedia PDF Downloads 56