Search results for: bilingual semantic processing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4124

Search results for: bilingual semantic processing

4064 The Contribution of Corpora to the Investigation of Cross-Linguistic Equivalence in Phraseology: A Contrastive Analysis of Russian and Italian Idioms

Authors: Federica Floridi

Abstract:

The long tradition of contrastive idiom research has essentially been focusing on three domains: the comparison of structural types of idioms (e.g. verbal idioms, idioms with noun-phrase structure, etc.), the description of idioms belonging to the same thematic groups (Sachgruppen), the identification of different types of cross-linguistic equivalents (i.e. full equivalents, partial equivalents, phraseological parallels, non-equivalents). The diastratic, diachronic and diatopic aspects of the compared idioms, as well as their syntactic, pragmatic and semantic properties, have been rather ignored. Corpora (both monolingual and parallel) give the opportunity to investigate the actual use of correlating idioms in authentic texts of L1 and L2. Adopting the corpus-based approach, it is possible to draw attention to the frequency of occurrence of idioms, their syntactic embedding, their potential syntactic transformations (e.g., nominalization, passivization, relativization, etc.), their combinatorial possibilities, the variations of their lexical structure, their connotations in terms of stylistic markedness or register. This paper aims to present the results of a contrastive analysis of Russian and Italian idioms referring to the concepts of ‘beginning’ and ‘end’, that has been carried out by using the Russian National Corpus and the ‘La Repubblica’ corpus. Beyond the digital corpora, bilingual dictionaries, like Skvorcova - Majzel’, Dobrovol’skaja, Kovalev, Čerdanceva, as well as monolingual resources, have been consulted. The study has shown that many of the idioms that have been traditionally indicated as cross-linguistic equivalents on bilingual dictionaries cannot be considered correspondents. The findings demonstrate that even those idioms, that are formally identical in Russian and Italian and are presumably derived from the same source (e.g., conceptual metaphor, Bible, classical mythology, World literature), exhibit differences regarding usage. The ultimate purpose of this article is to highlight that it is necessary to review and improve the existing bilingual dictionaries considering the empirical data collected in corpora. The materials gathered in this research can contribute to this sense.

Keywords: corpora, cross-linguistic equivalence, idioms, Italian, Russian

Procedia PDF Downloads 116
4063 Evaluation and Compression of Different Language Transformer Models for Semantic Textual Similarity Binary Task Using Minority Language Resources

Authors: Ma. Gracia Corazon Cayanan, Kai Yuen Cheong, Li Sha

Abstract:

Training a language model for a minority language has been a challenging task. The lack of available corpora to train and fine-tune state-of-the-art language models is still a challenge in the area of Natural Language Processing (NLP). Moreover, the need for high computational resources and bulk data limit the attainment of this task. In this paper, we presented the following contributions: (1) we introduce and used a translation pair set of Tagalog and English (TL-EN) in pre-training a language model to a minority language resource; (2) we fine-tuned and evaluated top-ranking and pre-trained semantic textual similarity binary task (STSB) models, to both TL-EN and STS dataset pairs. (3) then, we reduced the size of the model to offset the need for high computational resources. Based on our results, the models that were pre-trained to translation pairs and STS pairs can perform well for STSB task. Also, having it reduced to a smaller dimension has no negative effect on the performance but rather has a notable increase on the similarity scores. Moreover, models that were pre-trained to a similar dataset have a tremendous effect on the model’s performance scores.

Keywords: semantic matching, semantic textual similarity binary task, low resource minority language, fine-tuning, dimension reduction, transformer models

Procedia PDF Downloads 175
4062 A Secure System for Handling Information from Heterogeous Sources

Authors: Shoohira Aftab, Hammad Afzal

Abstract:

Information integration is a well known procedure to provide consolidated view on sets of heterogeneous information sources. It not only provides better statistical analysis of information but also facilitates users to query without any knowledge on the underlying heterogeneous information sources The problem of providing a consolidated view of information can be handled using Semantic data (information stored in such a way that is understandable by machines and integrate-able without manual human intervention). However, integrating information using semantic web technology without any access management enforced, will results in increase of privacy and confidentiality concerns. In this research we have designed and developed a framework that would allow information from heterogeneous formats to be consolidated, thus resolving the issue of interoperability. We have also devised an access control system for defining explicit privacy constraints. We designed and applied our framework on both semantic and non-semantic data from heterogeneous resources. Our approach is validated using scenario based testing.

Keywords: information integration, semantic data, interoperability, security, access control system

Procedia PDF Downloads 319
4061 Translanguaging as a Decolonial Move in South African Bilingual Classrooms

Authors: Malephole Philomena Sefotho

Abstract:

Nowadays, it is a fact that the majority of people, worldwide, are bilingual rather than monolingual due to the surge of globalisation and mobility. Consequently, bilingual education is a topical issue of discussion among researchers. Several studies that have focussed on it have highlighted the importance and need for incorporating learners’ linguistic repertoires in multilingual classrooms and move away from the colonial approach which is a monolingual bias – one language at a time. Researchers pointed out that a systematic approach that involves the concurrent use of languages and not a separation of languages must be implemented in bilingual classroom settings. Translanguaging emerged as a systematic approach that assists learners to make meaning of their world and it involves allowing learners to utilize all their linguistic resources in their classrooms. The South African language policy also room for diverse languages use in bi/multilingual classrooms. This study, therefore, sought to explore how teachers apply translanguaging in bilingual classrooms in incorporating learners’ linguistic repertoires. It further establishes teachers’ perspectives in the use of more than one language in teaching and learning. The participants for this study were language teachers who teach at bilingual primary schools in Johannesburg in South Africa. Semi-structured interviews were conducted to establish their perceptions on the concurrent use of languages. Qualitative research design was followed in analysing data. The findings showed that teachers were reluctant to allow translanguaging to take place in their classrooms even though they realise the importance thereof. Not allowing bilingual learners to use their linguistic repertoires has resulted in learners’ negative attitude towards their languages and contributed in learners’ loss of their identity. This article, thus recommends a drastic change to decolonised approaches in teaching and learning in multilingual settings and translanguaging as a decolonial move where learners are allowed to translanguage freely in their classroom settings for better comprehension and making meaning of concepts and/or related ideas. It further proposes continuous conversations be encouraged to bring eminent cultural and linguistic genocide to a halt.

Keywords: bilingualism, decolonisation, linguistic repertoires, translanguaging

Procedia PDF Downloads 144
4060 Semantic Textual Similarity on Contracts: Exploring Multiple Negative Ranking Losses for Sentence Transformers

Authors: Yogendra Sisodia

Abstract:

Researchers are becoming more interested in extracting useful information from legal documents thanks to the development of large-scale language models in natural language processing (NLP), and deep learning has accelerated the creation of powerful text mining models. Legal fields like contracts benefit greatly from semantic text search since it makes it quick and easy to find related clauses. After collecting sentence embeddings, it is relatively simple to locate sentences with a comparable meaning throughout the entire legal corpus. The author of this research investigated two pre-trained language models for this task: MiniLM and Roberta, and further fine-tuned them on Legal Contracts. The author used Multiple Negative Ranking Loss for the creation of sentence transformers. The fine-tuned language models and sentence transformers showed promising results.

Keywords: legal contracts, multiple negative ranking loss, natural language inference, sentence transformers, semantic textual similarity

Procedia PDF Downloads 68
4059 Teacher Education in a Bilingual Perspective: Brazilian Sign Language and Portuguese

Authors: Neuma Chaveiro, Juliana Guimarães Faria

Abstract:

Introduction: The thematic that guides this study is teacher training for the teaching of sign language in a perspective of bilingual education – specifically aimed at Brazilian public schools that offer inclusive education, and that have, among its students, deaf children who use Brazilian Sign Language as a means of communication and expression. In the Teacher Training Course for Letters/Libras at the Universidade Federal de Goiás/UFG, we developed a bilingual education project for the deaf, linked to PIBID (Institutional Scholarship for Teaching Initiation Program), funded by the Brazilian Federal Government through CAPES (Coordination for the Improvement of Higher Education Personnel). Goals: to provide the education of higher education teachers to work in public schools in basic education and to insert students from the UFG’s Letters/Libras course in the school’s daily life, giving them the opportunity for the creation and participation in methodological experiences and of teaching practices in order to overcome the problems identified in the teaching-learning process of deaf students, in a bilingual perspective, associating Libras (Brazilian Sign Language) and Portuguese. Methodology: qualitative approach and research-action, prioritizing action – reflection – action of the people involved. The Letters-Libras PIBID of the College of Letters/UFG, in this qualitative context, is guided by the assumptions of investigation-action to contribute to the education of the Libras teacher. Results: production of studies and researches in the area of education, professionalization and teaching practice for the degree holder in Letters: Libras; b) studies, research and training in bilingual education; c) clarification and discussion of the myths that permeate the reality of users of sign languages; d) involving students in the development of didactic materials for bilingual education. Conclusion: the PIBID Project Letters/Libras allows, both to the basic education school and to the teachers in training for the teaching of Libras, an integrated and collective work partnership, with discussions and changes in relation to bilingual education for the deaf and the teaching of Libras.

Keywords: deaf, sign language, teacher training, educacion

Procedia PDF Downloads 261
4058 Assessing Language Dominance in Mexican Deaf Signers with the Bilingual Language Profile (BLP)

Authors: E. Mendoza, D. Jackson-Maldonado, G. Avecilla-Ramírez, A. Mondaca

Abstract:

Assessing language proficiency is a major issue in psycholinguistic research. There are multiple tools that measure language dominance and language proficiency in hearing bilinguals, however, this is not the case for Deaf bilinguals. Specifically, there are few, if not none, assessment tools useful in the description of the multilingual abilities of Mexican Deaf signers. Because of this, the linguistic characteristics of Mexican Deaf population have been poorly described. This paper attempts to explain the necessary changes done in order to adapt the Bilingual Language Profile (BLP) to Mexican Sign Language (LSM) and written/oral Spanish. BLP is a Self-Evaluation tool that has been adapted and translated to several oral languages, but not to sign languages. Lexical, syntactic, cultural, and structural changes were applied to the BLP. 35 Mexican Deaf signers participated in a pilot study. All of them were enrolled in Higher Education programs. BLP was presented online in written Spanish via Google Forms. No additional information in LSM was provided. Results show great heterogeneity as it is expected of Deaf populations and BLP seems to be a useful tool to create a bilingual profile of the Mexican Deaf population. This is a first attempt to adapt a widely tested tool in bilingualism research to sign language. Further modifications need to be done.

Keywords: deaf bilinguals, assessment tools, bilingual language profile, mexican sign language

Procedia PDF Downloads 122
4057 DocPro: A Framework for Processing Semantic and Layout Information in Business Documents

Authors: Ming-Jen Huang, Chun-Fang Huang, Chiching Wei

Abstract:

With the recent advance of the deep neural network, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system needs to integrate several NLP and CV tasks, rather than treating them separately. There is a need to have a unified approach for processing documents containing textual and graphical elements with rich formats, diverse layout arrangement, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build a system for diverse business scenarios, such as contract monitoring and reviewing.

Keywords: document processing, framework, formal definition, machine learning

Procedia PDF Downloads 184
4056 Speech Perception by Monolingual and Bilingual Dravidian Speakers under Adverse Listening Conditions

Authors: S. B. Rathna Kumar, Sale Kranthi, Sandya K. Varudhini

Abstract:

The precise perception of spoken language is influenced by several variables, including the listeners’ native language, distance between speaker and listener, reverberation and background noise. When noise is present in an acoustic environment, it masks the speech signal resulting in reduction in the redundancy of the acoustic and linguistic cues of speech. There is strong evidence that bilinguals face difficulty in speech perception for their second language compared with monolingual speakers under adverse listening conditions such as presence of background noise. This difficulty persists even for speakers who are highly proficient in their second language and is greater in those who have learned the second language later in life. The present study aimed to assess the performance of monolingual (Telugu speaking) and bilingual (Tamil as first language and Telugu as second language) speakers on Telugu speech perception task under quiet and noisy environments. The results indicated that both the groups performed similar in both quiet and noisy environments. The findings of the present study are not in accordance with the findings of previous studies which strongly report poorer speech perception in adverse listening conditions such as noise with bilingual speakers for their second language compared with monolinguals.

Keywords: monolingual, bilingual, second language, speech perception, quiet, noise

Procedia PDF Downloads 363
4055 Crossover Memories and Code-Switching in the Narratives of Arabic-Hebrew and Hebrew-English Bilingual Adults in Israel

Authors: Amani Jaber-Awida

Abstract:

This study examines two bilingual phenomena in the narratives of Arabic Hebrew and Hebrew-English bilingual adults in Israel: CO memories and code-switching (CS). The study examined these phenomena in the context of autobiographical memory, using a cue word technique. Student experimenters held two sessions in the homes of the participants. In separate language sessions, the participant was asked to look first at each of 16 cue words and then to state a concrete memory. After stating the memory, participants reported whether their memories were in the same language of the experiment session or different. Memories were classified as ‘Crossovers’ (CO) or ‘Same Language’ (SL) according to participants' self-reports. Participants were also required to elaborate about the setting, interlocutors and other languages involved in the specific memory. Beyond replicating the procedure of cuing technique, one memory from a specific lifespan period was chosen per participant, and the participant was required to provide further details about it. For the more detailed memories, CS count was conducted. Both bilingual groups confirmed the Reminiscence Bump phenomenon, retrieving more memories in the 10-30 age period. CO memories prevailed in second language sessions (L2). Same language memories were more abundant in first language sessions (L1). Higher CS frequency was found in L2 sessions. Finally, as predicted, 'individual' CS was prevalent in L2 sessions, but 'community-based' CS was not higher in L1 sessions. The two bilingual measures in this study, crossovers, and CS came from different research traditions, the former from an experimental paradigm in the psychology of autobiographical memory based on self-reported judgments, the latter a behavioral measure from linguistics. This merger of approaches offers new insight into the field of bilingual autobiographical memory. In addition, the study attempted to shed light on the investigation of motivations for CS, beginning with Walters’ SPPL Model and concluding with a distinction between ‘community-based’ and individual motivations.

Keywords: bilinguals, code-switching, crossover memories, narratives

Procedia PDF Downloads 144
4054 An Online Corpus-Based Bilingual Collocations Dictionary for Second/Foreign Language Learners

Authors: Adriane Orenha-Ottaiano

Abstract:

Collocations are conventionalized, recurrent and arbitrary lexical combinations. Due to the fact that they are highly specific for a particular language and may be contextually restricted, collocations pose a problem to EFL/ESL learners with regard to production or encoding. Taking that into account, the compilation of monolingual and bilingual collocations dictionaries for the referred audience is highly crucial and significant. Thus, the aim of this paper is to discuss the importance of the compilation of an Online Corpus-based Bilingual Collocations Dictionary, in the English-Portuguese and Portuguese-English directions. On a first phase, with the use of WordSmith Tools, the collocations were extracted from a Translation Learner Corpus (TLC), a parallel corpus made up of university students’ translations in the Portuguese-English direction, with approximately 100,000 words. In a second stage, based on the keywords analyzed from the TLC, more collocational patterns were extracted using the Sketch Engine. In order to include more collocations as well as to ensure dictionary users will have access to more frequent and recurrent collocations, we also use the frequency list from The Corpus of Contemporary American English, with the purpose of extracting more patterns. The dictionary focuses on all types of collocations (verbal, noun, adjectival and adverbial collocations), in order to help the referred audience use them more accurately and productively – so far the dictionary has more than 330 entries, and more than 3,500 collocations extracted. The idea of having the proposed dictionary in online format may allow to incorporate more qualitatively and quantitatively collocational information. Besides, more examples may be included, different from conventional printed collocations dictionaries. Being the first bilingual collocations dictionary in the aforementioned directions, it is hoped to achieve the challenge of meeting learners’ collocational needs as the collocations have been selected according to learners’ difficulties regarding the use of collocations.

Keywords: Corpus-Based Collocations Dictionary, Collocations , Bilingual Collocations Dictionary, Collocational Patterns

Procedia PDF Downloads 285
4053 The Processing of Context-Dependent and Context-Independent Scalar Implicatures

Authors: Liu Jia’nan

Abstract:

The default accounts hold the view that there exists a kind of scalar implicature which can be processed without context and own a psychological privilege over other scalar implicatures which depend on context. In contrast, the Relevance Theorist regards context as a must because all the scalar implicatures have to meet the need of relevance in discourse. However, in Katsos, the experimental results showed: Although quantitatively the adults rejected under-informative utterance with lexical scales (context-independent) and the ad hoc scales (context-dependent) at almost the same rate, adults still regarded the violation of utterance with lexical scales much more severe than with ad hoc scales. Neither default account nor Relevance Theory can fully explain this result. Thus, there are two questionable points to this result: (1) Is it possible that the strange discrepancy is due to other factors instead of the generation of scalar implicature? (2) Are the ad hoc scales truly formed under the possible influence from mental context? Do the participants generate scalar implicatures with ad hoc scales instead of just comparing semantic difference among target objects in the under- informative utterance? In my Experiment 1, the question (1) will be answered by repetition of Experiment 1 by Katsos. Test materials will be showed by PowerPoint in the form of pictures, and each procedure will be done under the guidance of a tester in a quiet room. Our Experiment 2 is intended to answer question (2). The test material of picture will be transformed into the literal words in DMDX and the target sentence will be showed word-by-word to participants in the soundproof room in our lab. Reading time of target parts, i.e. words containing scalar implicatures, will be recorded. We presume that in the group with lexical scale, standardized pragmatically mental context would help generate scalar implicature once the scalar word occurs, which will make the participants hope the upcoming words to be informative. Thus if the new input after scalar word is under-informative, more time will be cost for the extra semantic processing. However, in the group with ad hoc scale, scalar implicature may hardly be generated without the support from fixed mental context of scale. Thus, whether the new input is informative or not does not matter at all, and the reading time of target parts will be the same in informative and under-informative utterances. People’s mind may be a dynamic system, in which lots of factors would co-occur. If Katsos’ experimental result is reliable, will it shed light on the interplay of default accounts and context factors in scalar implicature processing? We might be able to assume, based on our experiments, that one single dominant processing paradigm may not be plausible. Furthermore, in the processing of scalar implicature, the semantic interpretation and the pragmatic interpretation may be made in a dynamic interplay in the mind. As to the lexical scale, the pragmatic reading may prevail over the semantic reading because of its greater exposure in daily language use, which may also lead the possible default or standardized paradigm override the role of context. However, those objects in ad hoc scale are not usually treated as scalar membership in mental context, and thus lexical-semantic association of the objects may prevent their pragmatic reading from generating scalar implicature. Only when the sufficient contextual factors are highlighted, can the pragmatic reading get privilege and generate scalar implicature.

Keywords: scalar implicature, ad hoc scale, dynamic interplay, default account, Mandarin Chinese processing

Procedia PDF Downloads 290
4052 Discovering Semantic Links Between Synonyms, Hyponyms and Hypernyms

Authors: Ricardo Avila, Gabriel Lopes, Vania Vidal, Jose Macedo

Abstract:

This proposal aims for semantic enrichment between glossaries using the Simple Knowledge Organization System (SKOS) vocabulary to discover synonyms, hyponyms and hyperonyms semiautomatically, in Brazilian Portuguese, generating new semantic relationships based on WordNet. To evaluate the quality of this proposed model, experiments were performed by the use of two sets containing new relations, being one generated automatically and the other manually mapped by the domain expert. The applied evaluation metrics were precision, recall, f-score, and confidence interval. The results obtained demonstrate that the applied method in the field of Oil Production and Extraction (E&P) is effective, which suggests that it can be used to improve the quality of terminological mappings. The procedure, although adding complexity in its elaboration, can be reproduced in others domains.

Keywords: ontology matching, mapping enrichment, semantic web, linked data, SKOS

Procedia PDF Downloads 179
4051 Analysis of Expert Information in Linguistic Terms

Authors: O. Poleshchuk, E. Komarov

Abstract:

In this paper, semantic spaces with the properties of completeness and orthogonality (complete orthogonal semantic spaces) were chosen as models of expert evaluations. As the theoretical and practical studies have shown all the properties of complete orthogonal semantic spaces correspond to the thinking activity of experts that is why these semantic spaces were chosen for modeling. Two methods of construction such spaces were proposed. Models of comparative and fuzzy cluster analysis of expert evaluations were developed. The practical application of the developed methods has demonstrated their viability and validity.

Keywords: expert evaluation, comparative analysis, fuzzy cluster analysis, theoretical and practical studies

Procedia PDF Downloads 501
4050 An Event-Related Potentials Study on the Processing of English Subjunctive Mood by Chinese ESL Learners

Authors: Yan Huang

Abstract:

Event-related potentials (ERPs) technique helps researchers to make continuous measures on the whole process of language comprehension, with an excellent temporal resolution at the level of milliseconds. The research on sentence processing has developed from the behavioral level to the neuropsychological level, which brings about a variety of sentence processing theories and models. However, the applicability of these models to L2 learners is still under debate. Therefore, the present study aims to investigate the neural mechanisms underlying English subjunctive mood processing by Chinese ESL learners. To this end, English subject clauses with subjunctive moods are used as the stimuli, all of which follow the same syntactic structure, “It is + adjective + that … + (should) do + …” Besides, in order to examine the role that language proficiency plays on L2 processing, this research deals with two groups of Chinese ESL learners (18 males and 22 females, mean age=21.68), namely, high proficiency group (Group H) and low proficiency group (Group L). Finally, the behavioral and neurophysiological data analysis reveals the following findings: 1) Syntax and semantics interact with each other on the SECOND phase (300-500ms) of sentence processing, which is partially in line with the Three-phase Sentence Model; 2) Language proficiency does affect L2 processing. Specifically, for Group H, it is the syntactic processing that plays the dominant role in sentence processing while for Group L, semantic processing also affects the syntactic parsing during the THIRD phase of sentence processing (500-700ms). Besides, Group H, compared to Group L, demonstrates a richer native-like ERPs pattern, which further demonstrates the role of language proficiency in L2 processing. Based on the research findings, this paper also provides some enlightenment for the L2 pedagogy as well as the L2 proficiency assessment.

Keywords: Chinese ESL learners, English subjunctive mood, ERPs, L2 processing

Procedia PDF Downloads 107
4049 The Use of Semantic Mapping Technique When Teaching English Vocabulary at Saudi Schools

Authors: Mohammed Hassan Alshaikhi

Abstract:

Vocabulary is essential factor of learning and mastering any languages, and it helps learners to communicate with others and to be understood. The aim of this study was to examine whether semantic mapping technique was helpful in terms of improving student's English vocabulary learning comparing to the traditional technique. The students’ age was between 11 and 13 years old. There were 60 students in total who participated in this study. 30 students were in the treatment group (target vocabulary items were taught with semantic mapping). The other 30 students were in the control group (the target vocabulary items were taught by a traditional technique). A t-test was used with the results of pre-test and post-test in order to examine the outcomes of using semantic mapping when teaching vocabulary. The results showed that the vocabulary mastery in the treatment group was increased more than the control group.

Keywords: English language, learning vocabulary, Saudi teachers, semantic mapping, teaching vocabulary strategies

Procedia PDF Downloads 208
4048 Application of Semantic Technologies in Rapid Reconfiguration of Factory Systems

Authors: J. Zhang, K. Agyapong-Kodua

Abstract:

Digital factory based on visual design and simulation has emerged as a mainstream to reduce digital development life cycle. Some basic industrial systems are being integrated via semantic modelling, and products (P) matching process (P)-resource (R) requirements are designed to fulfill current customer demands. Nevertheless, product design is still limited to fixed product models and known knowledge of product engineers. Therefore, this paper presents a rapid reconfiguration method based on semantic technologies with PPR ontologies to reuse known and unknown knowledge. In order to avoid the influence of big data, our system uses a cloud manufactory and distributed database to improve the efficiency of querying meeting PPR requirements.

Keywords: semantic technologies, factory system, digital factory, cloud manufactory

Procedia PDF Downloads 459
4047 Building Semantic-Relatedness Thai Word Ontology for Semantic Analysis

Authors: Gridaphat Sriharee

Abstract:

Building semantic-relatedness Thai word ontology can be implemented by considering word forms and word meaning. This research proposed the methodology for building the ontology, which can be used for semantic analysis. There are four categories of words: similar form and the same meaning, similar form and similar meaning, different form and opposite/same meaning, and different form and similar meaning, which will be used as initial words for building the proposed ontology. Extension of the ontology can be augmented by considering the messages that give the meaning of the word from the dictionaries. Exploiting WordNet to construct the proposed ontology was investigated and discussed. The proposed ontology was evaluated for its quality. With the proposed methodology, it is promising that the constructed ontology is a well-defined ontology.

Keywords: Thai, NLP, semantics, ontology

Procedia PDF Downloads 60
4046 The Study of Formal and Semantic Errors of Lexis by Persian EFL Learners

Authors: Mohammad J. Rezai, Fereshteh Davarpanah

Abstract:

Producing a text in a language which is not one’s mother tongue can be a demanding task for language learners. Examining lexical errors committed by EFL learners is a challenging area of investigation which can shed light on the process of second language acquisition. Despite the considerable number of investigations into grammatical errors, few studies have tackled formal and semantic errors of lexis committed by EFL learners. The current study aimed at examining Persian learners’ formal and semantic errors of lexis in English. To this end, 60 students at three different proficiency levels were asked to write on 10 different topics in 10 separate sessions. Finally, 600 essays written by Persian EFL learners were collected, acting as the corpus of the study. An error taxonomy comprising formal and semantic errors was selected to analyze the corpus. The formal category covered misselection and misformation errors, while the semantic errors were classified into lexical, collocational and lexicogrammatical categories. Each category was further classified into subcategories depending on the identified errors. The results showed that there were 2583 errors in the corpus of 9600 words, among which, 2030 formal errors and 553 semantic errors were identified. The most frequent errors in the corpus included formal error commitment (78.6%), which were more prevalent at the advanced level (42.4%). The semantic errors (21.4%) were more frequent at the low intermediate level (40.5%). Among formal errors of lexis, the highest number of errors was devoted to misformation errors (98%), while misselection errors constituted 2% of the errors. Additionally, no significant differences were observed among the three semantic error subcategories, namely collocational, lexical choice and lexicogrammatical. The results of the study can shed light on the challenges faced by EFL learners in the second language acquisition process.

Keywords: collocational errors, lexical errors, Persian EFL learners, semantic errors

Procedia PDF Downloads 110
4045 Measuring Text-Based Semantics Relatedness Using WordNet

Authors: Madiha Khan, Sidrah Ramzan, Seemab Khan, Shahzad Hassan, Kamran Saeed

Abstract:

Measuring semantic similarity between texts is calculating semantic relatedness between texts using various techniques. Our web application (Measuring Relatedness of Concepts-MRC) allows user to input two text corpuses and get semantic similarity percentage between both using WordNet. Our application goes through five stages for the computation of semantic relatedness. Those stages are: Preprocessing (extracts keywords from content), Feature Extraction (classification of words into Parts-of-Speech), Synonyms Extraction (retrieves synonyms against each keyword), Measuring Similarity (using keywords and synonyms, similarity is measured) and Visualization (graphical representation of similarity measure). Hence the user can measure similarity on basis of features as well. The end result is a percentage score and the word(s) which form the basis of similarity between both texts with use of different tools on same platform. In future work we look forward for a Web as a live corpus application that provides a simpler and user friendly tool to compare documents and extract useful information.

Keywords: Graphviz representation, semantic relatedness, similarity measurement, WordNet similarity

Procedia PDF Downloads 205
4044 The Istrian Istrovenetian-Croatian Bilingual Corpus

Authors: Nada Poropat Jeletic, Gordana Hrzica

Abstract:

Bilingual conversational corpora represent a meaningful and the most comprehensive data source for investigating the genuine contact phenomena in non-monitored bi-lingual speech productions. They can be particularly useful for bilingual research since some features of bilingual interaction can hardly be accessed with more traditional methodologies (e.g., elicitation tasks). The method of language sampling provides the resources for describing language interaction in a bilingual community and/or in bilingual situations (e.g. code-switching, amount of languages used, number of languages used, etc.). To capture these phenomena in genuine communication situations, such sampling should be as close as possible to spontaneous communication. Bilingual spoken corpus design is methodologically demanding. Therefore this paper aims at describing the methodological challenges that apply to the corpus design of the conversational corpus design of the Istrian Istrovenetian-Croatian Bilingual Corpus. Croatian is the first official language of the Croatian-Italian officially bilingual Istria County, while Istrovenetian is a diatopic subvariety of Venetian, a longlasting lingua franca in the Istrian peninsula, the mother tongue of the members of the Italian National Community in Istria and the primary code of informal everyday communication among the Istrian Italophone population. Within the CLARIN infrastructure, TalkBank is being used, as it provides relevant procedures for designing and analyzing bilingual corpora. Furthermore, it allows public availability allows for easy replication of studies and cumulative progress as a research community builds up around the corpus, while the tools developed within the field of corpus linguistics enable easy retrieval and analysis of information. The method of language sampling employed is kept at the level of spontaneous communication, in order to maximise the naturalness of the collected conversational data. All speakers have provided written informed consent in which they agree to be recorded at a random point within the period of one month after signing the consent. Participants are administered a background questionnaire providing information about the socioeconomic status and the exposure and language usage in the participants social networks. Recording data are being transcribed, phonologically adapted within a standard-sized orthographic form, coded and segmented (speech streams are being segmented into communication units based on syntactic criteria) and are being marked following the CHAT transcription system and its associated CLAN suite of programmes within the TalkBank toolkit. The corpus consists of transcribed sound recordings of 36 bilingual speakers, while the target is to publish the whole corpus by the end of 2020, by sampling spontaneous conversations among approximately 100 speakers from all the bilingual areas of Istria for ensuring representativeness (the participants are being recruited across three generations of native bilingual speakers in all the bilingual areas of the peninsula). Conversational corpora are still rare in TalkBank, so the Corpus will contribute to BilingBank as a highly relevant and scientifically reliable resource for an internationally established and active research community. The impact of the research of communities with societal bilingualism will contribute to the growing body of research on bilingualism and multilingualism, especially regarding topics of language dominance, language attrition and loss, interference and code-switching etc.

Keywords: conversational corpora, bilingual corpora, code-switching, language sampling, corpus design methodology

Procedia PDF Downloads 109
4043 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and founding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations present on Item Response Theory. More precisely, we propose initially reducing dimensionality with specific use of Principal Component Analysis for the linguistic data and then, producing axes of groups made from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and found the real latent structure presented by data. The methodology is applied in a set of real semantic data presenting impressive results for the coherence, speed and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 411
4042 Ontology-Based Approach for Temporal Semantic Modeling of Social Networks

Authors: Souâad Boudebza, Omar Nouali, Faiçal Azouaou

Abstract:

Social networks have recently gained a growing interest on the web. Traditional formalisms for representing social networks are static and suffer from the lack of semantics. In this paper, we will show how semantic web technologies can be used to model social data. The SemTemp ontology aligns and extends existing ontologies such as FOAF, SIOC, SKOS and OWL-Time to provide a temporal and semantically rich description of social data. We also present a modeling scenario to illustrate how our ontology can be used to model social networks.

Keywords: ontology, semantic web, social network, temporal modeling

Procedia PDF Downloads 348
4041 Corpus-Based Neural Machine Translation: Empirical Study Multilingual Corpus for Machine Translation of Opaque Idioms - Cloud AutoML Platform

Authors: Khadija Refouh

Abstract:

Culture bound-expressions have been a bottleneck for Natural Language Processing (NLP) and comprehension, especially in the case of machine translation (MT). In the last decade, the field of machine translation has greatly advanced. Neural machine translation NMT has recently achieved considerable development in the quality of translation that outperformed previous traditional translation systems in many language pairs. Neural machine translation NMT is an Artificial Intelligence AI and deep neural networks applied to language processing. Despite this development, there remain some serious challenges that face neural machine translation NMT when translating culture bounded-expressions, especially for low resources language pairs such as Arabic-English and Arabic-French, which is not the case with well-established language pairs such as English-French. Machine translation of opaque idioms from English into French are likely to be more accurate than translating them from English into Arabic. For example, Google Translate Application translated the sentence “What a bad weather! It runs cats and dogs.” to “يا له من طقس سيء! تمطر القطط والكلاب” into the target language Arabic which is an inaccurate literal translation. The translation of the same sentence into the target language French was “Quel mauvais temps! Il pleut des cordes.” where Google Translate Application used the accurate French corresponding idioms. This paper aims to perform NMT experiments towards better translation of opaque idioms using high quality clean multilingual corpus. This Corpus will be collected analytically from human generated idiom translation. AutoML translation, a Google Neural Machine Translation Platform, is used as a custom translation model to improve the translation of opaque idioms. The automatic evaluation of the custom model will be compared to the Google NMT using Bilingual Evaluation Understudy Score BLEU. BLEU is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Human evaluation is integrated to test the reliability of the Blue Score. The researcher will examine syntactical, lexical, and semantic features using Halliday's functional theory.

Keywords: multilingual corpora, natural language processing (NLP), neural machine translation (NMT), opaque idioms

Procedia PDF Downloads 107
4040 Pilot-free Image Transmission System of Joint Source Channel Based on Multi-Level Semantic Information

Authors: Linyu Wang, Liguo Qiao, Jianhong Xiang, Hao Xu

Abstract:

In semantic communication, the existing joint Source Channel coding (JSCC) wireless communication system without pilot has unstable transmission performance and can not effectively capture the global information and location information of images. In this paper, a pilot-free image transmission system of joint source channel based on multi-level semantic information (Multi-level JSCC) is proposed. The transmitter of the system is composed of two networks. The feature extraction network is used to extract the high-level semantic features of the image, compress the information transmitted by the image, and improve the bandwidth utilization. Feature retention network is used to preserve low-level semantic features and image details to improve communication quality. The receiver also is composed of two networks. The received high-level semantic features are fused with the low-level semantic features after feature enhancement network in the same dimension, and then the image dimension is restored through feature recovery network, and the image location information is effectively used for image reconstruction. This paper verifies that the proposed multi-level JSCC algorithm can effectively transmit and recover image information in both AWGN channel and Rayleigh fading channel, and the peak signal-to-noise ratio (PSNR) is improved by 1~2dB compared with other algorithms under the same simulation conditions.

Keywords: deep learning, JSCC, pilot-free picture transmission, multilevel semantic information, robustness

Procedia PDF Downloads 88
4039 Bilingual Identities of Kuwaiti Students at Universities with EMI

Authors: Marta Tryzna, Shahd Al Shammari

Abstract:

Though Modern Standard Arabic (MSA) is the only official language in GCC states, including Kuwait, and traditionally the preferred vehicle for literacy in the Arab countries, recent studies in Qatar and the UAE observe a growing role of English, particularly in literacy and knowledge transmission contexts. The present study examines the attitudes to Arabic and English and the use of both languages in literacy-related domains based on a sample of bilingual Arabic-English undergraduates (N=522) at a private university with EMI in Kuwait. The results indicate that Arabic (Kuwaiti dialect) is associated with familial interactions, Arabic-English bilingualism predominates in interactions with classmates, friends, on social media and at work, while English is prevalent in literacy-related contexts such as reading books, magazines, or online material, domains traditionally associated with MSA. Attitudes towards Arabic and English are equally positive according to the majority of the respondents, who report being comfortable expressing themselves and projecting their identity in both languages. No statistically significant differences were found comparing the importance of Arabic and English in the sample. Future trends were identified based on high agreement on the importance of speaking English with children and low agreement on speaking only Arabic at home. The study corroborates recently observed trends in the GCC favoring bilingualism across personal, academic and professional domains, with English becoming the preferred language of literacy among young bilingual Kuwaitis.

Keywords: bilingual, English, Arabic, EMI, identity

Procedia PDF Downloads 114
4038 Language Development and Growing Spanning Trees in Children Semantic Network

Authors: Somayeh Sadat Hashemi Kamangar, Fatemeh Bakouie, Shahriar Gharibzadeh

Abstract:

In this study, we target to exploit Maximum Spanning Trees (MST) of children's semantic networks to investigate their language development. To do so, we examine the graph-theoretic properties of word-embedding networks. The networks are made of words children learn prior to the age of 30 months as the nodes and the links which are built from the cosine vector similarity of words normatively acquired by children prior to two and a half years of age. These networks are weighted graphs and the strength of each link is determined by the numerical similarities of the two words (nodes) on the sides of the link. To avoid changing the weighted networks to the binaries by setting a threshold, constructing MSTs might present a solution. MST is a unique sub-graph that connects all the nodes in such a way that the sum of all the link weights is maximized without forming cycles. MSTs as the backbone of the semantic networks are suitable to examine developmental changes in semantic network topology in children. From these trees, several parameters were calculated to characterize the developmental change in network organization. We showed that MSTs provides an elegant method sensitive to capture subtle developmental changes in semantic network organization.

Keywords: maximum spanning trees, word-embedding, semantic networks, language development

Procedia PDF Downloads 107
4037 Making Waves: Preparing the Next Generation of Bilingual Medical Doctors

Authors: Edith Esparza-Young, Ángel M. Matos, Yaritza Gonzalez, Kirthana Sugunathevan

Abstract:

Introduction: This research describes the existing medical school program which supports a multicultural setting and bilingualism. The rise of Spanish speakers in the United States has led to the recruitment of bilingual medical students who can serve the evolving demographics. This paper includes anecdotal evidence, narratives and the latest research on the outcomes of supporting a multilingual academic experience in medical school and beyond. People in the United States will continue to need health care from physicians who have experience with multicultural competence. Physicians who are bilingual and possess effective communication skills will be in high demand. Methodologies: This research is descriptive. Through this descriptive research, the researcher will describe the qualities and characteristics of the existing medical school programs, curriculum, and student services. Additionally, the researcher will shed light on the existing curriculum in the medical school and also describe specific programs which help to serve as safety nets to support diverse populations. The method included observations of the existing program and the implementation of the medical school program, specifically the Accelerated Review Program, the Language Education and Professional Communication Program, student organizations and the Global Health Institute. Concluding Statement: This research identified and described characteristics of the medical school’s program. The research explained and described the current and present phenomenon of this medical program, which has focused on increasing the graduation of bilingual and minority physicians. The findings are based on observations of the curriculum, programs and student organizations which evolves and remains innovative to stay current with student enrollment.

Keywords: bilingual, English, medicine, doctor

Procedia PDF Downloads 118
4036 Didacticization of Code Switching as a Tool for Bilingual Education in Mali

Authors: Kadidiatou Toure

Abstract:

Mali has started experimentation of teaching the national languages at school through the convergent pedagogy in 1987. Then, it is in 1994 that it will become widespread with eleven of the thirteen former national languages used at primary school. The aim was to improve the Malian educational system because the use of French as the only medium of instruction was considered a contributing factor to the significant number of student dropouts and the high rate of repetition. The Convergent pedagogy highlights the knowledge acquired by children at home, their vision of the world and especially the knowledge they have of their mother tongue. That pedagogy requires the use of a specific medium only during classroom practices and teachers have been trained in this sense. The specific medium depends on the learning content, which sometimes is French, other times, it is the national language. Research has shown that bilingual learners do not only use the required medium in their learning activities, but they code switch. It is part of their learning processes. Currently, many scholars agree on the importance of CS in bilingual classes, and teachers have been told about the necessity of integrating it into their classroom practices. One of the challenges of the Malian bilingual education curriculum is the question of ‘effective languages management’. Theoretically, depending on the classrooms, an average have been established for each of the involved language. Following that, teachers make use of CS differently, sometimes, it favors the learners, other times, it contributes to the development of some linguistic weaknesses. The present research tries to fill that gap through a tentative model of didactization of CS, which simply means the practical management of the languages involved in the bilingual classrooms. It is to know how to use CS for effective learning. Moreover, the didactization of CS tends to sensitize the teachers about the functional role of CS so that they may overcome their own weaknesses. The overall goal of this research is to make code switching a real tool for bilingual education. The specific objectives are: to identify the types of CS used during classroom activities to present the functional role of CS for the teachers as well as the pupils. to develop a tentative model of code-switching, which will help the teachers in transitional classes of bilingual schools to recognize the appropriate moment for making use of code switching in their classrooms. The methodology adopted is a qualitative one. The study is based on recorded videos of teachers of 3rd year of primary school during their classroom activities and interviews with the teachers in order to confirm the functional role of CS in bilingual classes. The theoretical framework adopted is the typology of CS proposed by Poplack (1980) to identify the types of CS used. The study reveals that teachers need to be trained on the types of CS and the different functions they assume and on the consequences of inappropriate use of language alternation.

Keywords: bilingual curriculum, code switching, didactization, national languages

Procedia PDF Downloads 35
4035 A Semantic Analysis of Modal Verbs in Barak Obama’s 2012 Presidential Campaign Speech

Authors: Kais A. Kadhim

Abstract:

This paper is a semantic analysis of the English modals in Obama’s speech. The main objective of this study is to analyze selected modal auxiliaries identified in selected speeches of Obama’s campaign based on Coates’ (1983) semantic clusters. A total of fifteen speeches of Obama’s campaign were selected as the primary data and the modal auxiliaries selected for analysis include will, would, can, could, should, must, ought, shall, may and might. All the modal auxiliaries taken from the speeches of Barack Obama were analyzed based on the framework of Coates’ semantic clusters. Such analytical framework was carried out to examine how modal auxiliaries are used in the context of persuading people in Obama’s campaign speeches. The findings reveal that modals of intention, prediction, futurity and modals of possibility, ability, permission are mostly used in Obama’s campaign speeches.

Keywords: modals, meaning, persuasion, speech

Procedia PDF Downloads 377