Search results for: corpus pragmatics
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 414

Search results for: corpus pragmatics

354 Towards a Large Scale Deep Semantically Analyzed Corpus for Arabic: Annotation and Evaluation

Authors: S. Alansary, M. Nagi

Abstract:

This paper presents an approach of conducting semantic annotation of Arabic corpus using the Universal Networking Language (UNL) framework. UNL is intended to be a promising strategy for providing a large collection of semantically annotated texts with formal, deep semantics rather than shallow. The result would constitute a semantic resource (semantic graphs) that is editable and that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles and rhetorical relations, into a single semantic formalism for knowledge representation. The paper will also present the Interactive Analysis​ tool for automatic semantic annotation (IAN). In addition, the cornerstone of the proposed methodology which are the disambiguation and transformation rules, will be presented. Semantic annotation using UNL has been applied to a corpus of 20,000 Arabic sentences representing the most frequent structures in the Arabic Wikipedia. The representation, at different linguistic levels was illustrated starting from the morphological level passing through the syntactic level till the semantic representation is reached. The output has been evaluated using the F-measure. It is 90% accurate. This demonstrates how powerful the formal environment is, as it enables intelligent text processing and search.

Keywords: semantic analysis, semantic annotation, Arabic, universal networking language

Procedia PDF Downloads 561
353 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation labeled database for Hindi. Although no single standard for prosody labeling exists in Hindi, researchers in the past have employed perceptual and statistical methods in literature to draw inferences about the behavior of prosody patterns in Hindi. Based on such existing research and largely agreed upon intonational theories in Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used for training speech models for natural-sounding speech in the future. 100 sentences ( 500 words) each for declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

Procedia PDF Downloads 163
352 Corpus-Based Description of Core English Nouns of Pakistani English, an EFL Learner Perspective at Secondary Level

Authors: Abrar Hussain Qureshi

Abstract:

Vocabulary has been highlighted as a key indicator in any foreign language learning program, especially English as a foreign language (EFL). It is often considered a potential tool in foreign language curriculum, and its deficiency impedes successful communication in the target language. The knowledge of the lexicon is very significant in getting communicative competence and performance. Nouns constitute a considerable bulk of English vocabulary. Rather, they are the bones of the English language and are the main semantic carrier in spoken and written discourse. As nouns dominate the bulk of the English lexicon, their role becomes all the more potential. The undertaken research is a systematic effort in this regard to work out a list of highly frequent list of Pakistani English nouns for the EFL learners at the secondary level. It will encourage autonomy for the EFL learners as well as will save their time. The corpus used for the research has been developed locally from leading English newspapers of Pakistan. Wordsmith Tools has been used to process the research data and to retrieve word list of frequent Pakistani English nouns. The retrieved list of core Pakistani English nouns is supposed to be useful for English language learners at the secondary level as it covers a wide range of speech events.

Keywords: corpus, EFL, frequency list, nouns

Procedia PDF Downloads 75
351 A Corpus-Based Study of Subtitling Religious Words into Arabic

Authors: Yousef Sahari, Eisa Asiri

Abstract:

Hollywood films are produced in an open and liberal context, and when subtitling for a more conservative and closed society such as an Arabic society, religious words can pose a thorny challenge for subtitlers. Using a corpus of 90 Hollywood films released between 2000 and 2018 and applying insights from Descriptive Translation Studies (Toury, 1995, 2012) and the dichotomy of domestication and foreignization, this paper investigates three main research questions: (1) What are the dominant religious terms and functions in the English subtitles? (2) What are the dominant translation strategies used in the translation of religious words? (3) Do these strategies tend to be SL-oriented or TL-oriented (domesticating or foreignising)? To answer the research questions above, a quantitative and qualitative analysis of the corpus is conducted, in which the researcher adopts a self-designed, parallel, aligned corpus of ninety films and their Arabic subtitles. A quantitative analysis is performed to compare the frequencies and distribution of religious words, their functions, and the translation strategies employed by the subtitlers of ninety films, with the aim of identifying similarities or differences in addition to identifying the impact of functions of religious terms on the use of subtitling strategies. Based on the quantitative analysis, a qualitative analysis is performed to identify any translational patterns in Arabic translations of religious words and the possible reasons for subtitlers’ choices. The results show that the function of religious words has a strong influence on the choice of subtitling strategies. Also, it is found that foreignization strategies are applied in about two-thirds of the total occurrences of religious words.

Keywords: religious terms, subtitling, audiovisual translation, modern standard arabic, subtitling strategies, english-arabic subtitling

Procedia PDF Downloads 126
350 The Development of Explicit Pragmatic Knowledge: An Exploratory Study

Authors: Aisha Siddiqa

Abstract:

The knowledge of pragmatic practices in a particular language is considered key to effective communication. Unlike one’s native language where this knowledge is acquired spontaneously, more conscious attention is required to learn second language pragmatics. Traditional foreign language (FL) classrooms generally focus on the acquisition of vocabulary and lexico-grammatical structures, neglecting pragmatic functions that are essential for effective communication in the multilingual networks of the modern world. In terms of effective communication, of particular importance is knowledge of what is perceived as polite or impolite in a certain language, an aspect of pragmatics which is not perceived as obligatory but is nonetheless indispensable for successful intercultural communication and integration. While learning a second language, the acquisition of politeness assumes more prominence as the politeness norms and practices vary according to language and culture. Therefore, along with focusing on the ‘use’ of politeness strategies, it is crucial to examine the ‘acquisition’ and the ‘acquisitional development’ of politeness strategies by second language learners, particularly, by lower proficiency leaners as the norms of politeness are usually focused in lower levels. Hence, there is an obvious need for a study that not only investigates the acquisition of pragmatics by young FL learners using innovative multiple methods; but also identifies the potential causes of the gaps in their development. The present research employs a cross sectional design to explore the acquisition of politeness by young English as a foreign language learners (EFL) in France; at three levels of secondary school learning. The methodology involves two phases. In the first phase a cartoon oral production task (COPT) is used to elicit samples of requests from young EFL learners in French schools. These data are then supplemented by a) role plays, b) an analysis of textbooks, and c) video recordings of classroom activities. This mixed method approach allows us to explore the repertoire of politeness strategies the learners possess and delve deeper into the opportunities available to learners in classrooms to learn politeness strategies in requests. The paper will provide the results of the analysis of COPT data for 250 learners at three different stages of English as foreign language development. Data analysis is based on categorization of requests developed in CCSARP project. The preliminary analysis of the COPT data shows that there is substantial evidence of pragmalinguistic development across all levels but the developmental process seems to gain momentum in the second half of the secondary school period as compared to the early period at school. However, there is very little evidence of sociopragmatic development. The study aims to document the current classroom practices in France by looking at the development of young EFL learner’s politeness strategies across three levels of secondary schools.

Keywords: acquisition, English, France, interlanguage pragmatics, politeness

Procedia PDF Downloads 394
349 Presence and Severity of Language Deficits in Comprehension, Production and Pragmatics in a Group of ALS Patients: Analysis with Demographic and Neuropsychological Data

Authors: M. Testa, L. Peotta, S. Giusiano, B. Lazzolino, U. Manera, A. Canosa, M. Grassano, F. Palumbo, A. Bombaci, S. Cabras, F. Di Pede, L. Solero, E. Matteoni, C. Moglia, A. Calvo, A. Chio

Abstract:

Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease of adulthood, which primarily affects the central nervous system and is characterized by progressive bilateral degeneration of motor neurons. The degeneration processes in ALS extend far beyond the neurons of the motor system, and affects cognition, behaviour and language. To outline the prevalence of language deficits in an ALS cohort and explore their profile along with demographic and neuropsychological data. A full neuropsychological battery and language assessment was administered to 56 ALS patients. Neuropsychological assessment included tests of executive functioning, verbal fluency, social cognition and memory. Language was assessed using tests for verbal comprehension, production and pragmatics. Patients were cognitively classified following the Revised Consensus Criteria and divided in three groups showing different levels of language deficits: group 1 - no language deficit; group 2 - one language deficit; group 3 - two or more language deficits. Chi-square for independence and non-parametric measures to compare groups were applied. Nearly half of ALS-CN patients (48%) reported one language test under the clinical cut-off, and only 13% of patents classified as ALS-CI showed no language deficits, while the rest 87% of ALS-CI reported two or more language deficits. ALS-BI and ALS-CBI cases all reported two or more language deficits. Deficits in production and in comprehension appeared more frequent in ALS-CI patients (p=0.011, p=0.003 respectively), with a higher percentage of comprehension deficits (83%). Nearly all ALS-CI reported at least one deficit in pragmatic abilities (96%) and all ALS-BI and ALS-CBI patients showed pragmatic deficits. Males showed higher percentage of pragmatic deficits (97%, p=0.007). No significant differences in language deficits have been found between bulbar and spinal onset. Months from onset and level of impairment at testing (ALS-FRS total score) were not significantly different between levels and type of language impairment. Age and education were significantly higher for cases showing no deficits in comprehension and pragmatics and in the group showing no language deficits. Comparing performances at neuropsychological tests among the three levels of language deficits, no significant differences in neuropsychological performances were found between group 1 and 2; compared to group 1, group 3 appeared to decay specifically on executive testing, verbal/visuospatial learning, and social cognition. Compared to group 2, group 3 showed worse performances specifically in tests of working memory and attention. Language deficits have found to be spread in our sample, encompassing verbal comprehension, production and pragmatics. Our study reveals that also cognitive intact patients (ALS-CN) showed at least one language deficit in 48% of cases. Pragmatic domain is the most compromised (84% of the total sample), present in nearly all ALS-CI (96%), likely due to the influence of executive impairment. Lower age and higher education seem to preserve comprehension, pragmatics and presence of language deficits. Finally, executive functions, verbal/visuospatial learning and social cognition differentiate the group with no language deficits from the group with a clinical language impairment (group 3), while attention and working memory differentiate the group with one language deficit from the clinical impaired group.

Keywords: amyotrophic lateral sclerosis, language assessment, neuropsychological assessment, language deficit

Procedia PDF Downloads 119
348 A Comparison of the First Language Vocabulary Used by Indonesian Year 4 Students and the Vocabulary Taught to Them in English Language Textbooks

Authors: Fitria Ningsih

Abstract:

This study concerns on the process of making corpus obtained from Indonesian year 4 students’ free writing compared to the vocabulary taught in English language textbooks. 369 students’ sample writings from 19 public elementary schools in Malang, East Java, Indonesia and 5 selected English textbooks were analyzed through corpus in linguistics method using AdTAT -the Adelaide Text Analysis Tool- program. The findings produced wordlists of the top 100 words most frequently used by students and the top 100 words given in English textbooks. There was a 45% match between the two lists. Furthermore, the classifications of the top 100 most frequent words from the two corpora based on part of speech found that both the Indonesian and English languages employed a similar use of nouns, verbs, adjectives, and prepositions. Moreover, to see the contextualizing the vocabulary of learning materials towards the students’ need, a depth-analysis dealing with the content and the cultural views from the vocabulary taught in the textbooks was discussed through the criteria developed from the checklist. Lastly, further suggestions are addressed to language teachers to understand the students’ background such as recognizing the basic words students acquire before teaching them new vocabulary in order to achieve successful learning of the target language.

Keywords: corpus, frequency, English, Indonesian, linguistics, textbooks, vocabulary, wordlists, writing

Procedia PDF Downloads 159
347 Using Corpora in Semantic Studies of English Adjectives

Authors: Oxana Lukoshus

Abstract:

The methods of corpus linguistics, a well-established field of research, are being increasingly applied in cognitive linguistics. Corpora data are especially useful for different quantitative studies of grammatical and other aspects of language. The main objective of this paper is to demonstrate how present-day corpora can be applied in semantic studies in general and in semantic studies of adjectives in particular. Polysemantic adjectives have been the subject of numerous studies. But most of them have been carried out on dictionaries. Undoubtedly, dictionaries are viewed as one of the basic data sources, but only at the initial steps of a research. The author usually starts with the analysis of the lexicographic data after which s/he comes up with a hypothesis. In the research conducted three polysemantic synonyms true, loyal, faithful have been analyzed in terms of differences and similarities in their semantic structure. A corpus-based approach in the study of the above-mentioned adjectives involves the following. After the analysis of the dictionary data there was the reference to the following corpora to study the distributional patterns of the words under study – the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). These corpora are continually updated and contain thousands of examples of the words under research which make them a useful and convenient data source. For the purpose of this study there were no special needs regarding genre, mode or time of the texts included in the corpora. Out of the range of possibilities offered by corpus-analysis software (e.g. word lists, statistics of word frequencies, etc.), the most useful tool for the semantic analysis was the extracting a list of co-occurrence for the given search words. Searching by lemmas, e.g. true, true to, and grouping the results by lemmas have proved to be the most efficient corpora feature for the adjectives under the study. Following the search process, the corpora provided a list of co-occurrences, which were then to be analyzed and classified. Not every co-occurrence was relevant for the analysis. For example, the phrases like An enormous sense of responsibility to protect the minds and hearts of the faithful from incursions by the state was perceived to be the basic duty of the church leaders or ‘True,’ said Phoebe, ‘but I'd probably get to be a Union Official immediately were left out as in the first example the faithful is a substantivized adjective and in the second example true is used alone with no other parts of speech. The subsequent analysis of the corpora data gave the grounds for the distribution groups of the adjectives under the study which were then investigated with the help of a semantic experiment. To sum it up, the corpora-based approach has proved to be a powerful, reliable and convenient tool to get the data for the further semantic study.

Keywords: corpora, corpus-based approach, polysemantic adjectives, semantic studies

Procedia PDF Downloads 294
346 The Istrian Istrovenetian-Croatian Bilingual Corpus

Authors: Nada Poropat Jeletic, Gordana Hrzica

Abstract:

Bilingual conversational corpora represent a meaningful and the most comprehensive data source for investigating the genuine contact phenomena in non-monitored bi-lingual speech productions. They can be particularly useful for bilingual research since some features of bilingual interaction can hardly be accessed with more traditional methodologies (e.g., elicitation tasks). The method of language sampling provides the resources for describing language interaction in a bilingual community and/or in bilingual situations (e.g. code-switching, amount of languages used, number of languages used, etc.). To capture these phenomena in genuine communication situations, such sampling should be as close as possible to spontaneous communication. Bilingual spoken corpus design is methodologically demanding. Therefore this paper aims at describing the methodological challenges that apply to the corpus design of the conversational corpus design of the Istrian Istrovenetian-Croatian Bilingual Corpus. Croatian is the first official language of the Croatian-Italian officially bilingual Istria County, while Istrovenetian is a diatopic subvariety of Venetian, a longlasting lingua franca in the Istrian peninsula, the mother tongue of the members of the Italian National Community in Istria and the primary code of informal everyday communication among the Istrian Italophone population. Within the CLARIN infrastructure, TalkBank is being used, as it provides relevant procedures for designing and analyzing bilingual corpora. Furthermore, it allows public availability allows for easy replication of studies and cumulative progress as a research community builds up around the corpus, while the tools developed within the field of corpus linguistics enable easy retrieval and analysis of information. The method of language sampling employed is kept at the level of spontaneous communication, in order to maximise the naturalness of the collected conversational data. All speakers have provided written informed consent in which they agree to be recorded at a random point within the period of one month after signing the consent. Participants are administered a background questionnaire providing information about the socioeconomic status and the exposure and language usage in the participants social networks. Recording data are being transcribed, phonologically adapted within a standard-sized orthographic form, coded and segmented (speech streams are being segmented into communication units based on syntactic criteria) and are being marked following the CHAT transcription system and its associated CLAN suite of programmes within the TalkBank toolkit. The corpus consists of transcribed sound recordings of 36 bilingual speakers, while the target is to publish the whole corpus by the end of 2020, by sampling spontaneous conversations among approximately 100 speakers from all the bilingual areas of Istria for ensuring representativeness (the participants are being recruited across three generations of native bilingual speakers in all the bilingual areas of the peninsula). Conversational corpora are still rare in TalkBank, so the Corpus will contribute to BilingBank as a highly relevant and scientifically reliable resource for an internationally established and active research community. The impact of the research of communities with societal bilingualism will contribute to the growing body of research on bilingualism and multilingualism, especially regarding topics of language dominance, language attrition and loss, interference and code-switching etc.

Keywords: conversational corpora, bilingual corpora, code-switching, language sampling, corpus design methodology

Procedia PDF Downloads 111
345 The Study of Formal and Semantic Errors of Lexis by Persian EFL Learners

Authors: Mohammad J. Rezai, Fereshteh Davarpanah

Abstract:

Producing a text in a language which is not one’s mother tongue can be a demanding task for language learners. Examining lexical errors committed by EFL learners is a challenging area of investigation which can shed light on the process of second language acquisition. Despite the considerable number of investigations into grammatical errors, few studies have tackled formal and semantic errors of lexis committed by EFL learners. The current study aimed at examining Persian learners’ formal and semantic errors of lexis in English. To this end, 60 students at three different proficiency levels were asked to write on 10 different topics in 10 separate sessions. Finally, 600 essays written by Persian EFL learners were collected, acting as the corpus of the study. An error taxonomy comprising formal and semantic errors was selected to analyze the corpus. The formal category covered misselection and misformation errors, while the semantic errors were classified into lexical, collocational and lexicogrammatical categories. Each category was further classified into subcategories depending on the identified errors. The results showed that there were 2583 errors in the corpus of 9600 words, among which, 2030 formal errors and 553 semantic errors were identified. The most frequent errors in the corpus included formal error commitment (78.6%), which were more prevalent at the advanced level (42.4%). The semantic errors (21.4%) were more frequent at the low intermediate level (40.5%). Among formal errors of lexis, the highest number of errors was devoted to misformation errors (98%), while misselection errors constituted 2% of the errors. Additionally, no significant differences were observed among the three semantic error subcategories, namely collocational, lexical choice and lexicogrammatical. The results of the study can shed light on the challenges faced by EFL learners in the second language acquisition process.

Keywords: collocational errors, lexical errors, Persian EFL learners, semantic errors

Procedia PDF Downloads 113
344 Automatic Tagging and Accuracy in Assamese Text Data

Authors: Chayanika Hazarika Bordoloi

Abstract:

This paper is an attempt to work on a highly inflectional language called Assamese. This is also one of the national languages of India and very little has been achieved in terms of computational research. Building a language processing tool for a natural language is not very smooth as the standard and language representation change at various levels. This paper presents inflectional suffixes of Assamese verbs and how the statistical tools, along with linguistic features, can improve the tagging accuracy. Conditional random fields (CRF tool) was used to automatically tag and train the text data; however, accuracy was improved after linguistic featured were fed into the training data. Assamese is a highly inflectional language; hence, it is challenging to standardizing its morphology. Inflectional suffixes are used as a feature of the text data. In order to analyze the inflections of Assamese word forms, a list of suffixes is prepared. This list comprises suffixes, comprising of all possible suffixes that various categories can take is prepared. Assamese words can be classified into inflected classes (noun, pronoun, adjective and verb) and un-inflected classes (adverb and particle). The corpus used for this morphological analysis has huge tokens. The corpus is a mixed corpus and it has given satisfactory accuracy. The accuracy rate of the tagger has gradually improved with the modified training data.

Keywords: CRF, morphology, tagging, tagset

Procedia PDF Downloads 169
343 The Advancements of Transformer Models in Part-of-Speech Tagging System for Low-Resource Tigrinya Language

Authors: Shamm Kidane, Ibrahim Abdella, Fitsum Gaim, Simon Mulugeta, Sirak Asmerom, Natnael Ambasager, Yoel Ghebrihiwot

Abstract:

The call for natural language processing (NLP) systems for low-resource languages has become more apparent than ever in the past few years, with the arduous challenges still present in preparing such systems. This paper presents an improved dataset version of the Nagaoka Tigrinya Corpus for Parts-of-Speech (POS) classification system in the Tigrinya language. The size of the initial Nagaoka dataset was incremented, totaling the new tagged corpus to 118K tokens, which comprised the 12 basic POS annotations used previously. The additional content was also annotated manually in a stringent manner, followed similar rules to the former dataset and was formatted in CONLL format. The system made use of the novel approach in NLP tasks and use of the monolingually pre-trained TiELECTRA, TiBERT and TiRoBERTa transformer models. The highest achieved score is an impressive weighted F1-score of 94.2%, which surpassed the previous systems by a significant measure. The system will prove useful in the progress of NLP-related tasks for Tigrinya and similarly related low-resource languages with room for cross-referencing higher-resource languages.

Keywords: Tigrinya POS corpus, TiBERT, TiRoBERTa, conditional random fields

Procedia PDF Downloads 61
342 Metaphors Investigation between President Xi Jinping of China and Trump of Us on the Corpus-Based Approach

Authors: Jie Zheng, Ruifeng Luo

Abstract:

The United States is the world’s most developed economy with the strongest military power. China is the fastest growing country with growing comprehensive strength and its economic strength is second only to the US. However, the conflict between them is getting serious in recent years. President’s address is the representative of a nation’s ideology. The paper has built up a small sized corpus of President Xi Jinping and Trump’s speech in Davos to investigate their respective use and types of metaphors and calculate the respective percentage of each type of metaphor. The result shows President Xi Jinping employs more metaphors than Trump. The metaphors of Xi includes “building” metaphor, “plant” metaphor, “journey” metaphor, “ship” metaphor, “traffic” metaphor, “nation is a person” metaphor, “show” metaphor, etc while Trump’s comprises “war” metaphor, “building” metaphor, “journey” metaphor, “traffic” metaphor, “tax” metaphor, “book” metaphor, etc. After investigating metaphor use differences, the paper makes an analysis of the underlying ideology between the two nations. China is willing to strengthen ties with all the countries all over the world and has built a platform of development for them and itself to go to the destination of social well being while the US pays much concern to itself, emphasizing its first leading position and is also willing to help its alliances to development. The paper’s comparison of the ideology difference between the two countries will help them get a better understanding and reduce the conflict to some extent.

Keywords: metaphor; corpus; ideology; conflict

Procedia PDF Downloads 125
341 Translating the Gendered Discourse: A Corpus-Based Study of the Chinese Science Fiction The Three Body Problem

Authors: Yi Gu

Abstract:

The Three-Body Problem by Cixin Liu has been a bestseller Chinese Sci-Fi novel for years since 2008. The book was translated into English by Ken Liu in 2014 and won the prestigious 2015 science fiction and fantasy writing Hugo Award, drawing greater attention from wider international communities. The story exposes the horrors of the Chinese Cultural Revolution in the 1960s, in an intriguing narrative for readers at home and abroad. However, without the access to the source text, western readers may not be aware that the original Chinese version of the book is rich in gender-bias. Some Chinese scholars have applied feminist translation theories to their analysis on this book before, based on isolated selected, cherry-picking examples. Thus this paper aims to obtain a more thorough picture of how translators can cope with gender discrimination and reshape the gendered discourse from the source text, by systematically investigating the lexical and syntactic patterns in the translation of Liu’s entire book of 400 pages. The source text and the translation were downloaded into digital files, automatically aligned at paragraph level and then manually post-edited. They were then compiled into a parallel corpus of 114,629 English words and 204,145 Chinese characters using Sketch Engine. Gender-discrimination markers such as the overuse of ‘girl’ to describe an adult woman were searched in the source text, and the alignment made it possible to identify the strategies adopted by the translator to mitigate gender discrimination. The results provide a framework for translators to address gender bias. The study also shows how corpus methods can be used to further research in feminist translation and critical discourse analysis.

Keywords: corpus, discourse analysis, feminist translation, science fiction translation

Procedia PDF Downloads 232
340 Prostitution in Colonial Bengal: Autobiographical Articulations and Fictional Representations

Authors: Aparna Bandyopadhyay

Abstract:

The proposed paper will examine how prostitution produced a vast corpus of literature in colonial Bengal. This corpus included autobiographical accounts by prostitutes themselves. While the authenticity of some of these has, at times, been doubted by contemporary observers, the sheer magnitude of such narrative prose demands critical attention. Many of these autobiographical narratives focused on the prostitute’s early life within respectable society and then proceeded to delineate the transgressions and the inescapable chain of circumstances that eventually rendered her a prostitute. Significantly, these serve to corroborate the findings of official investigations regarding the circumstances that led upper-caste Hindu women in Bengal to embrace prostitution in this period. The literary corpus that dwelt on prostitution also included a vast volume of fiction penned by celebrated writers. These foregrounded a prostitute as the central protagonist, telling the life-stories of prostitutes and the circumstances that made them what they were. Novels and short stories often represented the prostitute as an affective being – an individual capable of deep emotions despite her profession. She was seldom a person who had voluntarily embraced prostitution. She was always a figure of helplessness and suffering, a woman whose desire to love and be loved transcended the carnality of her livelihood. She was an outcast, but she experienced the entire repertoire of emotions experienced by her respectable counterparts. The proposed paper will examine the trends and characteristics of the available repertoire of prostitute-oriented literature in late colonial Bengal. It will begin by focusing on the existing perspectives on the origins of prostitution in late colonial Bengal. It will proceed to discuss the literary corpus supposedly penned by prostitutes themselves and then focus on the manner in which some of the stalwarts of high literature represented the prostitute in their literary creations.

Keywords: emotions, literature, prostitution, transgression

Procedia PDF Downloads 87
339 ChatGPT as a “Foreign Language Teacher”: Attitudes of Tunisian English Language Learners

Authors: Leila Najeh Bel'Kiry

Abstract:

Artificial intelligence (AI) brought about many language robots, with ChatGPT being the most sophisticated thanks to its human-like linguistic capabilities. This aspect raises the idea of using ChatGPT in learning foreign languages. Starting from the premise that positions ChatGPT as a mediator between the language and the leaner, functioning as a “ghost teacher" offering a peaceful and secure learning space, this study aims to explore the attitudes of Tunisian students of English towards ChatGPT as a “Foreign Language Teacher” . Forty-five students, in their third year of fundamental English at Tunisian universities and high institutes, completed a Likert scale questionnaire consisting of thirty-two items and covering various aspects of language (phonology, morphology, syntax, semantics, and pragmatics). A scale ranging from 'Strongly Disagree,' 'Disagree,' 'Undecided,' 'Agree,' to 'Strongly Agree.' is used to assess the attitudes of the participants towards the integration of ChaGPTin learning a foreign language. Results indicate generally positive attitudes towards the reliance on ChatGPT in learning foreign languages, particularly some compounds of language like syntax, phonology, and morphology. However, learners show insecurity towards ChatGPT when it comes to pragmatics and semantics, where the artificial model may fail when dealing with deeper contextual and nuanced language levels.

Keywords: artificial language model, attitudes, foreign language learning, ChatGPT, linguistic capabilities, Tunisian English language learners

Procedia PDF Downloads 23
338 A Corpus-Based Analysis of Japanese Learners' English Modal Auxiliary Verb Usage in Writing

Authors: S. Nakayama

Abstract:

For non-native English speakers, using English modal auxiliary verbs appropriately can be among the most challenging tasks. This research sought to identify differences in modal verb usage between Japanese non-native English speakers (JNNS) and native speakers (NS) from two different perspectives: frequency of use and distribution of verb phrase structures (VPS) where modal verbs occur. This study can contribute to the identification of JNNSs' interlanguage with regard to modal verbs; the main aim is to make a suggestion for the improvement of teaching materials as well as to help language teachers to be able to teach modal verbs in a way that is helpful for learners. To address the primary question in this study, usage of nine central modals (‘can’, ‘could’, ‘may’, ‘might’, ‘shall’, ‘should’, ‘will’, ‘would’, and ‘must’) by JNNS was compared with that by NSs in the International Corpus Network of Asian Learners of English (ICNALE). This corpus is one of the largest freely-available corpora focusing on Asian English learners’ language use. The ICNALE corpus consists of four modules: ‘Spoken Monologue’, ‘Spoken Dialogue’, ‘Written Essays’, and ‘Edited Essays’. Among these, this research adopted the ‘Written Essays’ module only, which is the set of 200-300 word essays and contains approximately 1.3 million words in total. Frequency analysis revealed gaps as well as similarities in frequency order. Specifically, both JNNSs and NSs used ‘can’ with the most frequency, followed by ‘should’ and ‘will’; however, usage of all the other modals except for ‘shall’ was not identical to each other. A log-likelihood test uncovered JNNSs’ overuse of ‘can’ and ‘must’ as well as their underuse of ‘will’ and ‘would’. VPS analysis revealed that JNNSs used modal verbs in a relatively narrow range of VPSs as compared to NSs. Results showed that JNNSs used most of the modals with bare infinitives or the passive voice only whereas NSs used the modals in a wide range of VPSs including the progressive construction and the perfect aspect, both of which were the structures where JNNSs rarely used the modals. Results of frequency analysis suggest that language teachers or teaching materials should explain other modality items so that learners can avoid relying heavily on certain modals and have a wide range of lexical items to reflect their feelings more accurately. Besides, the underused modals should be more stressed in the classroom because they are members of epistemic modals, which allow us to not only interject our views into propositions but also build a relationship with readers. As for VPSs, teaching materials should present more examples of the modals occurring in a wide range of VPSs to help learners to be able to express their opinions from a variety of viewpoints.

Keywords: corpus linguistics, Japanese learners of English, modal auxiliary verbs, International Corpus Network of Asian Learners of English

Procedia PDF Downloads 108
337 Number Variation of the Personal Pronoun we Used by Chinese English Learners

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage of language community, which might become the developmental trend of that language. However, language textbooks cannot keep up with these emergent usages. Most Chinese English learners nowadays are still exposed to traditional grammar prescribed in the textbook so that some variational usages cannot be acquired. The personal pronoun we is prescribed as a plural pronoun in the textbook grammar, but its number value is more flexible in actual use. Based on the Chinese Learner English Corpus (CLEC), and with the homemade Friends corpus as reference, the present research explores the number value of the first person pronoun we used by Chinese English learners. With consideration of the subjectivity of we, this paper annotated the number value of all the wes in “we+ PCU (Perception-cognation-utterance) verbs” collocations. Results show that though exposed to traditional textbooks which prescribe the plural reference of we, there still exists some unconventional usage (singular or vague in reference) in the writings of Chinese English learners, which is less frequent than that of the native speeches. Corpus data and results from manual semantic annotation show that this could be due to the impact of formulaic sequence on the learners and the positive transfer from their native language. An improved SLA model of native language, target language and interlanguage is put forward to recognize the existence of variation in second language acquisition, which should be given more attention during teaching.

Keywords: Chinese English learners, number, PCU verbs, Personal pronoun we

Procedia PDF Downloads 330
336 Track and Evaluate Cortical Responses Evoked by Electrical Stimulation

Authors: Kyosuke Kamada, Christoph Kapeller, Michael Jordan, Mostafa Mohammadpour, Christy Li, Christoph Guger

Abstract:

Cortico-cortical evoked potentials (CCEP) refer to responses generated by cortical electrical stimulation at distant brain sites. These responses provide insights into the functional networks associated with language or motor functions, and in the context of epilepsy, they can reveal pathological networks. Locating the origin and spread of seizures within the cortex is crucial for pre-surgical planning. This process can be enhanced by employing cortical stimulation at the seizure onset zone (SOZ), leading to the generation of CCEPs in remote brain regions that may be targeted for disconnection. In the case of a 24-year-old male patient suffering from intractable epilepsy, corpus callosotomy was performed as part of the treatment. DTI-MRI imaging, conducted using a 3T MRI scanner for fiber tracking, along with CCEP, is used as part of an assessment for surgical planning. Stimulation of the SOZ, with alternating monophasic pulses of 300µs duration and 15mA current intensity, resulted in CCEPs on the contralateral frontal cortex, reaching a peak amplitude of 206µV with a latency of 31ms, specifically in the left pars triangularis. The related fiber tracts were identified with a two-tensor unscented Kalman filter (UKF) technique, showing transversal fibers through the corpus callosum. The CCEPs were monitored through the progress of the surgery. Notably, the SOZ-associated CCEPs exhibited a reduction following the resection of the anterior portion of the corpus callosum, reaching the identified connecting fibers. This intervention demonstrated a potential strategy for mitigating the impact of intractable epilepsy through targeted disconnection of identified cortical regions.

Keywords: CCEP, SOZ, Corpus callosotomy, DTI

Procedia PDF Downloads 32
335 Pragmatics of Socio-Linguistic Influence on Neurologist-Patient Interaction in Selected Hospitals in Nigeria

Authors: Ayodele James Akinola

Abstract:

This study examines how social and linguistic variables influenced communication between neurologists and patients in selected university teaching hospitals (UTHs) in southwestern Nigeria. Jacob Mey’s Pragmatic Acts, complemented by Emanuel and Emanuel’s model of doctor-patient relationship, served as the theoretical framework. Data comprising 22 audio-recorded neurologist-patient interactions were collected from two UTHs in the southwestern region of Nigeria. Data revealed that educational attainment of patients has insignificant influence on the interaction where the linguistic prowess of the patient has been impaired for consultative communication. However, the status influenced the degree of attention paid to patients by neurologists and determines the amount of time 'trying to help patients to communicate'. Patients with lower educational status and who could not communicate in English spent more time narrating their ailment to neurologists. Patients with higher educational status and could communicate in English saves consultation time as they express themselves briefly unlike those who were of little or no education in the clinics. Through this, diagnoses and therapeutic processes took eight to 12 minutes. 20 minutes was the longest duration recorded. Neurologist-patient interaction in the observed hospitals is shaped by neurologists’ experience, patients’ social variables and language.

Keywords: medical pragmatics, neurologist-patient interaction, nigeria, socio-linguistic influence

Procedia PDF Downloads 237
334 The Development of Chinese-English Homophonic Word Pairs Databases for English Teaching and Learning

Authors: Yuh-Jen Wu, Chun-Min Lin

Abstract:

Homophonic words are common in Mandarin Chinese which belongs to the tonal language family. Using homophonic cues to study foreign languages is one of the learning techniques of mnemonics that can aid the retention and retrieval of information in the human memory. When learning difficult foreign words, some learners transpose them with words in a language they are familiar with to build an association and strengthen working memory. These phonological clues are beneficial means for novice language learners. In the classroom, if mnemonic skills are used at the appropriate time in the instructional sequence, it may achieve their maximum effectiveness. For Chinese-speaking students, proper use of Chinese-English homophonic word pairs may help them learn difficult vocabulary. In this study, a database program is developed by employing Visual Basic. The database contains two corpora, one with Chinese lexical items and the other with English ones. The Chinese corpus contains 59,053 Chinese words that were collected by a web crawler. The pronunciations of this group of words are compared with words in an English corpus based on WordNet, a lexical database for the English language. Words in both databases with similar pronunciation chunks and batches are detected. A total of approximately 1,000 Chinese lexical items are located in the preliminary comparison. These homophonic word pairs can serve as a valuable tool to assist Chinese-speaking students in learning and memorizing new English vocabulary.

Keywords: Chinese, corpus, English, homophonic words, vocabulary

Procedia PDF Downloads 149
333 A Corpus-Based Analysis of "MeToo" Discourse in South Korea: Coverage Representation in Korean Newspapers

Authors: Sun-Hee Lee, Amanda Kraley

Abstract:

The “MeToo” movement is a social movement against sexual abuse and harassment. Though the hashtag went viral in 2017 following different cultural flashpoints in different countries, the initial response was quiet in South Korea. This radically changed in January 2018, when a high-ranking senior prosecutor, Seo Ji-hyun, gave a televised interview discussing being sexually assaulted by a colleague. Acknowledging public anger, particularly among women, on the long-existing problems of sexual harassment and abuse, the South Korean media have focused on several high-profile cases. Analyzing the media representation of these cases is a window into the evolving South Korean discourse around “MeToo.” This study presents a linguistic analysis of “MeToo” discourse in South Korea by utilizing a corpus-based approach. The term corpus (pl. corpora) is used to refer to electronic language data, that is, any collection of recorded instances of spoken or written language. A “MeToo” corpus has been collected by extracting newspaper articles containing the keyword “MeToo” from BIGKinds, big data analysis, and service and Nexis Uni, an online academic database search engine, to conduct this language analysis. The corpus analysis explores how Korean media represent accusers and the accused, victims and perpetrators. The extracted data includes 5,885 articles from four broadsheet newspapers (Chosun, JoongAng, Hangyore, and Kyunghyang) and 88 articles from two Korea-based English newspapers (Korea Times and Korea Herald) between January 2017 and November 2020. The information includes basic data analysis with respect to keyword frequency and network analysis and adds refined examinations of select corpus samples through naming strategies, semantic relations, and pragmatic properties. Along with the exponential increase of the number of articles containing the keyword “MeToo” from 104 articles in 2017 to 3,546 articles in 2018, the network and keyword analysis highlights ‘US,’ ‘Harvey Weinstein’, and ‘Hollywood,’ as keywords for 2017, with articles in 2018 highlighting ‘Seo Ji-Hyun, ‘politics,’ ‘President Moon,’ ‘An Ui-Jeong, ‘Lee Yoon-taek’ (the names of perpetrators), and ‘(Korean) society.’ This outcome demonstrates the shift of media focus from international affairs to domestic cases. Another crucial finding is that word ‘defamation’ is widely distributed in the “MeToo” corpus. This relates to the South Korean legal system, in which a person who defames another by publicly alleging information detrimental to their reputation—factual or fabricated—is punishable by law (Article 307 of the Criminal Act of Korea). If the defamation occurs on the internet, it is subject to aggravated punishment under the Act on Promotion of Information and Communications Network Utilization and Information Protection. These laws, in particular, have been used against accusers who have publicly come forward in the wake of “MeToo” in South Korea, adding an extra dimension of risk. This corpus analysis of “MeToo” newspaper articles contributes to the analysis of the media representation of the “MeToo” movement and sheds light on the shifting landscape of gender relations in the public sphere in South Korea.

Keywords: corpus linguistics, MeToo, newspapers, South Korea

Procedia PDF Downloads 187
332 Laundering vs. Blanqueo: Translating Financial Crime Metaphors From English to Spanish

Authors: Stephen Gerome

Abstract:

This study examines the translation and use of metaphors in the realm of public safety discourse and intends to shed light on a continuing problem in cross-cultural communication. Metaphors can cause problems not only within languages but also in interlingual communication. The use and misuse of metaphors may hinder the ability to adequately communicate prevention efforts and, in some cases, facilitate and allow financial crime to go undetected. The use of lexicalized metaphors in communications by political entities, journalists, and legal agents in communications regarding law, policy making, compliance monitoring and enforcement as well as in adjudication can have negative consequences if misconstrued. This study provides examples of metaphor usage in published documents in a corpus linguistic study that compares the use of lexicalized metaphors in this discourse to shed light on possible unexpected consequences as well as counterproductive ones.

Keywords: translation, legal, corpus linguistics, financial

Procedia PDF Downloads 80
331 The Mirage of Progress? a Longitudinal Study of Japanese Students’ L2 Oral Grammar

Authors: Robert Long, Hiroaki Watanabe

Abstract:

This longitudinal study examines the grammatical errors of Japanese university students’ dialogues with a native speaker over an academic year. The L2 interactions of 15 Japanese speakers were taken from the JUSFC2018 corpus (April/May 2018) and the JUSFC2019 corpus (January/February). The corpora were based on a self-introduction monologue and a three-question dialogue; however, this study examines the grammatical accuracy found in the dialogues. Research questions focused on a possible significant difference in grammatical accuracy from the first interview session in 2018 and the second one the following year, specifically regarding errors in clauses per 100 words, global errors and local errors, and with specific errors related to parts of speech. The investigation also focused on which forms showed the least improvement or had worsened? Descriptive statistics showed that error-free clauses/errors per 100 words decreased slightly while clauses with errors/100 words increased by one clause. Global errors showed a significant decline, while local errors increased from 97 to 158 errors. For errors related to parts of speech, a t-test confirmed there was a significant difference between the two speech corpora with more error frequency occurring in the 2019 corpus. This data highlights the difficulty in having students self-edit themselves.

Keywords: clause analysis, global vs. local errors, grammatical accuracy, L2 output, longitudinal study

Procedia PDF Downloads 104
330 Tracing the Developmental Repertoire of the Progressive: Evidence from L2 Construction Learning

Authors: Tianqi Wu, Min Wang

Abstract:

Research investigating language acquisition from a constructionist perspective has demonstrated that language is learned as constructions at various linguistic levels, which is related to factors of frequency, semantic prototypicality, and form-meaning contingency. However, previous research on construction learning tended to focus on clause-level constructions such as verb argument constructions but few attempts were made to study morpheme-level constructions such as the progressive construction, which is regarded as a source of acquisition problems for English learners from diverse L1 backgrounds, especially for those whose L1 do not have an equivalent construction such as German and Chinese. To trace the developmental trajectory of Chinese EFL learners’ use of the progressive with respect to verb frequency, verb-progressive contingency, and verbal prototypicality and generality, a learner corpus consisting of three sub-corpora representing three different English proficiency levels was extracted from the Chinese Learners of English Corpora (CLEC). As the reference point, a native speakers’ corpus extracted from the Louvain Corpus of Native English Essays was also established. All the texts were annotated with C7 tagset by part-of-speech tagging software. After annotation all valid progressive hits were retrieved with AntConc 3.4.3 followed by a manual check. Frequency-related data showed that from the lowest to the highest proficiency level, (1) the type token ratio increased steadily from 23.5% to 35.6%, getting closer to 36.4% in the native speakers’ corpus, indicating a wider use of verbs in the progressive; (2) the normalized entropy value rose from 0.776 to 0.876, working towards the target score of 0.886 in native speakers’ corpus, revealing that upper-intermediate learners exhibited a more even distribution and more productive use of verbs in the progressive; (3) activity verbs (i.e., verbs with prototypical progressive meanings like running and singing) dropped from 59% to 34% but non-prototypical verbs such as state verbs (e.g., being and living) and achievement verbs (e.g., dying and finishing) were increasingly used in the progressive. Apart from raw frequency analyses, collostructional analyses were conducted to quantify verb-progressive contingency and to determine what verbs were distinctively associated with the progressive construction. Results were in line with raw frequency findings, which showed that contingency between the progressive and non-prototypical verbs represented by light verbs (e.g., going, doing, making, and coming) increased as English proficiency proceeded. These findings altogether suggested that beginning Chinese EFL learners were less productive in using the progressive construction: they were constrained by a small set of verbs which had concrete and typical progressive meanings (e.g., the activity verbs). But with English proficiency increasing, their use of the progressive began to spread to marginal members such as the light verbs.

Keywords: Construction learning, Corpus-based, Progressives, Prototype

Procedia PDF Downloads 106
329 Dialogism in Research Article Introductions Written by Iranian Non-Native and English Native Speaking Writers

Authors: Moharram Sharifi

Abstract:

Despite a growing interest in the study of the introduction section of Research Articles (RA), there have been few studies to investigate how academic writers engage with other voices and alternative positions in this academic genre. Therefore, the purpose of this study was to show how Native Speaker (NS) and (Non-Native Speaker (NNS) writers take positions and stances in research article introductions. For this purpose, Engagement resources based on the appraisal framework were investigated in sixty articles written by English NS and Iranian NNS published in applied linguistics journals. It was found that the mean occurrences of heteroglossic items in both corpora were larger than those of monoglossic items, but comparing the means of monoglossic engagements between the two corpora, it was revealed that NS writers’ corpus had larger mean occurrences of monoglossic engagements than NNS writers’ corpus implying the native’s stronger authorial stance in the texts. The results also revealed that there was no significant difference in the use of contractive and expansive engagements by NS writers (t (29) = -0.995, p>0.05), indicating a balanced use between the two options. However, the higher mean occurrences of expansive options compared with contractive options in the NNS corpus may suggest that NN writers open up more dialogic room for alternative positions in the RA introductions. The findings of this study may help writers to better perceive the creation of a strong authorial position using appropriate engagement resources in RA introductions.

Keywords: engagement, heteroglossic, monoglossic, introduction

Procedia PDF Downloads 18
328 Cataphora in English and Chinese Conversation: A Corpus-based Contrastive Study

Authors: Jun Gao

Abstract:

This paper combines the corpus-based and contrastive approaches, seeking to provide a systematic account of cataphora in English and Chinese natural conversations. Based on spoken corpus data, the first part of the paper examines a range of characteristics of cataphora in the two languages, including frequency of occurrence, patterns, and syntactic features. On the basis of this exploration, cataphora in the two languages are contrasted in a structured way. The analysis shows that English and Chinese share a similar distribution of cataphora in natural conversations in terms of frequency of occurrence, with repeat identification cataphora higher than first mention cataphora and intra-sentential cataphora much higher than inter-sentential cataphora. In terms of patterns, three types are identified in English, i.e. P+N, Ø+N, and it+Clause, while in Chinese, two types are identified, i.e., P+N and Ø+N. English and Chinese are similar in terms of syntactic features, i.e., cataphor and postcedent in the intra-sentential cataphora mainly occur in the initial subject position of the same clause, with postcedent immediately followed or delayed, and cataphor and postcedent are mostly in adjacent sentences in inter-sentential cataphora. In the second part of the paper, the motivations of cataphora are investigated. It is found that cataphora is primarily motivated by the speaker and hearer’s different knowledge states with regard to the referent. Other factors are also involved, such as interference, word search, and the tension between the principles of Economy and Clarity.

Keywords: cataphora, contrastive study, motivation, pattern, syntactic features

Procedia PDF Downloads 57
327 Lennox-gastaut Syndrome Associated with Dysgenesis of Corpus Callosum

Authors: A. Bruce Janati, Muhammad Umair Khan, Naif Alghassab, Ibrahim Alzeir, Assem Mahmoud, M. Sammour

Abstract:

Rationale: Lennox-Gastaut syndrome(LGS) is an electro-clinical syndrome composed of the triad of mental retardation, multiple seizure types, and the characteristic generalized slow spike-wave complexes in the EEG. In this article, we report on two patients with LGS whose brain MRI showed dysgenesis of corpus callosum(CC). We review the literature and stress the role of CC in the genesis of secondary bilateral synchrony(SBS). Method: This was a clinical study conducted at King Khalid Hospital. Results: The EEG was consistent with LGS in patient 1 and unilateral slow spike-wave complexes in patient 2. The MRI showed hypoplasia of the splenium of CC in patient 1, and global hypoplasia of CC combined with Joubert syndrome in patient 2. Conclusion: Based on the data, we proffer the following hypotheses: 1-Hypoplasia of CC interferes with functional integrity of this structure. 2-The genu of CC plays a pivotal role in the genesis of secondary bilateral synchrony. 3-Electrodecremental seizures in LGS emanate from pacemakers generated in the brain stem, in particular the mesencephalon projecting abnormal signals to the cortex via thalamic nuclei. 4-Unilateral slow spike-wave complexes in the context of mental retardation and multiple seizure types may represent a variant of LGS, justifying neuroimaging studies.

Keywords: EEG, Lennox-Gastaut syndrome, corpus callosum , MRI

Procedia PDF Downloads 405
326 Translation Quality Assessment in Fansubbed English-Chinese Swearwords: A Corpus-Based Study of the Big Bang Theory

Authors: Qihang Jiang

Abstract:

Fansubbing, the combination of fan and subtitling, is one of the main branches of Audiovisual Translation (AVT) having kindled more and more interest of researchers into the AVT field in recent decades. In particular, the quality of so-called non-professional translation seems questionable due to the non-transparent qualification of subtitlers in a huge community network. This paper attempts to figure out how YYeTs aka 'ZiMuZu', the largest fansubbing group in China, translates swearwords from English to Chinese for its fans of the prevalent American sitcom The Big Bang Theory, taking cultural, social and political elements into account in the context of China. By building a bilingual corpus containing both the source and target texts, this paper found that most of the original swearwords were translated in a toned-down manner, probably due to Chinese audiences’ cultural and social network features as well as the strict censorship under the Chinese government. Additionally, House (2015)’s newly revised model of Translation Quality Assessment (TQA) was applied and examined. Results revealed that most of the subtitled swearwords achieved their pragmatic functions and exerted a communicative effect for audiences. In conclusion, this paper enriches the empirical research concerning House’s new TQA model, gives a full picture of the subtitling of swearwords in AVT field and provides a practical guide for the practitioners in their career of subtitling.

Keywords: corpus-based approach, fansubbing, pragmatic functions, swearwords, translation quality assessment

Procedia PDF Downloads 125
325 The Diminished Online Persona: A Semantic Change of Chinese Classifier Mei on Weibo

Authors: Hui Shi

Abstract:

This study investigates a newly emerged usage of Chinese numeral classifier mei (枚) in the cyberspace. In modern Chinese grammar, mei as a classifier should occupy the pre-nominal position, and its valid accompanying nouns are restricted to small, flat, fragile inanimate objects rather than humans. To examine the semantic change of mei, two types of data from Weibo.com were collected. First, 500 mei-included Weibo posts constructed a corpus for analyzing this classifier's word order distribution (post-nominal or pre-nominal) as well as its accompanying nouns' semantics (inanimate or human). Second, considering that mei accompanies a remarkable number of human nouns in the first corpus, the second corpus is composed of mei-involved Weibo IDs from users located in first and third-tier cities (n=8 respectively). The findings show that in the cyber community, mei frequently classifies human-related neologisms at the archaic post-normal position. Besides, the 23 to 29-year-old females as well as Weibo users from third-tier cities are the major populations who adopt mei in their user IDs for self-description and identity expression. This paper argues that the creative usage of mei gains popularity in the Chinese internet due to a humor effect. The marked word order switch and semantic misapplication combined to trigger incongruity and jocularity. This study has significance for research on Chinese cyber neologism. It may also lay a foundation for further studies on Chinese classifier change and Chinese internet communication.

Keywords: Chinese classifier, humor, neologism, semantic change

Procedia PDF Downloads 228