Search results for: corpus linguistic
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1136

Search results for: corpus linguistic

1106 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development

Authors: L. Kamandulytė-Merfeldienė

Abstract:

The paper deals with the main issues of methodology of the Corpus of Spoken Lithuanian which was started to be developed in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, the transcription of the recorded data, and the grammatical annotation. Collecting the data was based on the principles of balance and naturality. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to the constant increase in studies on spontaneous communication, and various papers have dealt with a distribution of parts of speech, use of different grammatical forms, variation of inflectional paradigms, distribution of fillers, syntactic functions of adjectives, the mean length of utterances.

Keywords: CHILDES, corpus of spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian

Procedia PDF Downloads 210
1105 Linguistic Trend in the Qur'anic Tafsir of 'Al Tahreer Wa Al Tanveer' by Sheikh Tahir Bin A'shur

Authors: Numan Hasan

Abstract:

We have tried to highlight the linguistic trend in the Qur’anic Tafsir of ‘Al Tahreer wa Al Tanveer’ by Sheikh Tahir Bin A’shur, the brightest linguistic commentator in the modern era. We have started studying the life of Bin A’shur and his contributions to the field of Qur’anic knowledge. We have also studied to focus on the linguistic approach of ‘Al Tahreer wa Al Tanveer’ and emphasized the importance of linguistic interpretations. We have tried to have a clear understanding about the features and characteristics of his Tafsir. We have also reflected on the methodological approach and linguistic reference of his interpretation. In the conclusion we presented the main results of a research.

Keywords: Sheikh Tahir Bin A’shur, tafsir, linguistics, interpretation, Islamic studies

Procedia PDF Downloads 344
1104 Cognitive Translation and Conceptual Wine Tasting Metaphors: A Corpus-Based Research

Authors: Christine Demaecker

Abstract:

Many researchers have underlined the importance of metaphors in specialised language. Their use of specific domains helps us understand the conceptualisations used to communicate new ideas or difficult topics. Within the wide area of specialised discourse, wine tasting is a very specific example because it is almost exclusively metaphoric. Wine tasting metaphors express various conceptualisations. They are not linguistic but rather conceptual, as defined by Lakoff & Johnson. They correspond to the linguistic expression of a mental projection from a well-known or more concrete source domain onto the target domain, which is the taste of wine. But unlike most specialised terminologies, the vocabulary is never clearly defined. When metaphorical terms are listed in dictionaries, their definitions remain vague, unclear, and circular. They cannot be replaced by literal linguistic expressions. This makes it impossible to transfer them into another language with the traditional linguistic translation methods. Qualitative research investigates whether wine tasting metaphors could rather be translated with the cognitive translation process, as well described by Nili Mandelblit (1995). The research is based on a corpus compiled from two high-profile wine guides; the Parker’s Wine Buyer’s Guide and its translation into French and the Guide Hachette des Vins and its translation into English. In this small corpus with a total of 68,826 words, 170 metaphoric expressions have been identified in the original English text and 180 in the original French text. They have been selected with the MIPVU Metaphor Identification Procedure developed at the Vrije Universiteit Amsterdam. The selection demonstrates that both languages use the same set of conceptualisations, which are often combined in wine tasting notes, creating conceptual integrations or blends. The comparison of expressions in the source and target texts also demonstrates the use of the cognitive translation approach. In accordance with the principle of relevance, the translation always uses target language conceptualisations, but compared to the original, the highlighting of the projection is often different. Also, when original metaphors are complex with a combination of conceptualisations, at least one element of the original metaphor underlies the target expression. This approach perfectly integrates into Lederer’s interpretative model of translation (2006). In this triangular model, the transfer of conceptualisation could be included at the level of ‘deverbalisation/reverbalisation’, the crucial stage of the model, where the extraction of meaning combines with the encyclopedic background to generate the target text.

Keywords: cognitive translation, conceptual integration, conceptual metaphor, interpretative model of translation, wine tasting metaphor

Procedia PDF Downloads 104
1103 The Use of Corpora in Improving Modal Verb Treatment in English as Foreign Language Textbooks

Authors: Lexi Li, Vanessa H. K. Pang

Abstract:

This study aims to demonstrate how native and learner corpora can be used to enhance modal verb treatment in EFL textbooks in mainland China. It contributes to a corpus-informed and learner-centered design of grammar presentation in EFL textbooks that enhances the authenticity and appropriateness of textbook language for target learners. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of each of the nine modals was sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the 'secondary school' section were selected. A series of five secondary coursebooks comprise the textbook corpus. All the data in both the learner and the textbook corpora are retrieved through the concordance functions of WordSmith Tools (version, 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNC2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. Secondly, the learner corpus was analyzed in terms of the use (distributional features, semantic functions, and co-occurring constructions) and the misuse (syntactic errors, e.g., she can sings*.) of the nine modal verbs to uncover potential difficulties that confront learners. The analysis of distribution indicates several discrepancies between the textbook corpus and BNCS2014. The first four most frequent modal verbs in BNCS2014 are can, would, will, could, while can, will, should, could are the top four in the textbooks. Most strikingly, there is an unusually high proportion of can (41.1%) in the textbooks. The results on different meanings shows that will, would and must are the most problematic. For example, for will, the textbooks contain 20% more occurrences of 'volition' and 20% less of 'prediction' than those in BNCS2014. Regarding co-occurring structures, the textbooks over-represented the structure 'modal +do' across the nine modal verbs. Another major finding is that the structure of 'modal +have done' that frequently co-occur with could, would, should, and must is underused in textbooks. Besides, these four modal verbs are the most difficult for learners, as the error analysis shows. This study demonstrates how the synergy of native and learner corpora can be harnessed to improve EFL textbook presentation of modal verbs in a way that textbooks can provide not only authentic language used in natural discourse but also appropriate design tailed for the needs of target learners.

Keywords: English as Foreign Language, EFL textbooks, learner corpus, modal verbs, native corpus

Procedia PDF Downloads 118
1102 Towards a Large Scale Deep Semantically Analyzed Corpus for Arabic: Annotation and Evaluation

Authors: S. Alansary, M. Nagi

Abstract:

This paper presents an approach of conducting semantic annotation of Arabic corpus using the Universal Networking Language (UNL) framework. UNL is intended to be a promising strategy for providing a large collection of semantically annotated texts with formal, deep semantics rather than shallow. The result would constitute a semantic resource (semantic graphs) that is editable and that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles and rhetorical relations, into a single semantic formalism for knowledge representation. The paper will also present the Interactive Analysis​ tool for automatic semantic annotation (IAN). In addition, the cornerstone of the proposed methodology which are the disambiguation and transformation rules, will be presented. Semantic annotation using UNL has been applied to a corpus of 20,000 Arabic sentences representing the most frequent structures in the Arabic Wikipedia. The representation, at different linguistic levels was illustrated starting from the morphological level passing through the syntactic level till the semantic representation is reached. The output has been evaluated using the F-measure. It is 90% accurate. This demonstrates how powerful the formal environment is, as it enables intelligent text processing and search.

Keywords: semantic analysis, semantic annotation, Arabic, universal networking language

Procedia PDF Downloads 561
1101 Laundering vs. Blanqueo: Translating Financial Crime Metaphors From English to Spanish

Authors: Stephen Gerome

Abstract:

This study examines the translation and use of metaphors in the realm of public safety discourse and intends to shed light on a continuing problem in cross-cultural communication. Metaphors can cause problems not only within languages but also in interlingual communication. The use and misuse of metaphors may hinder the ability to adequately communicate prevention efforts and, in some cases, facilitate and allow financial crime to go undetected. The use of lexicalized metaphors in communications by political entities, journalists, and legal agents in communications regarding law, policy making, compliance monitoring and enforcement as well as in adjudication can have negative consequences if misconstrued. This study provides examples of metaphor usage in published documents in a corpus linguistic study that compares the use of lexicalized metaphors in this discourse to shed light on possible unexpected consequences as well as counterproductive ones.

Keywords: translation, legal, corpus linguistics, financial

Procedia PDF Downloads 80
1100 Variables, Annotation, and Metadata Schemas for Early Modern Greek

Authors: Eleni Karantzola, Athanasios Karasimos, Vasiliki Makri, Ioanna Skouvara

Abstract:

Historical linguistics unveils the historical depth of languages and traces variation and change by analyzing linguistic variables over time. This field of linguistics usually deals with a closed data set that can only be expanded by the (re)discovery of previously unknown manuscripts or editions. In some cases, it is possible to use (almost) the entire closed corpus of a language for research, as is the case with the Thesaurus Linguae Graecae digital library for Ancient Greek, which contains most of the extant ancient Greek literature. However, concerning ‘dynamic’ periods when the production and circulation of texts in printed as well as manuscript form have not been fully mapped, representative samples and corpora of texts are needed. Such material and tools are utterly lacking for Early Modern Greek (16th-18th c.). In this study, the principles of the creation of EMoGReC, a pilot representative corpus of Early Modern Greek (16th-18th c.) are presented. Its design follows the fundamental principles of historical corpora. The selection of texts aims to create a representative and balanced corpus that gives insight into diachronic, diatopic and diaphasic variation. The pilot sample includes data derived from fully machine-readable vernacular texts, which belong to 4-5 different textual genres and come from different geographical areas. We develop a hierarchical linguistic annotation scheme, further customized to fit the characteristics of our text corpus. Regarding variables and their variants, we use as a point of departure the bundle of twenty-four features (or categories of features) for prose demotic texts of the 16th c. Tags are introduced bearing the variants [+old/archaic] or [+novel/vernacular]. On the other hand, further phenomena that are underway (cf. The Cambridge Grammar of Medieval and Early Modern Greek) are selected for tagging. The annotated texts are enriched with metalinguistic and sociolinguistic metadata to provide a testbed for the development of the first comprehensive set of tools for the Greek language of that period. Based on a relational management system with interconnection of data, annotations, and their metadata, the EMoGReC database aspires to join a state-of-the-art technological ecosystem for the research of observed language variation and change using advanced computational approaches.

Keywords: early modern Greek, variation and change, representative corpus, diachronic variables.

Procedia PDF Downloads 32
1099 A Corpus-Linguistic Analysis of Online Iranian News Coverage on Syrian Revolution

Authors: Amaal Ali Al-Gamde

Abstract:

The Syrian revolution is a major issue in the Middle East, which draws in world powers and receives a great focus in international mass media since 2011. The heavy global reliance on cyber news and digital sources plays a key role in conveying a sense of bias to a wide range of online readers. Thus, based on the assumption that media discourse possesses ideological implications, this study investigates the representation of Syrian revolution in online media. The paper explores the discursive constructions of anti and pro-government powers in Syrian revolution in 1000,000-word corpus of Fars online reports (an Iranian news agency), issued between 2013 and 2015. Taking a corpus assisted discourse analysis approach, the analysis investigates three types of lexicosemantic relations, the semantic macrostructures within which the two social actors are framed, the lexical collocations characterizing the news discourse and the discourse prosodies they tell about the two sides of the conflict. The study utilizes computer-based approaches, sketch engine and AntConc software to minimize the bias of the subjective analysis. The analysis moves from the insights of lexical frequencies and keyness scores to examine themes and the collocational patterns. The findings reveal the Fars agency’s ideological mode of representations in reporting events of Syrian revolution in two ways. The first is by stereotyping the opposition groups under the umbrella of terrorism, using words such as (law breakers, foreign-backed groups, militant groups, terrorists) to legitimize the atrocities of security forces against protesters and enhance horror among civilians. The second is through emphasizing the power of the government and depicting it as the defender of the Arab land by foregrounding the discourse of international conspiracy against Syria. The paper concludes discussing the potential importance of triangulating corpus linguistic tools with critical discourse analysis to elucidate more about discourses and reality.

Keywords: discourse prosody, ideology, keyness, semantic macrostructure

Procedia PDF Downloads 109
1098 Corporate Cautionary Statement: A Genre of Professional Communication

Authors: Chie Urawa

Abstract:

Cautionary statements or disclaimers in corporate annual reports need to be carefully designed because clear cautionary statements may protect a company in the case of legal disputes and may undermine positive impressions. This study compares the language of cautionary statements using two corpora, Sony’s cautionary statement corpus (S-corpus) and Panasonic’s cautionary statement corpus (P-corpus), illustrating the differences and similarities in relation to the use of meaningful cautionary statements and critically analyzing why practitioners use the way. The findings describe the distinct differences between the two companies in the presentation of the risk factors and the way how they make the statements. The word ability is used more for legal protection in S-corpus whereas the word possibility is used more to convey a better impression in P-corpus. The main similarities are identified in the use of lexical words and pronouns, and almost the same wordings for eight years. The findings show how they make the statements unique to the company in the presentation of risk factors, and the characteristics of specific genre of professional communication. Important implications of this study are that more comprehensive approach can be applied in other contexts, and be used by companies to reflect upon their cautionary statements.

Keywords: cautionary statements, corporate annual reports, corpus, risk factors

Procedia PDF Downloads 141
1097 Linguistic Cyberbullying, a Legislative Approach

Authors: Simona Maria Ignat

Abstract:

Bullying online has been an increasing studied topic during the last years. Different approaches, psychological, linguistic, or computational, have been applied. To our best knowledge, a definition and a set of characteristics of phenomenon agreed internationally as a common framework are still waiting for answers. Thus, the objectives of this paper are the identification of bullying utterances on Twitter and their algorithms. This research paper is focused on the identification of words or groups of words, categorized as “utterances”, with bullying effect, from Twitter platform, extracted on a set of legislative criteria. This set is the result of analysis followed by synthesis of law documents on bullying(online) from United States of America, European Union, and Ireland. The outcome is a linguistic corpus with approximatively 10,000 entries. The methods applied to the first objective have been the following. The discourse analysis has been applied in identification of keywords with bullying effect in texts from Google search engine, Images link. Transcription and anonymization have been applied on texts grouped in CL1 (Corpus linguistics 1). The keywords search method and the legislative criteria have been used for identifying bullying utterances from Twitter. The texts with at least 30 representations on Twitter have been grouped. They form the second corpus linguistics, Bullying utterances from Twitter (CL2). The entries have been identified by using the legislative criteria on the the BoW method principle. The BoW is a method of extracting words or group of words with same meaning in any context. The methods applied for reaching the second objective is the conversion of parts of speech to alphabetical and numerical symbols and writing the bullying utterances as algorithms. The converted form of parts of speech has been chosen on the criterion of relevance within bullying message. The inductive reasoning approach has been applied in sampling and identifying the algorithms. The results are groups with interchangeable elements. The outcomes convey two aspects of bullying: the form and the content or meaning. The form conveys the intentional intimidation against somebody, expressed at the level of texts by grammatical and lexical marks. This outcome has applicability in the forensic linguistics for establishing the intentionality of an action. Another outcome of form is a complex of graphemic variations essential in detecting harmful texts online. This research enriches the lexicon already known on the topic. The second aspect, the content, revealed the topics like threat, harassment, assault, or suicide. They are subcategories of a broader harmful content which is a constant concern for task forces and legislators at national and international levels. These topic – outcomes of the dataset are a valuable source of detection. The analysis of content revealed algorithms and lexicons which could be applied to other harmful contents. A third outcome of content are the conveyances of Stylistics, which is a rich source of discourse analysis of social media platforms. In conclusion, this corpus linguistics is structured on legislative criteria and could be used in various fields.

Keywords: corpus linguistics, cyberbullying, legislation, natural language processing, twitter

Procedia PDF Downloads 56
1096 A Corpus-Based Study on the Styles of Three Translators

Authors: Wang Yunhong

Abstract:

The present paper is preoccupied with the different styles of three translators in their translating a Chinese classical novel Shuihu Zhuan. Based on a parallel corpus, it adopts a target-oriented approach to look into whether and what stylistic differences and shifts the three translations have revealed. The findings show that the three translators demonstrate different styles concerning their word choices and sentence preferences, which implies that identification of recurrent textual patterns may be a basic step for investigating the style of a translator.

Keywords: corpus, lexical choices, sentence characteristics, style

Procedia PDF Downloads 242
1095 Synaesthetic Metaphors in Persian: a Cognitive Corpus Based and Comparative Perspective

Authors: A. Afrashi

Abstract:

Introduction: Synaesthesia is a term denoting the perception or description of the perception of one sense modality in terms of another. In literature, synaesthesia refers to a technique adopted by writers to present ideas, characters or places in such a manner that they appeal to more than one sense like hearing, seeing, smell etc. at a given time. In everyday language too we find many examples of synaesthesia. We commonly hear phrases like ‘loud colors’, ‘frozen silence’ and ‘warm colors’, ‘bitter cold’ etc. Empirical cognitive studies have proved that synaesthetic representations both in literature and everyday languages are constrained ie. they do not map randomly among sensory domains. From the beginning of the 20th century Synaesthesia has been a research domain both in literature and structural linguistics. However the exploration of cognitive mechanisms motivating synaesthesia, have made it an important topic in 21st century cognitive linguistics and literary studies. Synaesthetic metaphors are linguistic representations of those mental mechanisms, the study of which reveals invaluable facts about perception, cognition and conceptualization. According to the main tenets of cognitive approach to language and literature, unified and similar cognitive mechanisms are active both in everyday language and literature, and synaesthesia is one of those cognitive mechanisms. Main objective of the present research is to answer the following questions: What types of sense transfers are accessible in Persian synaesthetic metaphors. How are these types of sense transfers cognitively explained. What are the results of cross-linguistic comparative study of synaestetic metaphors based on the existing observations? Methodology: The present research employs a cognitive - corpus based method, and the theoretical framework adopted to analyze linguistic synaesthesia is the contemporary theory of metaphor, where conceptual metaphor is the result of systemic mappings across cognitive domains. Persian Language Data- base (PLDB) in the Institute for Humanities and Cultural Studies which consists mainly of Persian modern prose, is searched for synaesthetic metaphors. Then for each metaphorical structure, the source and target domains are determined. Then sense transfers are identified and the types of synaesthetic metaphors recognized. Findings: Persian synaesthetic metaphors conform to the hierarchical distribution principle, according to which transfers tend to go from touch to taste to smell to sound and to sight, not vice versa. In other words mapping from more accessible or basic concepts onto less accessible or less basic ones seems more natural. Furthermore the most frequent target domain in Persian synaesthetic metaphors is sound. Certain characteristics of Persian synaesthetic metaphors are comparable with existing related researches carried on English, French, Hungarian and Chinese synaesthetic metaphors. Conclusion: Cognitive corpus based approaches to linguistic synaesthesia, are applicable to stylistics and literary criticism and this recent research domain is an efficient approach to study cross linguistic variations to find out which of the five senses is dominant cross linguistically and cross culturally as the target domain in metaphorical mappings , and so forth receiving dominance in conceptualizations.

Keywords: cognitive semantics, conceptual metaphor, synaesthesia, corpus based approach

Procedia PDF Downloads 538
1094 University Level Spanish Heritage Language Students' Use of Metaphor in Writing: Exploring Auto-Biographical Linguistic Narratives

Authors: Lorraine Ramos

Abstract:

The question of heritage language learners in foreign language classrooms has been widely debated in second language education, especially with Spanish in a U.S. Instructors of Spanish as a foreign language have brought pedagogical focus to Spanish heritage language students in order to retain, develop and maintain their first language. This paper proposes a thorough examination of the use of conceptual metaphors within autobiographical linguistic narratives as a key indicator of the writing development of advanced Spanish-language students. By pairing genre theory from Systemic Functional Linguistics with metaphor theory, this paper will examine the metaphors used by 3rd and 4th year university Spanish students within the narrative genre from a corpus of 16, 091 words. The investigation has found that heritage language students use a variety of bicultural metaphors, transferred from both languages to conceptualize their linguistic development, in addition to using metaphor in specific narrative stages as a literary strategy. Since it has been found that the metaphors used were transcultural, the use of conceptual metaphors in heritage language learners can be further examined to help these students achieve their linguistic and academic goals in the Spanish by transferring from their knowledge in English. In conclusion, by closely examining the function of student discourse through their multicultural metaphoric competence, this study provides important insights on how to enable instructors to best further their students’ writing development in the target language.

Keywords: academic writing development, heritage language learners, language attitudes and ideologies, metaphor

Procedia PDF Downloads 200
1093 Ideological Manipulations and Cultural-Norm Constraints

Authors: Masoud Hassanzade Novin, Bahloul Salmani

Abstract:

Translation cannot be considered as a simple linguistic act. Through the rise of descriptive approach in the late 1970s and 1980s, translation process managed to meet the requirements of social aspects as well as linguistic approaches. To have the translation considered as the cross-cultural communication through which various cultures communicate in ideological and cultural constraints, the contrastive analysis was conducted in this paper to reveal the distortions imposed in the translated texts. The corpus of the study involved the novel 1984 written by George Orwell and its Persian translated texts which were analyzed through the qualitative type of the research based on critical discourse analysis (CDA) and Toury's norms as well as Lefever's concepts of ideology. Results of the study revealed the point that ideology and the cultural constraints were considered as an important stimulus which can control the process of the translation.

Keywords: critical discourse analysis, ideology, norms, translated texts

Procedia PDF Downloads 311
1092 Metaphor Scenarios of Translation: An Applied Linguistic Approach to Discourse Analysis

Authors: Elizabeta Eduard Baltadzhyan

Abstract:

This work presents a stage of an investigation about the metaphorical conceptualization of translation in Bulgarian language. The material is a linguistic corpus consisting of 38 interviews with several generations Bulgarian translators and interpreters. The aim of this presentation is to inform about the results of the organization of the source concepts in scenarios that dominate the discursive manifestations of the source domains. The data show that, on the one hand, translators from different generations share some basic assignments of source and target domains, e. g. translation is a journey or translation is an artistic presentation. On the other hand, there are some specific scenarios motivated by significant changes in the socio-economic structure of the country and the valuation of the translator´s mission and work, e. g., the scenario of pleasure and addictive activity marks the generation that enjoy great support and stimulation from the socialist government, whereas the war scenario marks the generation during the Perestroika time.

Keywords: Bulgarian language, metaphor, scenario, translation

Procedia PDF Downloads 274
1091 Saudi Twitter Corpus for Sentiment Analysis

Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari

Abstract:

Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have yet to directly apply SA to Arabic due to lack of a publicly available dataset for this language. This paper partially bridges this gap due to its focus on one of the Arabic dialects which is the Saudi dialect. This paper presents annotated data set of 4700 for Saudi dialect sentiment analysis with (K= 0.807). Our next work is to extend this corpus and creation a large-scale lexicon for Saudi dialect from the corpus.

Keywords: Arabic, sentiment analysis, Twitter, annotation

Procedia PDF Downloads 595
1090 Unraveling Language Contact through Syntactic Dynamics of ‘Also’ in Hong Kong and Britain English

Authors: Xu Zhang

Abstract:

This article unveils an indicator of language contact between English and Cantonese in one of the Outer Circle Englishes, Hong Kong (HK) English, through an empirical investigation into 1000 tokens from the Global Web-based English (GloWbE) corpus, employing frequency analysis and logistic regression analysis. It is perceived that Cantonese and general Chinese are contextually marked by an integral underlying thinking pattern. Chinese speakers exhibit a reliance on semantic context over syntactic rules and lexical forms. This linguistic trait carries over to their use of English, affording greater flexibility to formal elements in constructing English sentences. The study focuses on the syntactic positioning of the focusing subjunct ‘also’, a linguistic element used to add new or contrasting prominence to specific sentence constituents. The English language generally allows flexibility in the relative position of 'also’, while there is a preference for close marking relationships. This article shifts attention to Hong Kong, where Cantonese and English converge, and 'also' finds counterparts in Cantonese ‘jaa’ and Mandarin ‘ye’. Employing a corpus-based data-driven method, we investigate the syntactic position of 'also' in both HK and GB English. The study aims to ascertain whether HK English exhibits a greater 'syntactic freedom,' allowing for a more distant marking relationship with 'also' compared to GB English. The analysis involves a random extraction of 500 samples from both HK and GB English from the GloWbE corpus, forming a dataset (N=1000). Exclusions are made for cases where 'also' functions as an additive conjunct or serves as a copulative adverb, as well as sentences lacking sufficient indication that 'also' functions as a focusing particle. The final dataset comprises 820 tokens, with 416 for GB and 404 for HK, annotated according to the focused constituent and the relative position of ‘also’. Frequency analysis reveals significant differences in the relative position of 'also' and marking relationships between HK and GB English. Regression analysis indicates a preference in HK English for a distant marking relationship between 'also' and its focused constituent. Notably, the subject and other constituents emerge as significant predictors of a distant position for 'also.' Together, these findings underscore the nuanced linguistic dynamics in HK English and contribute to our understanding of language contact. It suggests that future pedagogical practice should consider incorporating the syntactic variation within English varieties, facilitating leaners’ effective communication in diverse English-speaking environments and enhancing their intercultural communication competence.

Keywords: also, Cantonese, English, focus marker, frequency analysis, language contact, logistic regression analysis

Procedia PDF Downloads 23
1089 The Morphology of Sri Lankan Text Messages

Authors: Chamindi Dilkushi Senaratne

Abstract:

Communicating via a text or an SMS (Short Message Service) has become an integral part of our daily lives. With the increase in the use of mobile phones, text messaging has become a genre by itself worth researching and studying. It is undoubtedly a major phenomenon revealing language change. This paper attempts to describe the morphological processes of text language of urban bilinguals in Sri Lanka. It will be a typological study based on 500 English text messages collected from urban bilinguals residing in Colombo. The messages are selected by categorizing the deviant forms of language use apparent in text messages. These stylistic deviations are a deliberate skilled performance by the users of the language possessing an in-depth knowledge of linguistic systems to create new words and thereby convey their linguistic identity and individual and group solidarity via the message. The findings of the study solidifies arguments that the manipulation of language in text messages is both creative and appropriate. In addition, code mixing theories will be used to identify how existing morphological processes are adapted by bilingual users in Sri Lanka when texting. The study will reveal processes such as omission, initialism, insertion and alternation in addition to other identified linguistic features in text language. The corpus reveals the most common morphological processes used by Sri Lankan urban bilinguals when sending texts.

Keywords: bilingual, deviations, morphology, texts

Procedia PDF Downloads 245
1088 Introducing Data-Driven Learning into Chinese Higher Education EAP Writing Instructional Settings

Authors: Jingwen Ou

Abstract:

Writing for academic purposes in a second or foreign language is one of the most important and the most demanding skills to be mastered by non-native speakers. Traditionally, the EAP writing instruction at the tertiary level encompasses the teaching of academic genre knowledge, more specifically, the disciplinary writing conventions, the rhetorical functions, and specific linguistic features. However, one of the main sources of challenges in English academic writing for L2 students at the tertiary level can still be found in proficiency in academic discourse, especially vocabulary, academic register, and organization. Data-Driven Learning (DDL) is defined as “a pedagogical approach featuring direct learner engagement with corpus data”. In the past two decades, the rising popularity of the application of the data-driven learning (DDL) approach in the field of EAP writing teaching has been noticed. Such a combination has not only transformed traditional pedagogy aided by published DDL guidebooks in classroom use but also triggered global research on corpus use in EAP classrooms. This study endeavors to delineate a systematic review of research in the intersection of DDL and EAP writing instruction by conducting a systematic literature review on both indirect and direct DDL practice in EAP writing instructional settings in China. Furthermore, the review provides a synthesis of significant discoveries emanating from prior research investigations concerning Chinese university students’ perception of Data-Driven Learning (DDL) and the subsequent impact on their academic writing performance following corpus-based training. Research papers were selected from Scopus-indexed journals and core journals from two main Chinese academic databases (CNKI and Wanfang) published in both English and Chinese over the last ten years based on keyword searches. Results indicated an insufficiency of empirical DDL research despite a noticeable upward trend in corpus research on discourse analysis and indirect corpus applications for material design by language teachers. Research on the direct use of corpora and corpus tools in DDL, particularly in combination with genre-based EAP teaching, remains a relatively small fraction of the whole body of research in Chinese higher education settings. Such scarcity is highly related to the prevailing absence of systematic training in English academic writing registers within most Chinese universities' EAP syllabi due to the Chinese English Medium Instruction policy, where only English major students are mandated to submit English dissertations. Findings also revealed that Chinese learners still held mixed attitudes towards corpus tools influenced by learner differences, limited access to language corpora, and insufficient pre-training on corpus theoretical concepts, despite their improvements in final academic writing performance.

Keywords: corpus linguistics, data-driven learning, EAP, tertiary education in China

Procedia PDF Downloads 18
1087 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Authors: Sheena Christabel Pravin, M. Palanivelan

Abstract:

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or on an early onset of developmental stuttering at childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in the speech corpus encompassing bilingual spontaneous utterances from school going children – English and Tamil. Two classifiers including Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment CLI). The ability of the models in classifying the disfluencies was measured in terms of F-measure, Recall, and Precision.

Keywords: bi-lingual, children who stutter, children with language impairment, hidden markov models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies

Procedia PDF Downloads 193
1086 Combining Corpus Linguistics and Critical Discourse Analysis to Study Power Relations in Hindi Newspapers

Authors: Vandana Mishra, Niladri Sekhar Dash, Jayshree Charkraborty

Abstract:

This present paper focuses on the application of corpus linguistics techniques for critical discourse analysis (CDA) of Hindi newspapers. While Corpus linguistics is the study of language as expressed in corpora (samples) of 'real world' text, CDA is an interdisciplinary approach to the study of discourse that views language as a form of social practice. CDA has mainly been studied from a qualitative perspective. However, we can say that recent studies have begun combining corpus linguistics with CDA in analyzing large volumes of text for the study of existing power relations in society. The corpus under our study is also of a sizable amount (1 million words of Hindi newspaper texts) and its analysis requires an alternative analytical procedure. So, we have combined both the quantitative approach i.e. the use of corpus techniques with CDA’s traditional qualitative analysis. In this context, we have focused on the Keyword Analysis Sorting Concordance Lines of the selected Keywords and calculating collocates of the keywords. We have made use of the Wordsmith Tool for all these analysis. The analysis starts with identifying the keywords in the political news corpus when compared with the main news corpus. The keywords are extracted from the corpus based on their keyness calculated through statistical tests like chi-squared test and log-likelihood test on the frequent words of the corpus. Some of the top occurring keywords are मोदी (Modi), भाजपा (BJP), कांग्रेस (Congress), सरकार (Government) and पार्टी (Political party). This is followed by the concordance analysis of these keywords which generates thousands of lines but we have to select few lines and examine them based on our objective. We have also calculated the collocates of the keywords based on their Mutual Information (MI) score. Both concordance and collocation help to identify lexical patterns in the political texts. Finally, all these quantitative results derived from the corpus techniques will be subjectively interpreted in accordance to the CDA’s theory to examine the ways in which political news discourse produces social and political inequality, power abuse or domination.

Keywords: critical discourse analysis, corpus linguistics, Hindi newspapers, power relations

Procedia PDF Downloads 192
1085 The Latent Model of Linguistic Features in Korean College Students’ L2 Argumentative Writings: Syntactic Complexity, Lexical Complexity, and Fluency

Authors: Jiyoung Bae, Gyoomi Kim

Abstract:

This study explores a range of linguistic features used in Korean college students’ argumentative writings for the purpose of developing a model that identifies variables which predict writing proficiencies. This study investigated the latent variable structure of L2 linguistic features, including syntactic complexity, the lexical complexity, and fluency. One hundred forty-six university students in Korea participated in this study. The results of the study’s confirmatory factor analysis (CFA) showed that indicators of linguistic features from this study-provided a foundation for re-categorizing indicators found in extant research on L2 Korean writers depending on each latent variable of linguistic features. The CFA models indicated one measurement model of L2 syntactic complexity and L2 learners’ writing proficiency; these two latent factors were correlated with each other. Based on the overall findings of the study, integrated linguistic features of L2 writings suggested some pedagogical implications in L2 writing instructions.

Keywords: linguistic features, syntactic complexity, lexical complexity, fluency

Procedia PDF Downloads 142
1084 Translation and Sociolinguistics of Classical Books

Authors: Laura de Almeida

Abstract:

This paper aims to present research involving the translation of classical books originally in English and translated into the Portuguese language. The objective is to analyze the linguistic varieties evident and how they appear in the other language the work was translated into. We based our study on the sociolinguistics theory, more specifically, the study of the Black English Vernacular. Our methodology is built on collecting data from the speech characters of the Black English Vernacular from some books such as The Adventures of Huckleberry Finn by Mark Twain. On doing so, we compare the two versions of a book and how they reflected the linguistic variety. Our purpose is to show that some translators do not worry when dealing with linguistic variety. In other words, they just translate the story without taking into account some important linguistic aspects which need attention, such as language variation.

Keywords: classical books, linguistic variation, sociolinguistics, translation

Procedia PDF Downloads 369
1083 Adjectives in Academic Discourse: A Comparative Study of Research Articles

Authors: Beata Grymska

Abstract:

The research studies on academic discourse focus in general on lexical bundles, epistemic modality markers, or interactions between writers and readers. Following the research into the written forms of the academic community, this study concentrates on adjectives in research articles. The study investigates the distribution of adjectives in research articles in two academic disciplines: linguistics and medicine. It is corpus-based in design and consists of 100 linguistic and 100 medical research articles all written in English. The aim of the study is to compare the distribution of adjectives between the two corpora and four main parts of articles: IMRD (Introduction, Methods, Results, and Discussion). The second aim is to see if the two corpora share common core adjectives, e.g., different, important, specific, and if there are discipline-specific adjectives. The further part of the paper elaborates on adjectives use in the corpora together with examples. The results indicate that the two corpora do not differ in the distribution of adjectives to a great extent. The occurrences of the most frequently used adjectives depend on the academic discipline of the research articles. The concluding part reflects upon the role of adjectives in academic discourse and also presents how corpora can be helpful in composing academic texts.

Keywords: academic discourse, academic texts, adjectives, corpus analysis, research articles

Procedia PDF Downloads 160
1082 A Corpus-Based Study of Evaluative Language in Leading Articles in British Broadsheet and Tabloid Newspapers

Authors: Fatimah AlSaiari

Abstract:

In recent years, newspapers in the United Kingdom have been no longer just a means of sharing news about what happens in the world; they are also used to influence target readers by having them become more up-to-date, well-informed, entertained, exasperated, delighted, and infuriated. To achieve these objectives and maintain influence on public opinion, journalists use a particular language in which they can convey emotions and opinions, organize their discourse, and establish solidarity with their audience. This type of language has been widely analyzed under different labels, such as evaluation, appraisal, and stance. There is a considerable amount of linguistic and non-linguistic research devoted to analyzing this type of interpersonal language in journalistic discourse, and most of these studies were carried out to challenge the traditional assumptions of the objectivity and impartiality of news reporting. However, very little research has been undertaken on evaluative language in newspaper institutional editorials, and there is hardly any systematic or exhaustive analysis of this type of language in British tabloid and broadsheet newspapers. This study will attempt to provide new insights into the nature of authorial and non-authorial evaluation in leading articles in popular and quality British newspapers, along with their targets, sources, and discourse functions. The study will also attempt to develop a framework of evaluation that can be applied to evaluative lexical items in newspaper opinion texts. The framework is both theory-driven (i.e., it builds on and modifies previous frameworks of evaluation such as appraisal theory and parameter-based approach) and data-driven (i.e., it elicits the evaluative categories from the analysis of the corpus, which helps in the development of the current framework). To achieve this aim, a corpus of 140 leading articles were selected. The findings revealed that the tabloids tended to express their stance through explicitness, dramatization, frequent reference to social actors’ emotions and beliefs, and exaggeration in negativity, while the broadsheets preferred to express their stance through mitigation ambiguity and implicitness. conceptual themes and propositions were more preferable targets for expressing stance in the broadsheets while human behavior and characters were preferable targets for the tabloids.

Keywords: appraisal theory, evaluative language, British newspapers, broadsheets & tabloids, evaluative adjectives

Procedia PDF Downloads 268
1081 A Corpus-Based Discourse Analysis of the Disappearance of MH370 in Malaysia and United Kingdom Newspapers: A Pilot Study

Authors: Theng Theng Ong

Abstract:

This pilot study adopts a corpus-based discourse analysis to explore the construction of Malaysia airline tragedy MH370 in the selected Malaysian and United Kingdom (UK) newspapers. Fairclough’s three-dimensional model is adopted in the study to support the corpus-based analysis. The analysis aims to determine the ways in which Malaysian Airline tragedy MH370 is linguistically defined and constructed in terms of keywords and collocation. The study also seeks to identify the types of discourse that are presented in the news articles. In addition, the differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media are examined.

Keywords: corpus, CDA, newspapers, airline tragedies

Procedia PDF Downloads 271
1080 Men Act, Women Are Acted Upon: Morphosyntactic Framing of the Sexual Intercourse in Online Pornography Titles

Authors: Aleksandra Tomic

Abstract:

According to reliable sources, 4% of all websites is devoted to pornographic material, yet these estimates are often reported to be much higher. The largest internet pornography streaming website reports 21.2 billion visits in 2015 only. Considering the ubiquity of online pornography and the frequency of use, it is necessary to examine its potential influence on the construal of the sexual act and the roles of participants. Apart from the verbal and physical interactions in the pornographic movies themselves, the language in the titles of movies has the power to frame the sexual intercourse. In this study, Critical Discourse Analysis and corpus linguistics approaches will be used to examine the way the sexual intercourse and the roles of the participants are ideologically construed and perpetuated in the Internet pornography discourse. To this end, the study will explore the association between the specific morphosyntactic aspects of the references to performers of both genders, the person and the thematic role, and the gender of referred performer in the corpus of online pornographic movie titles. Distinctive collexeme analysis will be conducted to uncover possible associations between for gender of the performer denoted by the linguistic expression, and the person and thematic role assigned to it in the titles of online pornography movies. Initial results of the chi-square procedure performed on a sample of 295 online pornography movie titles on the largest pornography streaming website ‘Pornhub’ yielded significant results. The use of the three person categories was not equally distributed between genders, X2 (2, N = 106) = 32.52, p < 0.001, with female performers being referred to in the third person in 71.7% of the instances, and speaking in the first person 20.8% of the time, whereas male performers spoke in the first person 68% of the time, and were referred to in the third person in 17% of the instances. Moreover, there was a gender disparity in the assignment of thematic roles, with linguistic expressions for women being assigned the Patient role and men the Agent role in 58.8% of the cases, whereas the roles were reversed in 41.2% of the instances, X2 (1, N = 262) = 8.07633, p < 0.005. The results are discussed in terms of the ideologies surrounding female and male sexuality in the pornography discourse. Potential patterns of power imbalance, objectification, and discrimination are highlighted. Finally, the evidence from psycholinguistic studies on the influence of the language structure on event construal is related to the results of the study.

Keywords: corpus linguistics, gender studies, pornography, thematic roles

Procedia PDF Downloads 151
1079 Functions and Pragmatic Aspects of English Nonsense

Authors: Natalia V. Ursul

Abstract:

In linguistic studies, the question of nonsense is attracting increasing interest. Nonsense is usually defined as spoken or written words that have no meaning. However, this definition is likely to be outdated as any speech act is generated due to the speaker’s pragmatic reasons, thus it cannot be purely illogical or meaningless. In the current paper a new working definition of nonsense as a linguistic medium will be formulated; moreover, the pragmatic peculiarities of newly coined linguistic patterns and possible ways of their interpretation will be discussed.

Keywords: nonsense, nonse verse, pragmatics, speech act

Procedia PDF Downloads 488
1078 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 248
1077 Passive Voice in SLA: Armenian Learners’ Case Study

Authors: Emma Nemishalyan

Abstract:

It is believed that learners’ mother tongue (L1 hereafter) has a huge impact on their second language acquisition (L2 hereafter). This hypothesis has been exposed to both positive and negative criticism. Based on research results of a wide range of learners’ corpora (Chinese, Japanese, Spanish among others) the hypothesis has either been proved or disproved. However, no such study has been conducted on the Armenian learners. The aim of this paper is to understand the implication of the hypothesis on the Armenian learners’ corpus in terms of the use of the passive voice. To this end, the method of Contrastive Interlanguage Analysis (hereafter CIA) has been used on native speakers’ corpus (Louvain Corpus of Native English Essays (LOCNESS)) and Armenian learners’ corpus which has been compiled by me in compliance with International Corpus of Learner English (ICLE) guidelines. CIA compares the interlanguage (the language produced by learners) with the one produced by native speakers. With the help of this method, it is possible not only to highlight the mistakes that learners make, but also to underline the under or overuses. The choice of the grammar issue (passive voice) is conditioned by the fact that typologically Armenian and English are drastically different as they belong to different branches. Moreover, the passive voice is considered to be one of the most problematic grammar topics to be acquired by learners of the English language. Based on this difference, we hypothesized that Armenian learners would either overuse or underuse some types of the passive voice. With the help of Lancsbox software, we have identified the frequency rates of passive voice usage in LOCNESS and Armenian learners’ corpus to understand whether the latter have the same usage pattern of the passive voice as the native speakers. Secondly, we have identified the types of the passive voice used by the Armenian leaners trying to track down the reasons in their mother tongue. The results of the study showed that Armenian learners underused the passive voices in contrast to native speakers. Furthermore, the hypothesis that learners’ L1 has an impact on learners’ L2 acquisition and production was proved.

Keywords: corpus linguistics, applied linguistics, second language acquisition, corpus compilation

Procedia PDF Downloads 64