Search results for: corpus stylistics
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 368

Search results for: corpus stylistics

368 Corpus Stylistics and Multidimensional Analysis for English for Specific Purposes Teaching and Assessment

Authors: Svetlana Strinyuk, Viacheslav Lanin

Abstract:

Academic English has become lingua franca for international scientific community which stimulates universities to introduce English for Specific Purposes (EAP) courses into curriculum. Teaching L2 EAP students might be fulfilled with corpus technologies and digital stylistics. A special software developed to reach the manifold task of teaching, assessing and researching academic writing of L2 students on basis of digital stylistics and multidimensional analysis was created. A set of annotations (style markers) – grammar, lexical and syntactic features most significant of academic writing was built. Contrastive comparison of two corpora “model corpus”, subject domain limited papers published by competent writers in leading academic journals, and “students’ corpus”, subject domain limited papers written by last year students allows to receive data about the features of academic writing underused or overused by L2 EAP student. Both corpora are tagged with a special software created in GATE Developer. Style markers within the framework of research might be replaced depending on the relevance and validity of the result which is achieved from research corpora. Thus, selecting relevant (high frequency) style markers and excluding less relevant, i.e. less frequent annotations, high validity of the model is achieved. Software allows to compare the data received from processing model corpus to students’ corpus and get reports which can be used in teaching and assessment. The less deviation from the model corpus students demonstrates in their writing the higher is academic writing skill acquisition. The research showed that several style markers (hedging devices) were underused by L2 EAP students whereas lexical linking devices were used excessively. A special software implemented into teaching of EAP courses serves as a successful visual aid, makes assessment more valid; it is indicative of the degree of writing skill acquisition, and provides data for further research.

Keywords: corpus technologies in EAP teaching, multidimensional analysis, GATE Developer, corpus stylistics

Procedia PDF Downloads 148
367 A Stylistic Analysis of the Short Story ‘The Escape’ by Qaisra Shahraz

Authors: Huma Javed

Abstract:

Stylistics is a broad term that is concerned with both literature and linguistics, due to which the significance of the stylistics increases. This research aims to analyze Qaisra Shahraz's short story ‘The Escape’ from the stylistic analysis viewpoint. The focus of this study is on three aspects grammar category, lexical category, and figure of speech of the short story. The research designs for this article are both explorative and descriptive. The analysis of the data shows that the writer has used more nouns in the story as compared to other lexical items, which suggests that story has a descriptive style rather than narrative.

Keywords: The Escape, stylistics, grammatical category, lexical category, figure of speech

Procedia PDF Downloads 179
366 An Interdisciplinary Approach to Investigating Style: A Case Study of a Chinese Translation of Gilbert’s (2006) Eat Pray Love

Authors: Elaine Y. L. Ng

Abstract:

Elizabeth Gilbert’s (2006) biography Eat, Pray, Love describes her travels to Italy, India, and Indonesia after a painful divorce. The author’s experiences with love, loss, search for happiness, and meaning have resonated with a huge readership. As regards the translation of Gilbert’s (2006) Eat, Pray, Love into Chinese, it was first translated by a Taiwanese translator He Pei-Hua and published in Taiwan in 2007 by Make Boluo Wenhua Chubanshe with the fairly catching title “Enjoy! Traveling Alone.” The same translation was translocated to China, republished in simplified Chinese characters by Shanxi Shifan Daxue Chubanshe in 2008 and renamed in China, entitled “To Be a Girl for the Whole Life.” Later on, the same translation in simplified Chinese characters was reprinted by Hunan Wenyi Chubanshe in 2013. This study employs Munday’s (2002) systemic model for descriptive translation studies to investigate the translation of Gilbert’s (2006) Eat, Pray, Love into Chinese by the Taiwanese translator Hu Pei-Hua. It employs an interdisciplinary approach, combining systemic functional linguistics and corpus stylistics with sociohistorical research within a descriptive framework to study the translator’s discursive presence in the text. The research consists of three phases. The first phase is to locate the target text within its socio-cultural context. The target-text context concerning the para-texts, readers’ responses, and the publishers’ orientation will be explored. The second phase is to compare the source text and the target text for the categorization of translation shifts by using the methodological tools of systemic functional linguistics and corpus stylistics. The investigation concerns the rendering of mental clauses and speech and thought presentation. The final phase is an explanation of the causes of translation shifts. The linguistic findings are related to the extra-textual information collected in an effort to ascertain the motivations behind the translator’s choices. There exist sets of possible factors that may have contributed to shaping the textual features of the given translation within a specific socio-cultural context. The study finds that the translator generally reproduces the mental clauses and speech and thought presentation closely according to the original. Nevertheless, the language of the translation has been widely criticized to be unidiomatic and stiff, losing the elegance of the original. In addition, the several Chinese translations of the given text produced by one Taiwanese and two Chinese publishers are basically the same. They are repackaged slightly differently, mainly with the change of the book cover and its captions for each version. By relating the textual findings to the extra-textual data of the study, it is argued that the popularity of the Chinese translation of Gilbert’s (2006) Eat, Pray, Love may not be attributed to the quality of the translation. Instead, it may have to do with the way the work is promoted strategically by the social media manipulated by the four e-bookstores promoting and selling the book online in China.

Keywords: chinese translation of eat pray love, corpus stylistics, motivations for translation shifts, systemic approach to translation studies

Procedia PDF Downloads 136
365 Integrating Critical Stylistics and Visual Grammar: A Multimodal Stylistic Approach to the Analysis of Non-Literary Texts

Authors: Shatha Khuzaee

Abstract:

The study develops multimodal stylistic approach to analyse a number of BBC online news articles reporting some key events from the so called ‘Arab Uprisings’. Critical stylistics (CS) and visual grammar (VG) provide insightful arguments to the ways ideology is projected through different verbal and visual modes, yet they are mode specific because they examine how each mode projects its meaning separately and do not attempt to clarify what happens intersemiotically when the two modes co-occur. Therefore, it is the task undertaken in this research to propose multimodal stylistic approach that addresses the issue of ideology construction when the two modes co-occur. Informed by functional grammar and social semiotics, the analysis attempts to integrate three linguistic models developed in critical stylistics, namely, transitivity choices, prioritizing and hypothesizing along with their visual equivalents adopted from visual grammar to investigate the way ideology is constructed, in multimodal text, when text/image participate and interrelate in the process of meaning making on the textual level of analysis. The analysis provides comprehensive theoretical and analytical elaborations on the different points of integration between CS linguistic models and VG equivalents which operate on the textual level of analysis to better account for ideology construction in news as non-literary multimodal texts. It is argued that the analysis well thought out a plan that would remark the first step towards the integration between the well-established linguistic models of critical stylistics and that of visual analysis to analyse multimodal texts on the textual level. Both approaches are compatible to produce multimodal stylistic approach because they intend to analyse text and image depending on whatever textual evidence is available. This supports the analysis maintain the rigor and replicability needed for a stylistic analysis like the one undertaken in this study.

Keywords: multimodality, stylistics, visual grammar, social semiotics, functional grammar

Procedia PDF Downloads 188
364 A Preliminary Study for Building an Arabic Corpus of Pair Questions-Texts from the Web: Aqa-Webcorp

Authors: Wided Bakari, Patrce Bellot, Mahmoud Neji

Abstract:

With the development of electronic media and the heterogeneity of Arabic data on the Web, the idea of building a clean corpus for certain applications of natural language processing, including machine translation, information retrieval, question answer, become more and more pressing. In this manuscript, we seek to create and develop our own corpus of pair’s questions-texts. This constitution then will provide a better base for our experimentation step. Thus, we try to model this constitution by a method for Arabic insofar as it recovers texts from the web that could prove to be answers to our factual questions. To do this, we had to develop a java script that can extract from a given query a list of html pages. Then clean these pages to the extent of having a database of texts and a corpus of pair’s question-texts. In addition, we give preliminary results of our proposal method. Some investigations for the construction of Arabic corpus are also presented in this document.

Keywords: Arabic, web, corpus, search engine, URL, question, corpus building, script, Google, html, txt

Procedia PDF Downloads 287
363 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: ’Reddit’

Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell

Abstract:

Native language identification is one of the growing subfields in natural language processing (NLP). The task of native language identification (NLI) is mainly concerned with predicting the native language of an author’s writing in a second language. In this paper, we investigate the performance of two types of features; content-based features vs. content independent features, when they are evaluated on a different corpus (using social media data “Reddit”). In this NLI task, the predefined models are trained on one corpus (TOEFL), and then the trained models are evaluated on different data using an external corpus (Reddit). Three classifiers are used in this task; the baseline, linear SVM, and logistic regression. Results show that content-based features are more accurate and robust than content independent ones when tested within the corpus and across corpus.

Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML

Procedia PDF Downloads 92
362 Semantic Preference across Research Articles: A Corpus-Based Study of Adjectives in English

Authors: Valdênia Carvalho e Almeida

Abstract:

The goal of the present study is to investigate the semantic preference of the most frequent adjectives in research articles through a corpus-based analysis of texts published in journals in Applied Linguistics (AL). The corpus used in this study contains texts published in the period from 2014 to 2018 in the three journals: Language Learning and Technology; English for Academic Purposes, and TESOL Quaterly, totaling more than one million words. A corpus-based analysis was carried out on the corpus to identify the most frequent adjectives that co-occurred in the three journals. By observing the concordance lines of the adjectives and analyzing the words they associated with, the semantic preferences of each adjective were determined. Later, the AL corpus analysis was compared to the investigation of the same adjectives in a corpus of Chemistry. This second part of the study aimed to identify possible differences and similarities between the two corpora in relation to the use of the adjectives in research articles from both areas. The results show that there are some preferences which seem to be closely related not only to the academic genre of the texts but also to the specific domain of the discipline and, to a lesser extent, to the context of research in each journal. This research illustrates a possible contribution of Corpus Linguistics to explore the concept of semantic preference in more detail, considering the complex nature of the phenomenon.

Keywords: applied linguistics, corpus linguistics, chemistry, research article, semantic preference

Procedia PDF Downloads 144
361 Cognitive Stylistics and Horror Fiction: A Case Study of Stephen King’s Misery

Authors: Kriangkrai Vathanalaoha

Abstract:

Misery generates fear and anxiety in readers through its intense plot associated with the unpredictable emotional states of the nurse, Annie Wilkes. At the same time, she mentally and physically abuses the novelist victim, Paul Sheldon. The suspense is not only at the story level, where the violent expressions are used but also at the discourse level, where the linguistic structures may intentionally cause the reader to view language as disturbing performative. This performativity could be reflected through linguistic choices where the writer triggers a new imaginative world through experiential metafunction and schema disruption. This study explores striking excerpts from the fiction through mind style and transitivity analysis to demonstrate how the horrific experience contrasts when the protagonist and the antagonist converse extensively. The results reveal that stylistic deviation can be found at the syntactic levels, where the intensity of emotions can be apparent when the protagonist is verbally abused. In addition, transitivity can flesh out how the protagonist is expressed chiefly through the internalized process, whereas the antagonist is eminent with the externalized process. The findings suggest that the application of cognitive stylistics, such as mind style and transitivity analysis, could contribute to the mental representation of horrific reality.

Keywords: horror, mind style, misery, stylistics, transitivity

Procedia PDF Downloads 100
360 Specialized Translation Teaching Strategies: A Corpus-Based Approach

Authors: Yingying Ding

Abstract:

This study presents a methodology of specialized translation with the objective of helping teachers to improve the strategies in teaching translation. In order to allow students to acquire skills to translate specialized texts, they need to become familiar with the semantic and syntactic features of source texts and target texts. The aim of our study is to use a corpus-based approach in the teaching of specialized translation between Chinese and Italian. This study proposes to construct a specialized Chinese - Italian comparable corpus that consists of 50 economic contracts from the domain of food. With the help of AntConc, we propose to compile a comparable corpus in for translation teaching purposes. This paper attempts to provide insight into how teachers could benefit from comparable corpus in the teaching of specialized translation from Italian into Chinese and through some examples of passive sentences how students could learn to apply different strategies for translating appropriately the voice.

Keywords: contrastive studies, specialised translation, corpus-based approach, teaching

Procedia PDF Downloads 331
359 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development

Authors: L. Kamandulytė-Merfeldienė

Abstract:

The paper deals with the main issues of methodology of the Corpus of Spoken Lithuanian which was started to be developed in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, the transcription of the recorded data, and the grammatical annotation. Collecting the data was based on the principles of balance and naturality. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to the constant increase in studies on spontaneous communication, and various papers have dealt with a distribution of parts of speech, use of different grammatical forms, variation of inflectional paradigms, distribution of fillers, syntactic functions of adjectives, the mean length of utterances.

Keywords: CHILDES, corpus of spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian

Procedia PDF Downloads 197
358 Corporate Cautionary Statement: A Genre of Professional Communication

Authors: Chie Urawa

Abstract:

Cautionary statements or disclaimers in corporate annual reports need to be carefully designed because clear cautionary statements may protect a company in the case of legal disputes and may undermine positive impressions. This study compares the language of cautionary statements using two corpora, Sony’s cautionary statement corpus (S-corpus) and Panasonic’s cautionary statement corpus (P-corpus), illustrating the differences and similarities in relation to the use of meaningful cautionary statements and critically analyzing why practitioners use the way. The findings describe the distinct differences between the two companies in the presentation of the risk factors and the way how they make the statements. The word ability is used more for legal protection in S-corpus whereas the word possibility is used more to convey a better impression in P-corpus. The main similarities are identified in the use of lexical words and pronouns, and almost the same wordings for eight years. The findings show how they make the statements unique to the company in the presentation of risk factors, and the characteristics of specific genre of professional communication. Important implications of this study are that more comprehensive approach can be applied in other contexts, and be used by companies to reflect upon their cautionary statements.

Keywords: cautionary statements, corporate annual reports, corpus, risk factors

Procedia PDF Downloads 126
357 A Corpus-Based Study on the Styles of Three Translators

Authors: Wang Yunhong

Abstract:

The present paper is preoccupied with the different styles of three translators in their translating a Chinese classical novel Shuihu Zhuan. Based on a parallel corpus, it adopts a target-oriented approach to look into whether and what stylistic differences and shifts the three translations have revealed. The findings show that the three translators demonstrate different styles concerning their word choices and sentence preferences, which implies that identification of recurrent textual patterns may be a basic step for investigating the style of a translator.

Keywords: corpus, lexical choices, sentence characteristics, style

Procedia PDF Downloads 224
356 A Corpus-Assisted Discourse Analysis of Adjectival Collocation of the Word 'Education' in the American Context

Authors: Ngan Nguyen

Abstract:

The study analyses adjectives collocating with the word ‘education’ in the American language of the Corpus of Global Web-based English using a combination of corpus linguistic and discourse analytical methods to examine not only language patterns but also social political ideologies around the topic. Significant conclusions are deduced: (1) there are a large number of adjectival collocates of the word education which have been identified and classified into four categories representing four different aspects of education: level, quality, forms and types of education; (2) education, as in combination with three first categories, carries the meaning as the act and process of teaching and learning while with the last category having the meaning of a particular kind of teaching or training; (3) higher education is the topic that gains most concerns from the American public; (4) five most significant ideologies are discovered from the corpus: higher education associates with financial affairs, higher education is an industry, monetary policy of the government on higher education, people require greater accessibility to higher education and people value higher education. The study contributes to the field of developing meanings of words through corpus analysis and the field of discourse analysis.

Keywords: adjectival collocation, American context, corpus linguistics, discourse analysis, education

Procedia PDF Downloads 293
355 Saudi Twitter Corpus for Sentiment Analysis

Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari

Abstract:

Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have yet to directly apply SA to Arabic due to lack of a publicly available dataset for this language. This paper partially bridges this gap due to its focus on one of the Arabic dialects which is the Saudi dialect. This paper presents annotated data set of 4700 for Saudi dialect sentiment analysis with (K= 0.807). Our next work is to extend this corpus and creation a large-scale lexicon for Saudi dialect from the corpus.

Keywords: Arabic, sentiment analysis, Twitter, annotation

Procedia PDF Downloads 583
354 Corpus Linguistic Methods in a Theoretical Study of Quran Verb Tense and Aspect in Translations from Arabic to English

Authors: Jawharah Alasmari

Abstract:

In inflectional morphology of verb, tense and aspect indicate action’s time either past/present or future and their period whether completed or not. The usage and meaning of tense and aspect differ in Arabic and English, therefore is no simple one -to- one mapping from an Arabic verb inflected form an appropriate English translation depends on a range of features, including immediate and wider context of use. The Quranic Arabic Corpus includes seven alternative expertly crafted English translations of each Arabic verses, which provides a test dataset for the study of appropriate Arabic to English translations of verb tense and aspect. We applied Corpus Linguistics Methods in a theoretical study of exemplary verbs, to elicit candidate verbal contexts which influence the choice of English inflection for each verse.

Keywords: Corpus linguistics methods, Arabic verb, tense and aspect, English translations

Procedia PDF Downloads 348
353 Combining Corpus Linguistics and Critical Discourse Analysis to Study Power Relations in Hindi Newspapers

Authors: Vandana Mishra, Niladri Sekhar Dash, Jayshree Charkraborty

Abstract:

This present paper focuses on the application of corpus linguistics techniques for critical discourse analysis (CDA) of Hindi newspapers. While Corpus linguistics is the study of language as expressed in corpora (samples) of 'real world' text, CDA is an interdisciplinary approach to the study of discourse that views language as a form of social practice. CDA has mainly been studied from a qualitative perspective. However, we can say that recent studies have begun combining corpus linguistics with CDA in analyzing large volumes of text for the study of existing power relations in society. The corpus under our study is also of a sizable amount (1 million words of Hindi newspaper texts) and its analysis requires an alternative analytical procedure. So, we have combined both the quantitative approach i.e. the use of corpus techniques with CDA’s traditional qualitative analysis. In this context, we have focused on the Keyword Analysis Sorting Concordance Lines of the selected Keywords and calculating collocates of the keywords. We have made use of the Wordsmith Tool for all these analysis. The analysis starts with identifying the keywords in the political news corpus when compared with the main news corpus. The keywords are extracted from the corpus based on their keyness calculated through statistical tests like chi-squared test and log-likelihood test on the frequent words of the corpus. Some of the top occurring keywords are मोदी (Modi), भाजपा (BJP), कांग्रेस (Congress), सरकार (Government) and पार्टी (Political party). This is followed by the concordance analysis of these keywords which generates thousands of lines but we have to select few lines and examine them based on our objective. We have also calculated the collocates of the keywords based on their Mutual Information (MI) score. Both concordance and collocation help to identify lexical patterns in the political texts. Finally, all these quantitative results derived from the corpus techniques will be subjectively interpreted in accordance to the CDA’s theory to examine the ways in which political news discourse produces social and political inequality, power abuse or domination.

Keywords: critical discourse analysis, corpus linguistics, Hindi newspapers, power relations

Procedia PDF Downloads 176
352 Linguistic Cyberbullying, a Legislative Approach

Authors: Simona Maria Ignat

Abstract:

Bullying online has been an increasing studied topic during the last years. Different approaches, psychological, linguistic, or computational, have been applied. To our best knowledge, a definition and a set of characteristics of phenomenon agreed internationally as a common framework are still waiting for answers. Thus, the objectives of this paper are the identification of bullying utterances on Twitter and their algorithms. This research paper is focused on the identification of words or groups of words, categorized as “utterances”, with bullying effect, from Twitter platform, extracted on a set of legislative criteria. This set is the result of analysis followed by synthesis of law documents on bullying(online) from United States of America, European Union, and Ireland. The outcome is a linguistic corpus with approximatively 10,000 entries. The methods applied to the first objective have been the following. The discourse analysis has been applied in identification of keywords with bullying effect in texts from Google search engine, Images link. Transcription and anonymization have been applied on texts grouped in CL1 (Corpus linguistics 1). The keywords search method and the legislative criteria have been used for identifying bullying utterances from Twitter. The texts with at least 30 representations on Twitter have been grouped. They form the second corpus linguistics, Bullying utterances from Twitter (CL2). The entries have been identified by using the legislative criteria on the the BoW method principle. The BoW is a method of extracting words or group of words with same meaning in any context. The methods applied for reaching the second objective is the conversion of parts of speech to alphabetical and numerical symbols and writing the bullying utterances as algorithms. The converted form of parts of speech has been chosen on the criterion of relevance within bullying message. The inductive reasoning approach has been applied in sampling and identifying the algorithms. The results are groups with interchangeable elements. The outcomes convey two aspects of bullying: the form and the content or meaning. The form conveys the intentional intimidation against somebody, expressed at the level of texts by grammatical and lexical marks. This outcome has applicability in the forensic linguistics for establishing the intentionality of an action. Another outcome of form is a complex of graphemic variations essential in detecting harmful texts online. This research enriches the lexicon already known on the topic. The second aspect, the content, revealed the topics like threat, harassment, assault, or suicide. They are subcategories of a broader harmful content which is a constant concern for task forces and legislators at national and international levels. These topic – outcomes of the dataset are a valuable source of detection. The analysis of content revealed algorithms and lexicons which could be applied to other harmful contents. A third outcome of content are the conveyances of Stylistics, which is a rich source of discourse analysis of social media platforms. In conclusion, this corpus linguistics is structured on legislative criteria and could be used in various fields.

Keywords: corpus linguistics, cyberbullying, legislation, natural language processing, twitter

Procedia PDF Downloads 46
351 A Corpus-Based Discourse Analysis of the Disappearance of MH370 in Malaysia and United Kingdom Newspapers: A Pilot Study

Authors: Theng Theng Ong

Abstract:

This pilot study adopts a corpus-based discourse analysis to explore the construction of Malaysia airline tragedy MH370 in the selected Malaysian and United Kingdom (UK) newspapers. Fairclough’s three-dimensional model is adopted in the study to support the corpus-based analysis. The analysis aims to determine the ways in which Malaysian Airline tragedy MH370 is linguistically defined and constructed in terms of keywords and collocation. The study also seeks to identify the types of discourse that are presented in the news articles. In addition, the differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media are examined.

Keywords: corpus, CDA, newspapers, airline tragedies

Procedia PDF Downloads 255
350 Passive Voice in SLA: Armenian Learners’ Case Study

Authors: Emma Nemishalyan

Abstract:

It is believed that learners’ mother tongue (L1 hereafter) has a huge impact on their second language acquisition (L2 hereafter). This hypothesis has been exposed to both positive and negative criticism. Based on research results of a wide range of learners’ corpora (Chinese, Japanese, Spanish among others) the hypothesis has either been proved or disproved. However, no such study has been conducted on the Armenian learners. The aim of this paper is to understand the implication of the hypothesis on the Armenian learners’ corpus in terms of the use of the passive voice. To this end, the method of Contrastive Interlanguage Analysis (hereafter CIA) has been used on native speakers’ corpus (Louvain Corpus of Native English Essays (LOCNESS)) and Armenian learners’ corpus which has been compiled by me in compliance with International Corpus of Learner English (ICLE) guidelines. CIA compares the interlanguage (the language produced by learners) with the one produced by native speakers. With the help of this method, it is possible not only to highlight the mistakes that learners make, but also to underline the under or overuses. The choice of the grammar issue (passive voice) is conditioned by the fact that typologically Armenian and English are drastically different as they belong to different branches. Moreover, the passive voice is considered to be one of the most problematic grammar topics to be acquired by learners of the English language. Based on this difference, we hypothesized that Armenian learners would either overuse or underuse some types of the passive voice. With the help of Lancsbox software, we have identified the frequency rates of passive voice usage in LOCNESS and Armenian learners’ corpus to understand whether the latter have the same usage pattern of the passive voice as the native speakers. Secondly, we have identified the types of the passive voice used by the Armenian leaners trying to track down the reasons in their mother tongue. The results of the study showed that Armenian learners underused the passive voices in contrast to native speakers. Furthermore, the hypothesis that learners’ L1 has an impact on learners’ L2 acquisition and production was proved.

Keywords: corpus linguistics, applied linguistics, second language acquisition, corpus compilation

Procedia PDF Downloads 41
349 The Repetition of New Words and Information in Mandarin-Speaking Children: A Corpus-Based Study

Authors: Jian-Jun Gao

Abstract:

Repetition is used for a variety of functions in conversation. When young children first learn to speak, they often repeat words from the adult’s recent utterance with the learning and social function. The objective of this study was to ascertain whether the repetitions are equivalent in indicating attention to new words and the initial repeat of information in conversation. Based on the observation of naturally occurring language use in Taiwan Corpus of Child Mandarin (TCCM), the results in this study provided empirical support to the previous findings that children are more likely to repeat new words they are offered than to repeat new information. When children get older, there would be a drop in the repetition of both new words and new information.

Keywords: acquisition, corpus, mandarin, new words, new information, repetition

Procedia PDF Downloads 108
348 Corpus-Based Model of Key Concepts Selection for the Master English Language Course "Government Relations"

Authors: Elena Pozdnyakova

Abstract:

“Government Relations” is a field of knowledge presently taught at the majority of universities around the globe. English as the default language can become the language of teaching since the issues discussed are both global and national in character. However for this field of knowledge key concepts and their word representations in English don’t often coincide with those in other languages. International master’s degree students abroad as well as students, taught the course in English at their national universities, are exposed to difficulties, connected with correct conceptualizing of terminology of GR in British and American academic traditions. The study was carried out during the GR English language course elaboration (pilot research: 2013 -2015) at Moscow State Institute of Foreign Relations (University), Russian Federation. Within this period, English language instructors designed and elaborated the three-semester course of GR. Methodologically the course design was based on elaboration model with the special focus on conceptual elaboration sequence and theoretical elaboration sequence. The course designers faced difficulties in concept selection and theoretical elaboration sequence. To improve the results and eliminate the problems with concept selection, a new, corpus-based approach was worked out. The computer-based tool WordSmith 6.0 was used with the aim to build a model of key concept selection. The corpus of GR English texts consisted of 1 million words (the study corpus). The approach was based on measuring effect size, i.e. the percent difference of the frequency of a word in the study corpus when compared to that in the reference corpus. The results obtained proved significant improvement in the process of concept selection. The corpus-based model also facilitated theoretical elaboration of teaching materials.

Keywords: corpus-based study, English as the default language, key concepts, measuring effect size, model of key concept selection

Procedia PDF Downloads 265
347 OPEN-EmoRec-II-A Multimodal Corpus of Human-Computer Interaction

Authors: Stefanie Rukavina, Sascha Gruss, Steffen Walter, Holger Hoffmann, Harald C. Traue

Abstract:

OPEN-EmoRecII is an open multimodal corpus with experimentally induced emotions. In the first half of the experiment, emotions were induced with standardized picture material and in the second half during a human-computer interaction (HCI), realized with a wizard-of-oz design. The induced emotions are based on the dimensional theory of emotions (valence, arousal and dominance). These emotional sequences - recorded with multimodal data (mimic reactions, speech, audio and physiological reactions) during a naturalistic-like HCI-environment one can improve classification methods on a multimodal level. This database is the result of an HCI-experiment, for which 30 subjects in total agreed to a publication of their data including the video material for research purposes. The now available open corpus contains sensory signal of: video, audio, physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major) and mimic annotations.

Keywords: open multimodal emotion corpus, annotated labels, intelligent interaction

Procedia PDF Downloads 372
346 Compilation and Statistical Analysis of an Arabic-English Legal Corpus in Sketch Engine

Authors: C. Brierley, H. El-Farahaty, A. Farhan

Abstract:

The Leeds Parallel Corpus of Arabic-English Constitutions is a parallel corpus for the Arabic legal domain. Analysis of legal language via Corpus Linguistics techniques is an important development. In legal proceedings, a corpus-based approach to disambiguating meaning is set to replace the dictionary as an interpretative tool, and legal scholarship in the States is now attuned to the potential for Text Analytics over vast quantities of text-based legal material, following the business and medical industries. This trend is reflected in Europe: the interdisciplinary research group in Computer Assisted Legal Linguistics mines big data collections of legal and non-legal texts to analyse: legal interpretations; legal discourse; the comprehensibility of legal texts; conflict resolution; and linguistic human rights. This paper focuses on ‘dignity’ as an important aspect of the overarching concept of human rights in current constitutions across the Arab world. We have compiled a parallel, Arabic-English raw text corpus (169,861 Arabic words and 205,893 English words) from reputable websites such as the World Intellectual Property Organisation and CONSTITUTE, and uploaded and queried our corpus in Sketch Engine. Our most challenging task was sentence-level alignment of Arabic-English data. This entailed manual intervention to ensure correspondence on a one-to-many basis since Arabic sentences differ from English in length and punctuation. We have searched for morphological variants of ‘dignity’ (رامة ك, karāma) in the Arabic data and inspected their English translation equivalents. The term occurs most frequently in the Sudanese constitution (10 instances), and not at all in the constitution of Palestine. Its most frequent collocate, determined via the logDice statistic in Sketch Engine, is ‘human’ as in ‘human dignity’.

Keywords: Arabic constitution, corpus-based legal linguistics, human rights, parallel Arabic-English legal corpora

Procedia PDF Downloads 139
345 Collaborative Stylistic Group Project: A Drama Practical Analysis Application

Authors: Omnia F. Elkommos

Abstract:

In the course of teaching stylistics to undergraduate students of the Department of English Language and Literature, Faculty of Arts and Humanities, the linguistic tool kit of theories comes in handy and useful for the better understanding of the different literary genres: Poetry, drama, and short stories. In the present paper, a model of teaching of stylistics is compiled and suggested. It is a collaborative group project technique for use in the undergraduate diverse specialisms (Literature, Linguistics and Translation tracks) class. Students initially are introduced to the different linguistic tools and theories suitable for each literary genre. The second step is to apply these linguistic tools to texts. Students are required to watch videos performing the poems or play, for example, and search the net for interpretations of the texts by other authorities. They should be using a template (prepared by the researcher) that has guided questions leading students along in their analysis. Finally, a practical analysis would be written up using the practical analysis essay template (also prepared by the researcher). As per collaborative learning, all the steps include activities that are student-centered addressing differentiation and considering their three different specialisms. In the process of selecting the proper tools, the actual application and analysis discussion, students are given tasks that request their collaboration. They also work in small groups and the groups collaborate in seminars and group discussions. At the end of the course/module, students present their work also collaboratively and reflect and comment on their learning experience. The module/course uses a drama play that lends itself to the task: ‘The Bond’ by Amy Lowell and Robert Frost. The project results in an interpretation of its theme, characterization and plot. The linguistic tools are drawn from pragmatics, and discourse analysis among others.

Keywords: applied linguistic theories, collaborative learning, cooperative principle, discourse analysis, drama analysis, group project, online acting performance, pragmatics, speech act theory, stylistics, technology enhanced learning

Procedia PDF Downloads 117
344 Corpus-Assisted Study of Gender Related Tiger Metaphors in the Chinese Context

Authors: Na Xiao

Abstract:

Animal metaphors have many different connotations, ranging from loving emotions to derogatory epithets, but gender expressions using animal metaphors are often imbalanced. Generally, animal metaphors related to females tend to be negative. Little known about the reasons for the negative expressions of animal female metaphors in Chinese contexts still have not been quantified. The Modern Chinese Corpus at the Center for Chinese Linguistics at Peking University (CCL Corpus) provided the data for this research, which aims to identify the influencing variables of gender differences in the description of animal metaphors mapping humans in Chinese by observing the percentage of "tiger" metaphor, which is based on the conceptual metaphor theory. A quantitative research method was used in this study to statistically examine the gender attitude percentage of the "tiger" metaphor using corpus data. This study has proved that the tiger metaphors associated with humans in the Chinese context tend to be negative. Importantly, this study has also shown that the high proportion of tiger metaphorical idioms is what causes the high proportion of negative tiger metaphors that are related to women. This finding can be used as crucial information for future studies on other gender-related animal metaphorical idioms and can offer additional insights for understanding trends in other animal metaphors.

Keywords: Chinese, CCL corpus, gender differences, metaphorical idioms, tigers

Procedia PDF Downloads 63
343 Redundancy in Malay Morphology: School Grammar versus Corpus Grammar

Authors: Zaharani Ahmad, Nor Hashimah Jalaluddin

Abstract:

The aim of this paper is to examine and identify the issue of linguistic redundancy in two competing grammars of Malay, namely the school grammar and the corpus grammar. The former is a normative grammar which is formally and prescriptively taught in the classroom, whereas the latter is a descriptive grammar that is informally acquired and mastered by the students as native speakers of the language outside the classroom. Corpus grammar is depicted based on its actual used in natural occurring texts, as attested in the corpus. It is observed that the grammar taught in schools is incompatible with the grammar used in the corpus. For instance, a noun phrase containing nominal reduplicated form which denotes plurality (i.e. murid-murid ‘students’ which is derived from murid ‘student’) and a modifier categorized as quantifiers (i.e. semua ‘all’, seluruh ‘entire’, and kebanyakan ‘most’) is not acceptable in the school grammar because the formation (i.e. semua murid-murid ‘all the students’ kebanyakan pelajar-pelajar ‘most of the students’) is claimed to be redundant, and redundancy is prohibited in the grammar. Redundancy is generally construed as the property of speech and language by which more information is provided than is precisely required for the message to be understood, so that, if some information is omitted, the remaining information will still be sufficient for the message to be comprehended. Thus, the correct construction to be used is strictly the reduplicated form (i.e. murid-murid ‘students’) or the quantifier plus the root (i.e. semua murid ‘all the students’) with the intention that the grammatical meaning of plural is not repeated. Nevertheless, the so-called redundant form (i.e. kebanyakan pelajar-pelajar ‘most of the students’) is frequently used in the corpus grammar. This study shows that there are a number of redundant forms occur in the morphology of the language, particularly in affixation, reduplication and combination of both. Apparently, the so-called redundancy has grammatical and socio-cultural functions in communication that is to give emphasis and to stress the importance of the information delivered by the speakers or writers.

Keywords: corpus grammar, morphology, redundancy, school grammar

Procedia PDF Downloads 302
342 The Automatisation of Dictionary-Based Annotation in a Parallel Corpus of Old English

Authors: Ana Elvira Ojanguren Lopez, Javier Martin Arista

Abstract:

The aims of this paper are to present the automatisation procedure adopted in the implementation of a parallel corpus of Old English, as well as, to assess the progress of automatisation with respect to tagging, annotation, and lemmatisation. The corpus consists of an aligned parallel text with word-for-word comparison Old English-English that provides the Old English segment with inflectional form tagging (gloss, lemma, category, and inflection) and lemma annotation (spelling, meaning, inflectional class, paradigm, word-formation and secondary sources). This parallel corpus is intended to fill a gap in the field of Old English, in which no parallel and/or lemmatised corpora are available, while the average amount of corpus annotation is low. With this background, this presentation has two main parts. The first part, which focuses on tagging and annotation, selects the layouts and fields of lexical databases that are relevant for these tasks. Most information used for the annotation of the corpus can be retrieved from the lexical and morphological database Nerthus and the database of secondary sources Freya. These are the sources of linguistic and metalinguistic information that will be used for the annotation of the lemmas of the corpus, including morphological and semantic aspects as well as the references to the secondary sources that deal with the lemmas in question. Although substantially adapted and re-interpreted, the lemmatised part of these databases draws on the standard dictionaries of Old English, including The Student's Dictionary of Anglo-Saxon, An Anglo-Saxon Dictionary, and A Concise Anglo-Saxon Dictionary. The second part of this paper deals with lemmatisation. It presents the lemmatiser Norna, which has been implemented on Filemaker software. It is based on a concordance and an index to the Dictionary of Old English Corpus, which comprises around three thousand texts and three million words. In its present state, the lemmatiser Norna can assign lemma to around 80% of textual forms on an automatic basis, by searching the index and the concordance for prefixes, stems and inflectional endings. The conclusions of this presentation insist on the limits of the automatisation of dictionary-based annotation in a parallel corpus. While the tagging and annotation are largely automatic even at the present stage, the automatisation of alignment is pending for future research. Lemmatisation and morphological tagging are expected to be fully automatic in the near future, once the database of secondary sources Freya and the lemmatiser Norna have been completed.

Keywords: corpus linguistics, historical linguistics, old English, parallel corpus

Procedia PDF Downloads 162
341 Tagging a corpus of Media Interviews with Diplomats: Challenges and Solutions

Authors: Roberta Facchinetti, Sara Corrizzato, Silvia Cavalieri

Abstract:

Increasing interconnection between data digitalization and linguistic investigation has given rise to unprecedented potentialities and challenges for corpus linguists, who need to master IT tools for data analysis and text processing, as well as to develop techniques for efficient and reliable annotation in specific mark-up languages that encode documents in a format that is both human and machine-readable. In the present paper, the challenges emerging from the compilation of a linguistic corpus will be taken into consideration, focusing on the English language in particular. To do so, the case study of the InterDiplo corpus will be illustrated. The corpus, currently under development at the University of Verona (Italy), represents a novelty in terms both of the data included and of the tag set used for its annotation. The corpus covers media interviews and debates with diplomats and international operators conversing in English with journalists who do not share the same lingua-cultural background as their interviewees. To date, this appears to be the first tagged corpus of international institutional spoken discourse and will be an important database not only for linguists interested in corpus analysis but also for experts operating in international relations. In the present paper, special attention will be dedicated to the structural mark-up, parts of speech annotation, and tagging of discursive traits, that are the innovational parts of the project being the result of a thorough study to find the best solution to suit the analytical needs of the data. Several aspects will be addressed, with special attention to the tagging of the speakers’ identity, the communicative events, and anthropophagic. Prominence will be given to the annotation of question/answer exchanges to investigate the interlocutors’ choices and how such choices impact communication. Indeed, the automated identification of questions, in relation to the expected answers, is functional to understand how interviewers elicit information as well as how interviewees provide their answers to fulfill their respective communicative aims. A detailed description of the aforementioned elements will be given using the InterDiplo-Covid19 pilot corpus. The data yielded by our preliminary analysis of the data will highlight the viable solutions found in the construction of the corpus in terms of XML conversion, metadata definition, tagging system, and discursive-pragmatic annotation to be included via Oxygen.

Keywords: spoken corpus, diplomats’ interviews, tagging system, discursive-pragmatic annotation, english linguistics

Procedia PDF Downloads 144
340 Statistical Comparison of Machine and Manual Translation: A Corpus-Based Study of Gone with the Wind

Authors: Yanmeng Liu

Abstract:

This article analyzes and compares the linguistic differences between machine translation and manual translation, through a case study of the book Gone with the Wind. As an important carrier of human feeling and thinking, the literature translation poses a huge difficulty for machine translation, and it is supposed to expose distinct translation features apart from manual translation. In order to display linguistic features objectively, tentative uses of computerized and statistical evidence to the systematic investigation of large scale translation corpora by using quantitative methods have been deployed. This study compiles bilingual corpus with four versions of Chinese translations of the book Gone with the Wind, namely, Piao by Chunhai Fan, Piao by Huairen Huang, translations by Google Translation and Baidu Translation. After processing the corpus with the software of Stanford Segmenter, Stanford Postagger, and AntConc, etc., the study analyzes linguistic data and answers the following questions: 1. How does the machine translation differ from manual translation linguistically? 2. Why do these deviances happen? This paper combines translation study with the knowledge of corpus linguistics, and concretes divergent linguistic dimensions in translated text analysis, in order to present linguistic deviances in manual and machine translation. Consequently, this study provides a more accurate and more fine-grained understanding of machine translation products, and it also proposes several suggestions for machine translation development in the future.

Keywords: corpus-based analysis, linguistic deviances, machine translation, statistical evidence

Procedia PDF Downloads 103
339 Theater Metaphor in Event Quantification: A Corpus Study

Authors: Zhuo Jing-Schmidt, Jun Lang

Abstract:

Numeral classifiers are common in Asian languages. Research on numeral classifiers primarily focuses on noun classifiers that quantify and individuate nominal referents. There is a scarcity of research on event quantification using verb classifiers. This study aims to understand the semantic and conceptual basis of event quantification in Chinese. From a usage-based Construction Grammar perspective, this study presents a corpus analysis of event quantification in Chinese. Drawing on a large balanced corpus of contemporary Chinese, we analyze 667 NOUN col-lexemes totaling 31136 tokens of a productive numeral classifier construction in Chinese. Using collostructional analysis of the collexemes, the results show that the construction quantifies and classifies dramatic events using a theater-based conceptual metaphor. We argue that the usage patterns reflect the cultural entrenchment of theater as in Chinese conceptualization and the construal of theatricality in linguistic expression. The study has implications for cognitive semantics and construction grammar.

Keywords: event quantification, classifier, corpus, metaphor

Procedia PDF Downloads 33