Search results for: Corpus interlanguage analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26888

26888 Morphological Analysis of English L1-Persian L2 Adult Learners’ Interlanguage: From the Perspective of SLA Variation

Authors: Maassoumeh Bemani Naeini

Abstract:

Studies of interlanguage have long sought to describe variation in SLA. Pursuing the same goal, and addressing in particular the role of linguistic features, this study describes the use of Persian morphology in the interlanguage of two adult English-speaking learners of Persian as an L2. Combining contrastive analysis, error analysis, and interlanguage analysis, the study identifies and predicts possible instances of transfer from English L1 to Persian L2 across six elicitation tasks, in order to investigate whether contextual features variably influence the learners’ order of morpheme accuracy in the areas of copula, possessives, articles, demonstratives, plural forms, personal pronouns, and genitive cases. The results show task variation in the interlanguage system of these Persian L2 learners.

Keywords: English L1, Interlanguage Analysis, Persian L2, SLA variation

26887 Passive Voice in SLA: Armenian Learners’ Case Study

Authors: Emma Nemishalyan

Abstract:

It is widely believed that learners’ mother tongue (hereafter L1) has a strong impact on their acquisition of a second language (hereafter L2). This hypothesis has attracted both support and criticism: research on a wide range of learner corpora (Chinese, Japanese, and Spanish, among others) has alternately confirmed and refuted it. However, no such study has been conducted on Armenian learners. The aim of this paper is to test the hypothesis on a corpus of Armenian learners with respect to their use of the passive voice. To this end, Contrastive Interlanguage Analysis (hereafter CIA) was applied to a native speakers’ corpus, the Louvain Corpus of Native English Essays (LOCNESS), and to an Armenian learners’ corpus that I compiled in compliance with the International Corpus of Learner English (ICLE) guidelines. CIA compares interlanguage (the language produced by learners) with the language produced by native speakers. This method makes it possible not only to highlight learners’ mistakes but also to identify underuse and overuse. The passive voice was chosen because Armenian and English are typologically very different, belonging to different branches of the Indo-European family, and because the passive is considered one of the most problematic grammar topics for learners of English. Based on this difference, we hypothesized that Armenian learners would overuse or underuse some types of the passive voice. Using the LancsBox software, we first identified the frequency of passive voice usage in LOCNESS and in the Armenian learners’ corpus to determine whether the learners follow the same usage pattern as native speakers. Second, we identified the types of passive voice used by the Armenian learners and traced possible causes in their mother tongue.
The results showed that Armenian learners underused the passive voice compared with native speakers, and the hypothesis that learners’ L1 affects their L2 acquisition and production was confirmed.
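CIA frequency comparisons of this kind rest on normalizing raw counts by corpus size. The Python sketch below is not LancsBox's query mechanism: it assumes a crude surface heuristic (a form of BE, at most one intervening word, then a word ending in -ed/-en or one of a few irregular participles), so it will overcount (e.g. "was open") and miss other irregular participles. It only illustrates the count-then-normalize step.

```python
import re

# Assumed, simplified heuristics; a tagged corpus would allow a proper query.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
IRREGULAR_PARTICIPLES = {"made", "done", "seen", "taken", "given",
                         "known", "written", "built", "found", "held"}

def tokenize(text):
    """Lowercase word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())

def looks_like_participle(word):
    return word.endswith(("ed", "en")) or word in IRREGULAR_PARTICIPLES

def count_passives(tokens, max_gap=1):
    """Count BE + (up to max_gap words, e.g. an adverb) + participle.
    After a match, scanning resumes past the participle so that
    'is being reviewed' is counted once, not twice."""
    count, i = 0, 0
    while i < len(tokens):
        if tokens[i] in BE_FORMS:
            window = range(i + 1, min(i + 2 + max_gap, len(tokens)))
            hit = next((j for j in window if looks_like_participle(tokens[j])), None)
            if hit is not None:
                count += 1
                i = hit + 1
                continue
        i += 1
    return count

def per_10k(count, total_tokens):
    """Normalized frequency per 10,000 tokens."""
    return 10_000 * count / total_tokens if total_tokens else 0.0
```

Computing `per_10k(count_passives(tokens), len(tokens))` separately for the learner corpus and for LOCNESS yields the kind of overuse/underuse contrast CIA looks for; a significance test would normally follow.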

Keywords: corpus linguistics, applied linguistics, second language acquisition, corpus compilation

26886 Interlanguage Acquisition of a Postposition ‘e’ in Korean: Analysis of the Korean Novice Learners’ Output

Authors: Eunjung Lee

Abstract:

This study analyzes sentences produced by beginning learners of ‘e’, a Korean postposition, and seeks regularities in their interlanguage by examining the meanings and functions with which ‘e’ appears and the conditions under which it is used. The study rests on two assumptions: first, that learner language constitutes a specific type of interlanguage; and second, that this interlanguage shows regularities when learners produce ‘e’ under specific conditions. Learner output is therefore valuable data for understanding interlanguage. All sentences containing the postposition ‘e’ written by English-speaking learners were retrieved from the Learners’ Corpus Sharing Center of the National Institute of Korean Language in Korea, restricting the data to learners at Levels 1 and 2. In total, 789 sentences containing ‘e’ were selected for analysis. To characterize the environments in which ‘e’ is used, 13 meanings and functions of ‘e’ were first compiled from three Korean grammar dictionaries; then 1) the meaning and function of ‘e’ in each sentence was classified; 2) the nouns combined with ‘e’, the keywords of each sentence, and the characteristics of the modifiers, connectives, and predicates co-occurring with ‘e’ were analyzed; 3) regularities in the novice learners’ use of each meaning and function were reviewed; and 4) differences in these regularities between Level 1 and Level 2 learners were identified.
The results show that the novice learners 1) mainly placed nouns related to time, such as ‘time (시간)’, ‘before (전)’, ‘after (후)’, ‘next (다음)’, ‘the next (그다음)’, ‘then (때)’, ‘day of the week (요일)’, and ‘season (계절)’, before ‘e’ when using it to mark time; 2) mainly used the verbs ‘go (가다)’, ‘come (오다)’, and ‘go round (다니다)’ as predicates with ‘e’ marking direction or destination; 3) mainly placed nouns denoting locations or countries before ‘e’ marking place, followed mainly by the verbs ‘be (있다)’, ‘not be (없다)’, ‘live (살다)’, and ‘be many (많다)’; with ‘be (있다)’, ‘not be (없다)’, and ‘be many (많다)’ the subject was mainly marked with ‘i (이)’ or ‘ka (가)’, whereas with ‘live at’ it was mainly marked with ‘eun (은)’ or ‘nun (는)’; 4) used ‘e’ indicating cause or reason in the form ‘because (때문에)’; and 5) used ‘e’ with predicates such as ‘treat (대하다)’, ‘like (들다)’, and ‘catch (걸리다)’. These results show that the novice learners’ usage patterns of ‘e’ differ markedly by meaning and function, and that regularities in their interlanguage can be deduced. However, little difference in interlanguage regularity was found between Level 1 and Level 2 learners. The study contributes to understanding the interlanguage system and its regularities in the acquisition of the postposition ‘e’, and its findings can be used to reduce learners’ errors.
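Step 2 of the procedure (examining the nouns combined with ‘e’ and the predicates that accompany it) can be approximated by a simple tally once sentences are morpheme-segmented. The Python sketch below assumes input in which the particle 에 (‘e’) has already been split off as its own token, as a Korean morphological analyzer would do, and it takes the sentence-final morpheme as the predicate because Korean is verb-final. Both are simplifying assumptions, and the example sentences are invented.

```python
from collections import Counter

PARTICLE = "에"  # the postposition romanized as 'e'

def profile_e(sentences):
    """sentences: lists of morphemes with particles already split off.
    Returns (hosts, predicates): counts of the morpheme immediately
    preceding each 에, and of the sentence-final morpheme of every
    sentence containing at least one 에."""
    hosts, predicates = Counter(), Counter()
    for morphs in sentences:
        positions = [i for i, m in enumerate(morphs) if m == PARTICLE and i > 0]
        for i in positions:
            hosts[morphs[i - 1]] += 1
        if positions:
            # Korean is verb-final, so the last morpheme stands in for the predicate.
            predicates[morphs[-1]] += 1
    return hosts, predicates

sample = [
    ["주말", "에", "학교", "에", "가다"],
    ["집", "에", "있다"],
    ["학교", "에", "가다"],
    ["나", "는", "학생", "이다"],  # no 에: ignored
]
hosts, predicates = profile_e(sample)
```

Grouping the resulting host nouns by the meaning function assigned to each sentence (time, place, direction, and so on) would still be done by hand, as in the study.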

Keywords: interlanguage, interlanguage analysis, postposition ‘e’, Korean acquisition

26885 Interlanguage Pragmatics Instruction: Evidence from EFL Teachers

Authors: Asma Ben Abdallah

Abstract:

Interlanguage pragmatics (ILP) instruction has shed considerable light on foreign language teaching and has earned a deserved place in SLA research. In the Tunisian context, however, ILP instruction remains underexplored by academics and educational practitioners. In our experience as teachers at both secondary school and university levels, the instruction and assessment of pragmatics appear to be contentious. This paper first introduces the theoretical models of ILP instruction and focuses on their implications for foreign language teaching. The study builds on Ben Abdallah (2015), which investigated the effects of pragmatic instruction on Tunisian EFL learners from the perspective of the students and their learning strategies. The present study, by contrast, draws its data from Tunisian EFL teachers, investigating their pragmatic teaching practices and their perceptions of pragmatic instruction. The findings indicate that EFL teachers have pragmatic awareness; yet their reflections revealed that this awareness concerned mostly theoretical pragmatic knowledge and was not explicitly carried over into practical pragmatic applications. The paper concludes by advocating pragmatics instruction and suggesting that EFL teachers teach pragmatics explicitly in class.

Keywords: interlanguage pragmatics theory, pragmatics, pragmatic instruction, SLA

26884 Error Analysis of English Inflection among Thai University Students

Authors: Suwaree Yordchim, Toby J. Gibbs

Abstract:

The linguistic competence of Thai university students majoring in Business English was examined with respect to their knowledge of English inflection and of various other linguistic elements. Error analysis was applied to the test results. Error levels in inflection, tense, and other linguistic elements were significantly high for all noun, verb, and adjective inflections. The findings suggest that students do not attain linguistic competence in English inflection because of interlanguage interference. Implications for curriculum reform and the treatment of errors in the classroom are discussed.

Keywords: interlanguage, error analysis, inflection, second language acquisition, Thai students

26883 A Longitudinal Case Study of Greek as a Second Language

Authors: M. Vassou, A. Karasimos

Abstract:

A primary concern in Second Language Acquisition (SLA) research is to determine the innate mechanisms of second language learning and acquisition through the systematic study of a learner’s interlanguage. Errors emerge as a learner attempts to communicate in the target language and can be seen either as the observable linguistic product of latent cognitive and linguistic processes of mental representation or as an indispensable learning mechanism. The study of a learner’s erroneous forms may therefore reveal the strategies and mechanisms at work during acquisition that result in deviations from target-language norms and in communication difficulties. One of the main aims of this study is to map the erroneous utterances of a late adult learner acquiring Greek as a second language. For this purpose, we created an error-tagged learner corpus composed of the participant’s written texts produced over four years of instructed language acquisition. Error analysis and interlanguage theory constitute the methodological and theoretical framework, respectively. The research questions concern the learner’s most frequent errors per linguistic category and per year, as well as his choices in the Greek article system. According to the quantitative analysis, the most frequent errors occur in the stress system and in syntax, while significant fluctuation and/or gradual reduction across the four years of instruction indicate the emergence of developmental stages. The findings on article usage point to fossilization of erroneous structures in certain contexts.
In general, our results point to the existence and further development of an established learner (inter)language system governed not only by mother-tongue and target-language influences but also by the learner’s own assumptions and rules, the result of a complex cognitive process. This study is expected to contribute not only to knowledge of Greek as a second language and of SLA in general but also to insight into the cognitive mechanisms and strategies developed by multilingual learners in late adulthood.
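Counting errors per linguistic category and per year in an error-tagged corpus is essentially a grouping exercise. The Python sketch below assumes the tags have already been extracted as (year, category) pairs; the category labels echo the abstract, but the tag scheme and all numbers are hypothetical. Because the amount of text written each year may differ, a per-1,000-word rate is safer than raw counts for spotting developmental trends.

```python
from collections import Counter, defaultdict

def errors_by_year(tagged_errors):
    """tagged_errors: iterable of (year, category) pairs taken from an
    error-tagged learner corpus. Returns {year: Counter(category -> count)}."""
    table = defaultdict(Counter)
    for year, category in tagged_errors:
        table[year][category] += 1
    return table

def top_category_per_year(table):
    """Most frequent error category in each year, in chronological order."""
    return {year: counts.most_common(1)[0][0] for year, counts in sorted(table.items())}

def trend(table, category):
    """Raw counts of one category across years (0 where absent)."""
    return [table[year][category] for year in sorted(table)]

def rate_per_1000(table, words_per_year, category):
    """Errors of one category per 1,000 words written in each year."""
    return [1000 * table[y][category] / words_per_year[y] for y in sorted(table)]
```

A falling rate for a category suggests development, whereas a flat, non-zero rate in a given context (e.g. article use) is the pattern one would inspect for possible fossilization.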

Keywords: Greek as a second language, error analysis, interlanguage, late adult learner

26882 Error Analysis: Examining Written Errors of English as a Second Language (ESL) Spanish Speaking Learners

Authors: Maria Torres

Abstract:

Following contrastive analysis, Pit Corder’s establishment of error analysis revolutionized the way instructors analyze and examine students’ writing errors. For speakers of a first language, in this case Spanish, who are learning a second language (English), a key question in error analysis concerns the types of errors these learners make and the causes of those errors. Many studies have examined how the native tongue influences second language acquisition, but this approach does not account for other possible sources of students’ errors. This paper examines writing samples from an advanced ESL class of Spanish-speaking students at the non-profit organization Learning Quest Stanislaus Literacy Center. Through error analysis, errors in the students’ writing were identified, described, and classified. The purpose was to determine the types and origins of the errors in order to devise appropriate treatments. The results show that the most frequent errors in the advanced ESL students’ writing are interlanguage errors, that a small percentage stem from intralingual sources, and that the fewest originate from negative transfer. These results reinforce the view that other errors and sources of error must be accounted for, rather than focusing solely on the differences between the students’ mother tongue and the target language. This presentation will outline some strategies and techniques that address the issues found in this research. Given the proportion of errors pertaining to interlanguage, ESL teachers should raise students’ metalinguistic awareness of their errors.

Keywords: error analysis, ESL, interlanguage, intralanguage

26881 Semantic Preference across Research Articles: A Corpus-Based Study of Adjectives in English

Authors: Valdênia Carvalho e Almeida

Abstract:

The goal of the present study is to investigate the semantic preference of the most frequent adjectives in research articles through a corpus-based analysis of texts published in Applied Linguistics (AL) journals. The corpus contains texts published between 2014 and 2018 in three journals, Language Learning & Technology, Journal of English for Academic Purposes, and TESOL Quarterly, totaling more than one million words. A corpus-based analysis was carried out to identify the most frequent adjectives common to all three journals. By examining the concordance lines of these adjectives and analyzing the words they co-occurred with, the semantic preferences of each adjective were determined. The results for the AL corpus were then compared with those for the same adjectives in a corpus of Chemistry research articles. This second part of the study aimed to identify differences and similarities between the two corpora in the use of the adjectives in research articles from both fields. The results show that some preferences seem closely related not only to the academic genre of the texts but also to the specific disciplinary domain and, to a lesser extent, to the research context of each journal. This research illustrates how Corpus Linguistics can contribute to exploring the concept of semantic preference in more detail, given the complex nature of the phenomenon.
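Semantic preference is usually read off concordance lines. The Python sketch below builds simple key-word-in-context (KWIC) lines and tallies the word immediately to the right of an adjective (typically the noun it modifies); assigning those collocates to semantic sets remains a manual, interpretive step. The tokenizer is a deliberately simple regular expression, an assumption rather than the behavior of any particular concordancer.

```python
import re
from collections import Counter

def kwic(text, node, width=4):
    """(left context, node, right context) for every occurrence of `node`,
    with up to `width` tokens on each side."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def right_collocates(lines, n=1):
    """Tally the n-th word to the right of the node across concordance lines."""
    return Counter(r.split()[n - 1] for _, _, r in lines if len(r.split()) >= n)

sample = ("A significant difference was found. This significant effect "
          "and a significant difference appear.")
lines = kwic(sample, "significant", width=2)
```

Here `right_collocates(lines)` gives {"difference": 2, "effect": 1}; an analyst would then decide whether such nouns form a semantic set (e.g. results or findings) and compare that set across the AL and Chemistry corpora.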

Keywords: applied linguistics, corpus linguistics, chemistry, research article, semantic preference

26880 Saudi Twitter Corpus for Sentiment Analysis

Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari

Abstract:

Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have applied SA directly to Arabic, owing to the lack of a publicly available dataset for the language. This paper partially bridges this gap by focusing on one Arabic dialect, the Saudi dialect. It presents an annotated dataset of 4,700 tweets for Saudi dialect sentiment analysis (K = 0.807). Future work will extend this corpus and create a large-scale lexicon for the Saudi dialect from it.
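The abstract reports K = 0.807 without naming the coefficient; in annotation work a figure like this is usually an inter-annotator agreement statistic. Assuming two annotators and Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label distribution, the computation can be sketched in Python as follows (the labels are illustrative, not taken from the dataset):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

With more than two annotators, a multi-rater coefficient such as Fleiss' kappa would be needed instead.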

Keywords: Arabic, sentiment analysis, Twitter, annotation

26879 A Corpus-Based Analysis of Japanese Learners' English Modal Auxiliary Verb Usage in Writing

Authors: S. Nakayama

Abstract:

For non-native English speakers, using English modal auxiliary verbs appropriately can be among the most challenging tasks. This research sought to identify differences in modal verb usage between Japanese non-native English speakers (JNNSs) and native speakers (NSs) from two perspectives: frequency of use and the distribution of the verb phrase structures (VPSs) in which modal verbs occur. The study aims to characterize JNNSs’ interlanguage with regard to modal verbs, with the main goal of suggesting improvements to teaching materials and helping language teachers teach modal verbs in ways that benefit learners. To address the primary research question, the use of nine central modals (‘can’, ‘could’, ‘may’, ‘might’, ‘shall’, ‘should’, ‘will’, ‘would’, and ‘must’) by JNNSs was compared with that by NSs in the International Corpus Network of Asian Learners of English (ICNALE). This corpus is one of the largest freely available corpora of Asian English learners’ language use. ICNALE consists of four modules: ‘Spoken Monologue’, ‘Spoken Dialogue’, ‘Written Essays’, and ‘Edited Essays’. This research used only the ‘Written Essays’ module, a set of 200-300-word essays totaling approximately 1.3 million words. Frequency analysis revealed differences as well as similarities in frequency order. Specifically, both JNNSs and NSs used ‘can’ most frequently, followed by ‘should’ and ‘will’; however, the positions of all the other modals except ‘shall’ differed between the two groups. A log-likelihood test revealed JNNSs’ overuse of ‘can’ and ‘must’ and their underuse of ‘will’ and ‘would’. VPS analysis revealed that JNNSs used modal verbs in a relatively narrow range of VPSs compared with NSs.
Results showed that JNNSs used most modals only with bare infinitives or in the passive voice, whereas NSs used them in a wide range of VPSs, including the progressive construction and the perfect aspect, both structures in which JNNSs rarely used modals. The frequency results suggest that teachers and teaching materials should introduce other modality expressions so that learners avoid relying heavily on certain modals and have a wider range of lexical items with which to express their feelings more accurately. In addition, the underused modals deserve more emphasis in the classroom because they function as epistemic modals, which allow writers not only to interject their views into propositions but also to build a relationship with readers. As for VPSs, teaching materials should present more examples of modals in a wide range of VPSs to help learners express their opinions from a variety of viewpoints.
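The log-likelihood test used to flag overuse and underuse compares a word's frequency in two corpora with the frequency expected if both corpora shared the same rate. A minimal Python version of the widely used two-corpus formulation is sketched below. The 3.84 cut-off corresponds to p < 0.05 with one degree of freedom, and the counts in the test are invented, not ICNALE figures.

```python
import math

def log_likelihood(freq_1, size_1, freq_2, size_2):
    """Log-likelihood (G2) for one word observed freq_1 times in a corpus
    of size_1 tokens and freq_2 times in a corpus of size_2 tokens."""
    total = freq_1 + freq_2
    exp_1 = size_1 * total / (size_1 + size_2)  # expected count in corpus 1
    exp_2 = size_2 * total / (size_1 + size_2)  # expected count in corpus 2
    ll = 0.0
    if freq_1:  # a zero observed count contributes nothing
        ll += freq_1 * math.log(freq_1 / exp_1)
    if freq_2:
        ll += freq_2 * math.log(freq_2 / exp_2)
    return 2 * ll

def compare(freq_learner, size_learner, freq_native, size_native, critical=3.84):
    """Label a learner-vs-native difference as overuse, underuse, or neither."""
    ll = log_likelihood(freq_learner, size_learner, freq_native, size_native)
    if ll < critical:
        return "no significant difference", ll
    learner_rate = freq_learner / size_learner
    native_rate = freq_native / size_native
    return ("overuse" if learner_rate > native_rate else "underuse"), ll
```

Running `compare` once per modal (e.g. the count of ‘can’ in the JNNS essays against its count in the NS essays, with each subcorpus's token total) produces the overuse/underuse labels reported above.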

Keywords: corpus linguistics, Japanese learners of English, modal auxiliary verbs, International Corpus Network of Asian Learners of English

26878 Query in Grammatical Forms and Corpus Error Analysis

Authors: Katerina Florou

Abstract:

Two decades after the term "learner corpora" was coined to denote collections of texts produced by foreign or second language learners in various language contexts, and some years after it was suggested that "focus on form" be incorporated within a Task-Based Learning framework, this study explores how learner corpora, whether error-annotated or not, can support a focus on form in an educational setting. It has been argued that analyzing linguistic form enables students to delve into the language and understand different facets of the foreign language. The same objective applies when analyzing error-annotated or raw learner corpora, although in this case the emphasis lies on identifying incorrect forms. Teachers should aim to address errors or gaps in students’ second language knowledge while the students are engaged in a task. Building on this recommendation, we compared the written output of two student groups: the first group (G1) carried out the focus-on-form phase by studying a specific feature of Italian, the past participle, through native-speaker examples and grammar rules; the second group (G2) focused on form by examining their own errors and comparing them with analogous examples from a native-speaker corpus. To test our hypothesis, we created four learner corpora. The first two were produced during the task phase, one for each group, while the other two were produced as a follow-up activity at the end of the lesson. The results of the first comparison indicated that exposing students to their own errors can enhance their grasp of a grammatical feature. The study is now in its second stage, and further results will be reported.

Keywords: Corpus interlanguage analysis, task based learning, Italian language as FL, learner corpora

26877 A Corpus-Assisted Discourse Analysis of Adjectival Collocation of the Word 'Education' in the American Context

Authors: Ngan Nguyen

Abstract:

The study analyzes adjectives collocating with the word ‘education’ in the American section of the Corpus of Global Web-based English (GloWbE), combining corpus linguistic and discourse analytical methods to examine not only language patterns but also sociopolitical ideologies surrounding the topic. The main findings are as follows: (1) a large number of adjectival collocates of education were identified and classified into four categories representing four aspects of education: level, quality, form, and type; (2) in combination with the first three categories, education denotes the act and process of teaching and learning, whereas with the last category it denotes a particular kind of teaching or training; (3) higher education is the topic that attracts the most concern from the American public; (4) five prominent ideologies emerge from the corpus: higher education is associated with financial affairs; higher education is an industry; the government’s monetary policy on higher education; people demand greater access to higher education; and people value higher education. The study contributes to the study of word meaning through corpus analysis and to the field of discourse analysis.

Keywords: adjectival collocation, American context, corpus linguistics, discourse analysis, education

26876 A Corpus-Based Discourse Analysis of the Disappearance of MH370 in Malaysia and United Kingdom Newspapers: A Pilot Study

Authors: Theng Theng Ong

Abstract:

This pilot study adopts a corpus-based discourse analysis to explore how the Malaysia Airlines MH370 tragedy was constructed in selected Malaysian and United Kingdom (UK) newspapers. Fairclough’s three-dimensional model is adopted to support the corpus-based analysis. The analysis aims to determine how the MH370 tragedy is linguistically defined and constructed in terms of keywords and collocations. The study also seeks to identify the types of discourse presented in the news articles. In addition, differences and similarities in the keywords, topics, and issues covered by the selected Malaysian and UK news media are examined.

Keywords: corpus, CDA, newspapers, airline tragedies

26875 Number Variation of the Personal Pronoun we Used by Chinese English Learners

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage in a language community, which may become the developmental trend of that language. However, language textbooks fail to keep up with these emergent usages. Most Chinese English learners today are still exposed mainly to the traditional grammar prescribed in textbooks, so some variant usages are not acquired. The personal pronoun we is prescribed as a plural pronoun in textbook grammar, but its number value is more flexible in actual use. Based on the Chinese Learner English Corpus (CLEC), with a self-built Friends corpus as the reference corpus, the present research explores the number value of the first-person pronoun we as used by Chinese English learners. Given the subjectivity of we, the number value of every instance of we in “we + PCU (perception-cognition-utterance) verb” collocations was annotated. Results show that, despite exposure to traditional textbooks prescribing plural reference for we, Chinese English learners’ writing still contains some unconventional usage (singular or vague in reference), though less frequently than native speech. Corpus data and the results of manual semantic annotation suggest that this may be due to the influence of formulaic sequences on the learners and to positive transfer from their native language. An improved SLA model relating native language, target language, and interlanguage is proposed to recognize the existence of variation in second language acquisition, which deserves more attention in teaching.

Keywords: Chinese English learners, number, PCU verbs, Personal pronoun we

26874 Combining Corpus Linguistics and Critical Discourse Analysis to Study Power Relations in Hindi Newspapers

Authors: Vandana Mishra, Niladri Sekhar Dash, Jayshree Charkraborty

Abstract:

This paper focuses on applying corpus linguistics techniques to the critical discourse analysis (CDA) of Hindi newspapers. Whereas corpus linguistics studies language as expressed in corpora (samples) of 'real world' text, CDA is an interdisciplinary approach to discourse that views language as a form of social practice. CDA has mainly been pursued from a qualitative perspective; however, recent studies have begun combining corpus linguistics with CDA to analyze large volumes of text and study existing power relations in society. Our corpus is likewise sizable (1 million words of Hindi newspaper text), and its analysis requires an alternative procedure. We therefore combine a quantitative approach, i.e., corpus techniques, with CDA’s traditional qualitative analysis. Specifically, we focus on keyword analysis, sorting the concordance lines of selected keywords, and calculating the collocates of those keywords. WordSmith Tools was used for all of these analyses. The analysis begins by identifying keywords in the political news corpus relative to the main news corpus. Keywords are extracted according to their keyness, calculated with statistical tests such as the chi-squared and log-likelihood tests on the frequent words of the corpus. Some of the top keywords are मोदी (Modi), भाजपा (BJP), कांग्रेस (Congress), सरकार (Government), and पार्टी (Political party). Concordance analysis of these keywords follows; it generates thousands of lines, from which a few are selected and examined according to our objectives. We also calculated the collocates of the keywords based on their Mutual Information (MI) scores. Both concordance and collocation analysis help identify lexical patterns in the political texts.
Finally, all these quantitative results will be interpreted in accordance with CDA theory to examine how political news discourse produces social and political inequality, abuse of power, or domination.
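Collocates ranked by Mutual Information compare how often a word occurs near the node with how often it would be expected to by chance. The Python sketch below uses the basic formula MI = log2(O * N / (f_node * f_collocate)) over a window of ±span tokens. Concordancers differ in whether and how they adjust for window size, so scores from this sketch will not necessarily match WordSmith Tools output. The Hindi tokens in the test are illustrative only.

```python
import math
from collections import Counter

def mi_collocates(tokens, node, span=4, min_cooc=2):
    """Mutual Information for words within +/- `span` tokens of `node`.
    O is the co-occurrence count, N the corpus size in tokens;
    no window-size adjustment is applied."""
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            cooc.update(window)
    scores = {}
    for word, observed in cooc.items():
        # Rare pairs inflate MI, so a minimum co-occurrence threshold is applied.
        if word != node and observed >= min_cooc:
            scores[word] = math.log2(observed * n / (freq[node] * freq[word]))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Because MI favors rare pairings, it is commonly reported alongside a frequency threshold or a second measure; the `min_cooc` parameter here is a hypothetical stand-in for such a threshold.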

Keywords: critical discourse analysis, corpus linguistics, Hindi newspapers, power relations

26873 A Preliminary Study for Building an Arabic Corpus of Pair Questions-Texts from the Web: Aqa-Webcorp

Authors: Wided Bakari, Patrice Bellot, Mahmoud Neji

Abstract:

With the development of electronic media and the heterogeneity of Arabic data on the Web, the need for a clean corpus for natural language processing applications such as machine translation, information retrieval, and question answering has become increasingly pressing. In this manuscript, we seek to create our own corpus of question-text pairs, which will then provide a better basis for our experiments. To this end, we model the construction of this corpus with a method for Arabic that retrieves web texts likely to contain answers to our factual questions. We developed a Java script that retrieves a list of HTML pages for a given query; these pages are then cleaned to produce a database of texts and a corpus of question-text pairs. We also present preliminary results of the proposed method, together with a review of existing work on building Arabic corpora.
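The page-cleaning step (turning retrieved HTML into plain text before pairing it with a question) can be sketched with Python's standard-library `html.parser`. This is not the authors' Java script: it assumes the pages have already been downloaded, omits the search-engine query step, and simply keeps visible text while skipping `<script>` and `<style>` content. `make_pair` is a hypothetical helper.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities become characters
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    """Plain text of an HTML page, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

def make_pair(question, html_pages):
    """Pair a factual question with the cleaned text of each retrieved page."""
    return [(question, html_to_text(page)) for page in html_pages]
```

A real pipeline would also need boilerplate removal (menus, footers) and possibly Arabic-specific normalization before the texts are usable as candidate answer passages.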

Keywords: Arabic, web, corpus, search engine, URL, question, corpus building, script, Google, html, txt

26872 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: ‘Reddit’

Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell

Abstract:

Native language identification (NLI) is a growing subfield of natural language processing (NLP). The NLI task is mainly concerned with predicting an author’s native language from their writing in a second language. In this paper, we investigate the performance of two types of features, content-based and content-independent, when they are evaluated on a different corpus (social media data from Reddit). The models are trained on one corpus (TOEFL) and then evaluated on an external corpus (Reddit). Three classifiers are used: a baseline, a linear SVM, and logistic regression. Results show that content-based features are more accurate and robust than content-independent ones in both within-corpus and cross-corpus testing.
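The cross-corpus protocol (train on one corpus, test on another) can be illustrated with content-independent features. The Python sketch below makes two substitutions that are assumptions, not the paper's setup: the features are relative frequencies of a short, hand-picked function-word list, and the classifier is a nearest-centroid rule rather than a linear SVM or logistic regression. The language labels and texts in the test are toy data.

```python
import math
import re
from collections import Counter

# A tiny, illustrative function-word list (content-independent features).
FUNCTION_WORDS = ["the", "a", "an", "of", "in", "on", "to", "and", "but", "is",
                  "are", "was", "that", "this", "it", "for", "with", "as", "at", "by"]

def features(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def train_centroids(docs):
    """docs: (text, native_language) pairs from the training corpus.
    Returns the mean feature vector for each native language."""
    sums, sizes = {}, Counter()
    for text, label in docs:
        vec = features(text)
        prev = sums.get(label, [0.0] * len(vec))
        sums[label] = [s + v for s, v in zip(prev, vec)]
        sizes[label] += 1
    return {label: [s / sizes[label] for s in vec] for label, vec in sums.items()}

def predict(centroids, text):
    """Assign the language whose centroid is closest (Euclidean distance)."""
    vec = features(text)
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))

def accuracy(centroids, docs):
    return sum(predict(centroids, t) == y for t, y in docs) / len(docs)
```

In the paper's setting, `train_centroids` would be fitted on TOEFL essays and `accuracy` measured on Reddit posts; content-based features (e.g. word n-grams) would replace `features`.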

Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML

26871 Language Transfer in Graduate Candidates’ Essays

Authors: Erika Martínez Lugo

Abstract:

Candidates for some graduate programs are asked to write essays in English to demonstrate their competence both in essay writing and in English. In the present study, language transfer (LT) in 15 written essays is identified, documented, analyzed, and classified. The essays were written in 2019 for a Master’s program in Modern Languages in a city in northwestern Mexico on the border with the USA. The study is of interest because it is important to determine whether some errors have fossilized and become mistakes or whether they are part of the candidates’ interlanguage. The results show that most language transfer is negative and syntactic, with the influence of the candidates’ L1 (Spanish) evident in their use of L2 (English).

Keywords: language transfer, cross-linguistic influence, interlanguage, error vs mistake

26870 Interlingual Interference in Students’ Writing

Authors: Zakaria Khatraoui

Abstract:

Interlanguage has steadily consolidated its central role across a broad field of study. Whether approached academically or pedagogically, it is more prominent than ever before. Academically, it probes theoretical and linguistic issues in the field; in practice, it moves flexibly from idea to reality, bridging theory and classroom practice. The present research offers a well-developed theoretical framework supported by empirical teaching practices, while also teasing apart its narrowly confined implementation. The focus of this interlingual study is on the syntactic errors that appear in students’ writing as performance. To this end, the paper adopts a qualitative approach that draws on several focal methodological choices within a solid design. The central finding examined is the creative nature of syntactic errors, evidenced by the clear dominance of cognitively intralingual errors over linguistically interlingual ones. The paper then highlights transferable implications for both theory and professional pedagogical practice. In particular, the results are relevant to the scholarly community in a multidimensional sense and recommend actions of educational value.

Keywords: interlanguage, interference, error, writing

Procedia PDF Downloads 29
26869 Specialized Translation Teaching Strategies: A Corpus-Based Approach

Authors: Yingying Ding

Abstract:

This study presents a methodology for teaching specialized translation, with the objective of helping teachers improve their translation-teaching strategies. For students to acquire the skills needed to translate specialized texts, they need to become familiar with the semantic and syntactic features of source and target texts. The aim of our study is to use a corpus-based approach in teaching specialized translation between Chinese and Italian. This study proposes to construct a specialized Chinese-Italian comparable corpus consisting of 50 economic contracts from the food sector, compiled with the help of AntConc for translation-teaching purposes. This paper attempts to show how teachers could benefit from a comparable corpus in teaching specialized translation from Italian into Chinese and, through examples of passive sentences, how students could learn to apply different strategies to translate voice appropriately.

Keywords: contrastive studies, specialised translation, corpus-based approach, teaching

Procedia PDF Downloads 339
26868 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development

Authors: L. Kamandulytė-Merfeldienė

Abstract:

The paper deals with the main methodological issues of the Corpus of Spoken Lithuanian, whose development began in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The corpus was created in three main stages: data collection, transcription of the recorded data, and grammatical annotation. Data collection was based on the principles of balance and naturalness. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to a steady increase in studies of spontaneous communication, and various papers have dealt with the distribution of parts of speech, the use of different grammatical forms, variation in inflectional paradigms, the distribution of fillers, the syntactic functions of adjectives, and the mean length of utterance.

Keywords: CHILDES, corpus of spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian

Procedia PDF Downloads 206
26867 Theater Metaphor in Event Quantification: A Corpus Study

Authors: Zhuo Jing-Schmidt, Jun Lang

Abstract:

Numeral classifiers are common in Asian languages. Research on numeral classifiers has focused primarily on noun classifiers, which quantify and individuate nominal referents, and research on event quantification using verb classifiers is scarce. This study aims to understand the semantic and conceptual basis of event quantification in Chinese. From a usage-based Construction Grammar perspective, it presents a corpus analysis of event quantification in Chinese. Drawing on a large balanced corpus of contemporary Chinese, we analyze 667 noun collexemes, totaling 31,136 tokens, of a productive numeral classifier construction. A collostructional analysis of the collexemes shows that the construction quantifies and classifies dramatic events using a theater-based conceptual metaphor. We argue that the usage patterns reflect the cultural entrenchment of theater in Chinese conceptualization and the construal of theatricality in linguistic expression. The study has implications for cognitive semantics and Construction Grammar.
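Collostructional analysis as introduced by Stefanowitsch and Gries (2003) conventionally ranks each collexeme with a one-tailed Fisher-Yates exact test on a 2×2 table: the noun inside vs. outside the construction, and the construction with this noun vs. with other nouns. The abstract does not state which association measure the authors used, so the Python sketch below only illustrates that conventional computation under this assumption; the function name and the toy counts are hypothetical. It returns the p-value for attraction and the collostruction strength, -log10(p), computed from exact integer binomial coefficients so that very small p-values do not underflow.

```python
import math


def collostruction_strength(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """One-tailed Fisher-Yates exact test for attraction (simple collexeme analysis).

    a: tokens of the noun inside the construction
    b: tokens of the noun elsewhere in the corpus
    c: tokens of the construction with any other noun
    d: all remaining corpus tokens
    Returns (p_value, strength), where strength = -log10(p_value).
    """
    n_total = a + b + c + d
    noun_total = a + b   # row total: every token of this noun
    cxn_total = a + c    # column total: every token of the construction
    # Hypergeometric upper tail P(X >= a) with all margins held fixed.
    numerator = sum(
        math.comb(noun_total, k) * math.comb(n_total - noun_total, cxn_total - k)
        for k in range(a, min(noun_total, cxn_total) + 1)
    )
    denominator = math.comb(n_total, cxn_total)
    # Take logs of the exact integers so tiny p-values do not underflow to 0.0.
    strength = math.log10(denominator) - math.log10(numerator)
    return numerator / denominator, strength


# Toy table (hypothetical): a noun seen 3 times, always in the construction,
# in a 6-token "corpus" where the construction also occurs 3 times.
p, strength = collostruction_strength(3, 0, 0, 3)
print(p, round(strength, 2))  # 0.05 1.3
```

For a noun that is repelled by the construction, one would sum the lower tail (k ≤ a) instead.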

Keywords: event quantification, classifier, corpus, metaphor

Procedia PDF Downloads 39
26866 Corporate Cautionary Statement: A Genre of Professional Communication

Authors: Chie Urawa

Abstract:

Cautionary statements or disclaimers in corporate annual reports need to be carefully designed, because clear cautionary statements may protect a company in legal disputes but may also undermine positive impressions. This study compares the language of cautionary statements using two corpora, Sony’s cautionary statement corpus (S-corpus) and Panasonic’s cautionary statement corpus (P-corpus), illustrating their differences and similarities in the use of meaningful cautionary statements and critically analyzing why practitioners word them as they do. The findings reveal distinct differences between the two companies in how they present risk factors and how they phrase the statements. The word “ability” is used more for legal protection in the S-corpus, whereas the word “possibility” is used more to convey a better impression in the P-corpus. The main similarities lie in the use of lexical words and pronouns and in nearly identical wording over eight years. The findings show how each company makes its statements distinctive in presenting risk factors, as well as the characteristics of this specific genre of professional communication. Important implications of this study are that a more comprehensive approach can be applied in other contexts and that companies can use it to reflect on their cautionary statements.

Keywords: cautionary statements, corporate annual reports, corpus, risk factors

Procedia PDF Downloads 135
26865 Compilation and Statistical Analysis of an Arabic-English Legal Corpus in Sketch Engine

Authors: C. Brierley, H. El-Farahaty, A. Farhan

Abstract:

The Leeds Parallel Corpus of Arabic-English Constitutions is a parallel corpus for the Arabic legal domain. Analysis of legal language via Corpus Linguistics techniques is an important development. In legal proceedings, a corpus-based approach to disambiguating meaning is set to replace the dictionary as an interpretative tool, and legal scholarship in the United States is now attuned to the potential of Text Analytics over vast quantities of text-based legal material, following the business and medical industries. This trend is reflected in Europe: the interdisciplinary research group in Computer Assisted Legal Linguistics mines big data collections of legal and non-legal texts to analyse legal interpretations, legal discourse, the comprehensibility of legal texts, conflict resolution, and linguistic human rights. This paper focuses on ‘dignity’ as an important aspect of the overarching concept of human rights in current constitutions across the Arab world. We have compiled a parallel Arabic-English raw text corpus (169,861 Arabic words and 205,893 English words) from reputable websites such as those of the World Intellectual Property Organisation and CONSTITUTE, and uploaded and queried our corpus in Sketch Engine. Our most challenging task was sentence-level alignment of the Arabic-English data. This entailed manual intervention to ensure correspondence on a one-to-many basis, since Arabic sentences differ from English ones in length and punctuation. We have searched for morphological variants of ‘dignity’ (كرامة, karāma) in the Arabic data and inspected their English translation equivalents. The term occurs most frequently in the Sudanese constitution (10 instances), and not at all in the constitution of Palestine. Its most frequent collocate, determined via the logDice statistic in Sketch Engine, is ‘human’, as in ‘human dignity’.
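The logDice statistic mentioned above has a simple closed form (Rychlý 2008): logDice = 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of the two words and f_x, f_y are their individual frequencies. A minimal Python sketch of the formula follows. The counts in the usage line are invented for illustration and are not figures from the Leeds corpus; Sketch Engine derives its actual frequencies from its own collocation window or grammatical relation, which this sketch does not reproduce.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score (Rychlý 2008).

    f_xy: co-occurrence frequency of node and collocate
    f_x:  total frequency of the node (e.g. 'dignity')
    f_y:  total frequency of the collocate (e.g. 'human')
    The maximum is 14, reached when f_xy == f_x == f_y.
    """
    if f_xy <= 0:
        raise ValueError("co-occurrence frequency must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts for illustration only (not from the Leeds corpus).
print(round(log_dice(8, 20, 60), 2))  # 11.68
```

Because the score depends only on relative frequencies, it does not grow with corpus size, which is one reason it is convenient for comparing collocates across subcorpora of different sizes.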

Keywords: Arabic constitution, corpus-based legal linguistics, human rights, parallel Arabic-English legal corpora

Procedia PDF Downloads 147
26864 Statistical Comparison of Machine and Manual Translation: A Corpus-Based Study of Gone with the Wind

Authors: Yanmeng Liu

Abstract:

This article analyzes and compares the linguistic differences between machine translation and manual translation through a case study of the book Gone with the Wind. As an important carrier of human feeling and thought, literary translation poses great difficulty for machine translation and is expected to expose translation features distinct from those of manual translation. To display linguistic features objectively, computerized and statistical evidence has been tentatively applied to the systematic investigation of large-scale translation corpora using quantitative methods. This study compiles a bilingual corpus with four Chinese translations of Gone with the Wind, namely, Piao by Chunhai Fan, Piao by Huairen Huang, and the translations by Google Translate and Baidu Translate. After processing the corpus with software such as the Stanford Segmenter, the Stanford POS Tagger, and AntConc, the study analyzes the linguistic data and answers the following questions: 1. How does machine translation differ linguistically from manual translation? 2. Why do these deviances occur? This paper combines translation studies with corpus linguistics and concretizes divergent linguistic dimensions in translated-text analysis in order to present the linguistic deviances between manual and machine translation. Consequently, this study provides a more accurate and fine-grained understanding of machine translation output, and it also proposes several suggestions for the future development of machine translation.

Keywords: corpus-based analysis, linguistic deviances, machine translation, statistical evidence

Procedia PDF Downloads 111
26863 Corpus Stylistics and Multidimensional Analysis for English for Specific Purposes Teaching and Assessment

Authors: Svetlana Strinyuk, Viacheslav Lanin

Abstract:

Academic English has become the lingua franca of the international scientific community, which encourages universities to introduce English for Academic Purposes (EAP) courses into the curriculum. Teaching L2 EAP students can be supported by corpus technologies and digital stylistics. Special software was developed to perform the manifold task of teaching, assessing, and researching L2 students’ academic writing on the basis of digital stylistics and multidimensional analysis. A set of annotations (style markers) was built, covering the grammatical, lexical, and syntactic features most characteristic of academic writing. A contrastive comparison of two corpora, a “model corpus” of subject-domain-limited papers published by competent writers in leading academic journals and a “students’ corpus” of subject-domain-limited papers written by final-year students, yields data on the features of academic writing that L2 EAP students underuse or overuse. Both corpora are tagged with special software built in GATE Developer. Within the research framework, style markers may be replaced depending on the relevance and validity of the results obtained from the research corpora. Thus, by selecting relevant (high-frequency) style markers and excluding less relevant (less frequent) annotations, high validity of the model is achieved. The software compares the data obtained from processing the model corpus with the students’ corpus and produces reports that can be used in teaching and assessment. The less students’ writing deviates from the model corpus, the higher their acquisition of academic writing skills. The research showed that several style markers (hedging devices) were underused by L2 EAP students, whereas lexical linking devices were overused. Implemented in EAP courses, the software serves as a successful visual aid, makes assessment more valid, indicates the degree of writing skill acquisition, and provides data for further research.
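The abstract does not specify how the software decides that a style marker is over- or underused. One common corpus-linguistic option, shown here purely as an assumed stand-in and not as the authors’ GATE-based method, is to compare normalized frequencies per 10,000 words and test the difference with the log-likelihood (G2) statistic of Rayson and Garside (2000), taking 3.84 as the critical value for p < 0.05 with one degree of freedom. The function names, marker names, and counts below are hypothetical.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Log-likelihood (G2) for one feature observed in two corpora."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    if freq_a > 0:  # a zero count contributes nothing to the sum
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2


def compare_marker(student_freq, student_size, model_freq, model_size, critical=3.84):
    """Label a style marker as overused, underused, or not significantly different."""
    student_rate = student_freq / student_size * 10_000  # per 10,000 words
    model_rate = model_freq / model_size * 10_000
    g2 = log_likelihood(student_freq, student_size, model_freq, model_size)
    if g2 < critical:
        return "no significant difference", g2
    return ("overused" if student_rate > model_rate else "underused"), g2


# Invented counts: (hits in students' corpus, its size, hits in model corpus, its size)
markers = {
    "marker_a": (12, 10_000, 40, 10_000),
    "marker_b": (50, 10_000, 10, 10_000),
}
for name, counts in markers.items():
    label, g2 = compare_marker(*counts)
    print(f"{name}: {label} (G2 = {g2:.2f})")
```

Normalizing to a rate before deciding the direction matters because the two corpora will rarely be the same size; G2 itself already accounts for the size difference through the expected frequencies.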

Keywords: corpus technologies in EAP teaching, multidimensional analysis, GATE Developer, corpus stylistics

Procedia PDF Downloads 160
26862 Tagging a Corpus of Media Interviews with Diplomats: Challenges and Solutions

Authors: Roberta Facchinetti, Sara Corrizzato, Silvia Cavalieri

Abstract:

Increasing interconnection between data digitalization and linguistic investigation has given rise to unprecedented potential and challenges for corpus linguists, who need to master IT tools for data analysis and text processing, as well as to develop techniques for efficient and reliable annotation in specific mark-up languages that encode documents in a format that is both human- and machine-readable. In the present paper, the challenges emerging from the compilation of a linguistic corpus will be considered, focusing on the English language in particular. To do so, the case study of the InterDiplo corpus will be illustrated. The corpus, currently under development at the University of Verona (Italy), represents a novelty in terms of both the data included and the tag set used for its annotation. It covers media interviews and debates with diplomats and international operators conversing in English with journalists who do not share their interviewees’ lingua-cultural background. To date, this appears to be the first tagged corpus of international institutional spoken discourse, and it will be an important database not only for linguists interested in corpus analysis but also for experts working in international relations. Special attention will be dedicated to the structural mark-up, part-of-speech annotation, and tagging of discursive traits, which are the innovative parts of the project and the result of a thorough study to find the solution that best suits the analytical needs of the data. Several aspects will be addressed, with special attention to the tagging of the speakers’ identity, the communicative events, and anthropophagic. Prominence will be given to the annotation of question/answer exchanges to investigate the interlocutors’ choices and how such choices affect communication.
Indeed, the automated identification of questions, in relation to the expected answers, helps show how interviewers elicit information and how interviewees provide their answers to fulfill their respective communicative aims. A detailed description of these elements will be given using the InterDiplo-Covid19 pilot corpus. Our preliminary analysis of the data will highlight the viable solutions found in constructing the corpus in terms of XML conversion, metadata definition, the tagging system, and the discursive-pragmatic annotation to be included via Oxygen.

Keywords: spoken corpus, diplomats’ interviews, tagging system, discursive-pragmatic annotation, English linguistics

Procedia PDF Downloads 153
26861 A Corpus-Based Study on the Styles of Three Translators

Authors: Wang Yunhong

Abstract:

The present paper examines the different styles of three translators in their translations of the classical Chinese novel Shuihu Zhuan. Based on a parallel corpus, it adopts a target-oriented approach to investigate whether the three translations reveal stylistic differences and shifts and, if so, what they are. The findings show that the three translators demonstrate different styles in their word choices and sentence preferences, which implies that identifying recurrent textual patterns may be a basic step in investigating a translator’s style.

Keywords: corpus, lexical choices, sentence characteristics, style

Procedia PDF Downloads 236
26860 How Do L1 Teachers Assess Haitian Immigrant High School Students in Chile?

Authors: Gloria Toledo, Andrea Lizasoain, Leonardo Mena

Abstract:

Immigration to Chile has increased greatly over the last 20 years. About 6.6% of our population is foreign, of whom 14.3% are Haitian. Haitian immigrants are mostly between 15 and 29 years old and have come to Chile escaping a social crisis. They believe that education and work will help them do better in life. Consequently, the number of Haitian students in the Chilean school system has also increased: 3,121 Haitian students were enrolled in 2017. This is a challenge for public schools, which take in young people who must face schooling, social immersion, and second-language learning simultaneously. The language barrier affects both students’ and teachers’ adaptation, which has an impact on the students’ academic performance and, in turn, on their acquisition of Spanish. To explore students’ academic performance and interlanguage development, we examined how L1 teachers assess Haitian high school students’ written production in Spanish. To this end, teachers were asked to use a specially designed grid to assess the correctness, accommodation, lexical and analytical complexity, organization, and fluency of both Haitian and Chilean students’ texts. In parallel, the texts were analyzed from an error analysis perspective. The results from the grids and the error analysis were then compared. On the one hand, teachers were found to give students very little feedback beyond scores and grades, which does not contribute to the development of the second language. On the other hand, the error analysis showed that Haitian students are in a dynamic process of acquiring Spanish, which could be enhanced if L1 teachers were aware of the process of interlanguage development.

Keywords: assessment, error analysis, grid, immigration, Spanish acquisition, writing

Procedia PDF Downloads 101
26859 Words of Peace in the Speeches of the Egyptian President, Abdulfattah El-Sisi: A Corpus-Based Study

Authors: Mohamed S. Negm, Waleed S. Mandour

Abstract:

The present study aims primarily to investigate words of peace (lexemes of peace) in the formal speeches of the Egyptian president Abdulfattah El-Sisi over a two-year span, from 2018 to 2019. This paper attempts not only to shed light on the contextual use of the antonyms war and peace but also to ground this in quantitative analysis using current corpus-linguistic methods. To this end, the researchers have adopted a corpus-based approach to collecting, encoding, and processing 30 presidential speeches from this period (23,411 words and 25,541 tokens in total). Semantic fields and collocational networks are then identified and compared statistically. The results show a significant propensity to adopt peace, including its relevant collocation network, textually and therefore ideationally, at the expense of the concept of war, which in most cases surfaces euphemistically through the noun conflict. The president has not justified acts of war by appeal to an honorable cause or a valid reason. So far, these results indicate a positive sociopolitical mindset on the part of the Egyptian president and, moreover, reveal fair dealing on emerging national and international issues.

Keywords: CADS, collocation network, corpus linguistics, critical discourse analysis

Procedia PDF Downloads 114