Search results for: corpus based approach
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 36009

Search results for: corpus based approach

35949 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation labeled database for Hindi. Although no single standard for prosody labeling exists in Hindi, researchers in the past have employed perceptual and statistical methods in literature to draw inferences about the behavior of prosody patterns in Hindi. Based on such existing research and largely agreed upon intonational theories in Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used for training speech models for natural-sounding speech in the future. 100 sentences ( 500 words) each for declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

Procedia PDF Downloads 197
35948 Historical Development of Negative Emotive Intensifiers in Hungarian

Authors: Martina Katalin Szabó, Bernadett Lipóczi, Csenge Guba, István Uveges

Abstract:

In this study, an exhaustive analysis was carried out about the historical development of negative emotive intensifiers in the Hungarian language via NLP methods. Intensifiers are linguistic elements which modify or reinforce a variable character in the lexical unit they apply to. Therefore, intensifiers appear with other lexical items, such as adverbs, adjectives, verbs, infrequently with nouns. Due to the complexity of this phenomenon (set of sociolinguistic, semantic, and historical aspects), there are many lexical items which can operate as intensifiers. The group of intensifiers are admittedly one of the most rapidly changing elements in the language. From a linguistic point of view, particularly interesting are a special group of intensifiers, the so-called negative emotive intensifiers, that, on their own, without context, have semantic content that can be associated with negative emotion, but in particular cases, they may function as intensifiers (e.g.borzasztóanjó ’awfully good’, which means ’excellent’). Despite their special semantic features, negative emotive intensifiers are scarcely examined in literature based on large Historical corpora via NLP methods. In order to become better acquainted with trends over time concerning the intensifiers, The exhaustively analysed a specific historical corpus, namely the Magyar TörténetiSzövegtár (Hungarian Historical Corpus). This corpus (containing 3 millions text words) is a collection of texts of various genres and styles, produced between 1772 and 2010. Since the corpus consists of raw texts and does not contain any additional information about the language features of the data (such as stemming or morphological analysis), a large amount of manual work was required to process the data. Thus, based on a lexicon of negative emotive intensifiers compiled in a previous phase of the research, every occurrence of each intensifier was queried, and the results were stored in a separate data frame. Then, basic linguistic processing (POS-tagging, lemmatization etc.) was carried out automatically with the ‘magyarlanc’ NLP-toolkit. Finally, the frequency and collocation features of all the negative emotive words were automatically analyzed in the corpus. Outcomes of the research revealed in detail how these words have proceeded through grammaticalization over time, i.e., they change from lexical elements to grammatical ones, and they slowly go through a delexicalization process (their negative content diminishes over time). What is more, it was also pointed out which negative emotive intensifiers are at the same stage in this process in the same time period. Giving a closer look to the different domains of the analysed corpus, it also became certain that during this process, the pragmatic role’s importance increases: the newer use expresses the speaker's subjective, evaluative opinion at a certain level.

Keywords: historical corpus analysis, historical linguistics, negative emotive intensifiers, semantic changes over time

Procedia PDF Downloads 233
35947 Competition between Verb-Based Implicit Causality and Theme Structure's Influence on Anaphora Bias in Mandarin Chinese Sentences: Evidence from Corpus

Authors: Linnan Zhang

Abstract:

Linguists, as well as psychologists, have shown great interests in implicit causality in reference processing. However, most frequently-used approaches to this issue are psychological experiments (such as eye tracking or self-paced reading, etc.). This research is a corpus-based one and is assisted with statistical tool – software R. The main focus of the present study is about the competition between verb-based implicit causality and theme structure’s influence on anaphora bias in Mandarin Chinese sentences. In Accessibility Theory, it is believed that salience, which is also known as accessibility, and relevance are two important factors in reference processing. Theme structure, which is a special syntactic structure in Chinese, determines the salience of an antecedent on the syntactic level while verb-based implicit causality is a key factor to the relevance between antecedent and anaphora. Therefore, it is a study about anaphora, combining psychology with linguistics. With analysis of the sentences from corpus as well as the statistical analysis of Multinomial Logistic Regression, major findings of the present study are as follows: 1. When the sentence is stated in a ‘cause-effect’ structure, the theme structure will always be the antecedent no matter forward biased verbs or backward biased verbs co-occur; in non-theme structure, the anaphora bias will tend to be the opposite of the verb bias; 2. When the sentence is stated in a ‘effect-cause’ structure, theme structure will not always be the antecedent and the influence of verb-based implicit causality will outweigh that of theme structure; moreover, the anaphora bias will be the same with the bias of verbs. All the results indicate that implicit causality functions conditionally and the noun in theme structure will not be the high-salience antecedent under any circumstances.

Keywords: accessibility theory, anaphora, theme strcture, verb-based implicit causality

Procedia PDF Downloads 198
35946 Linguistic Accessibility and Audiovisual Translation: Corpus Linguistics as a Tool for Analysis

Authors: Juan-Pedro Rica-Peromingo

Abstract:

The important change taking place with respect to the media and the audiovisual world in Europe needs to benefit all populations, in particular those with special needs, such as the deaf and hard-of-hearing population (SDH) and blind and partially-sighted population (AD). This recent interest in the field of audiovisual translation (AVT) can be observed in the teaching and learning of the different modes of AVT in the degree and post-degree courses at Spanish universities, which expand the interest and practice of AVT linguistic accessibility. We present a research project led at the UCM which consists of the compilation of AVT activities for teaching purposes and tries to analyze the creation and reception of SDH and AD: the AVLA Project (Audiovisual Learning Archive), which includes audiovisual materials carried out by the university students on different AVT modes and evaluations from the blind and deaf informants. In this study, we present the materials created by the students. A group of the deaf and blind population has been in charge of testing the student's SDH and AD corpus of audiovisual materials through some questionnaires used to evaluate the students’ production. These questionnaires include information about the reception of the subtitles and the audio descriptions from linguistic and technical points of view. With all the materials compiled in the research project, a corpus with both the students’ production and the recipients’ evaluations is being compiled: the CALING (Corpus de Accesibilidad Lingüística) corpus. Preliminary results will be presented with respect to those aspects, difficulties, and deficiencies in the SDH and AD included in the corpus, specifically with respect to the length of subtitles, the position of the contextual information on the screen, and the text included in the audio descriptions and tone of voice used. These results may suggest some changes and improvements in the quality of the SDH and AD analyzed. In the end, demand for the teaching and learning of AVT and linguistic accessibility at a university level and some important changes in the norms which regulate SDH and AD nationally and internationally will be suggested.

Keywords: audiovisual translation, corpus linguistics, linguistic accessibility, teaching

Procedia PDF Downloads 81
35945 Translating the Gendered Discourse: A Corpus-Based Study of the Chinese Science Fiction The Three Body Problem

Authors: Yi Gu

Abstract:

The Three-Body Problem by Cixin Liu has been a bestseller Chinese Sci-Fi novel for years since 2008. The book was translated into English by Ken Liu in 2014 and won the prestigious 2015 science fiction and fantasy writing Hugo Award, drawing greater attention from wider international communities. The story exposes the horrors of the Chinese Cultural Revolution in the 1960s, in an intriguing narrative for readers at home and abroad. However, without the access to the source text, western readers may not be aware that the original Chinese version of the book is rich in gender-bias. Some Chinese scholars have applied feminist translation theories to their analysis on this book before, based on isolated selected, cherry-picking examples. Thus this paper aims to obtain a more thorough picture of how translators can cope with gender discrimination and reshape the gendered discourse from the source text, by systematically investigating the lexical and syntactic patterns in the translation of Liu’s entire book of 400 pages. The source text and the translation were downloaded into digital files, automatically aligned at paragraph level and then manually post-edited. They were then compiled into a parallel corpus of 114,629 English words and 204,145 Chinese characters using Sketch Engine. Gender-discrimination markers such as the overuse of ‘girl’ to describe an adult woman were searched in the source text, and the alignment made it possible to identify the strategies adopted by the translator to mitigate gender discrimination. The results provide a framework for translators to address gender bias. The study also shows how corpus methods can be used to further research in feminist translation and critical discourse analysis.

Keywords: corpus, discourse analysis, feminist translation, science fiction translation

Procedia PDF Downloads 256
35944 The Use of Corpora in Improving Modal Verb Treatment in English as Foreign Language Textbooks

Authors: Lexi Li, Vanessa H. K. Pang

Abstract:

This study aims to demonstrate how native and learner corpora can be used to enhance modal verb treatment in EFL textbooks in mainland China. It contributes to a corpus-informed and learner-centered design of grammar presentation in EFL textbooks that enhances the authenticity and appropriateness of textbook language for target learners. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of each of the nine modals was sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the 'secondary school' section were selected. A series of five secondary coursebooks comprise the textbook corpus. All the data in both the learner and the textbook corpora are retrieved through the concordance functions of WordSmith Tools (version, 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNC2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. Secondly, the learner corpus was analyzed in terms of the use (distributional features, semantic functions, and co-occurring constructions) and the misuse (syntactic errors, e.g., she can sings*.) of the nine modal verbs to uncover potential difficulties that confront learners. The analysis of distribution indicates several discrepancies between the textbook corpus and BNCS2014. The first four most frequent modal verbs in BNCS2014 are can, would, will, could, while can, will, should, could are the top four in the textbooks. Most strikingly, there is an unusually high proportion of can (41.1%) in the textbooks. The results on different meanings shows that will, would and must are the most problematic. For example, for will, the textbooks contain 20% more occurrences of 'volition' and 20% less of 'prediction' than those in BNCS2014. Regarding co-occurring structures, the textbooks over-represented the structure 'modal +do' across the nine modal verbs. Another major finding is that the structure of 'modal +have done' that frequently co-occur with could, would, should, and must is underused in textbooks. Besides, these four modal verbs are the most difficult for learners, as the error analysis shows. This study demonstrates how the synergy of native and learner corpora can be harnessed to improve EFL textbook presentation of modal verbs in a way that textbooks can provide not only authentic language used in natural discourse but also appropriate design tailed for the needs of target learners.

Keywords: English as Foreign Language, EFL textbooks, learner corpus, modal verbs, native corpus

Procedia PDF Downloads 142
35943 We Are the 99 percent – the Occupy-Movement in Social Media

Authors: Wolfram Karg

Abstract:

The Occupy-Movement came into in 2011 existence in the US as a reaction to one of the worst economic crisis since World War II. With cuts in benefits and social services, with people being evicted from their homes on the one hand and high bonuses granted to their managers of the very same companies, a strong feeling of injustice besieged people in the US and caused them to voice their anger peacefully in social media and on the streets. Due to the world-wide-web, users all around the world read about this movement and recognized the same injustice in their own countries, making Occupy a global movement. The vast array of topics covered by Occupy offers a unique chance to carry out a corpus-based discourse analysis based on the DIMEAN-Model. The focus on this paper is limited to two aspects of DIMEAN: intertextual references and the use of connectors in texts. Because the discourse is to a large extent carried out via posts in blogs, online-articles and comments, the paper also analyses, in how far modern (i.e. computer-based media) there is a correlation between the use of connectors in different communicative types used by the Occupy-Movement.

Keywords: discourse, new media, occupy, corpus analysis

Procedia PDF Downloads 493
35942 An Automatic Speech Recognition Tool for the Filipino Language Using the HTK System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

This paper presents the development of a Filipino speech recognition tool using the HTK System. The system was trained from a subset of the Filipino Speech Corpus developed by the DSP Laboratory of the University of the Philippines-Diliman. The speech corpus was both used in training and testing the system by estimating the parameters for phonetic HMM-based (Hidden-Markov Model) acoustic models. Experiments on different mixture-weights were incorporated in the study. The phoneme-level word-based recognition of a 5-state HMM resulted in an average accuracy rate of 80.13 for a single-Gaussian mixture model, 81.13 after implementing a phoneme-alignment, and 87.19 for the increased Gaussian-mixture weight model. The highest accuracy rate of 88.70% was obtained from a 5-state model with 6 Gaussian mixtures.

Keywords: Filipino language, Hidden Markov Model, HTK system, speech recognition

Procedia PDF Downloads 480
35941 Attitude in Academic Writing (CAAW): Corpus Compilation and Annotation

Authors: Hortènsia Curell, Ana Fernández-Montraveta

Abstract:

This paper presents the creation, development, and analysis of a corpus designed to study the presence of attitude markers and author’s stance in research articles in two different areas of linguistics (theoretical linguistics and sociolinguistics). These two disciplines are expected to behave differently in this respect, given the disparity in their discursive conventions. Attitude markers in this work are understood as the linguistic elements (adjectives, nouns and verbs) used to convey the writer's stance towards the content presented in the article, and are crucial in understanding writer-reader interaction and the writer's position. These attitude markers are divided into three broad classes: assessment, significance, and emotion. In addition to them, we also consider first-person singular and plural pronouns and possessives, modal verbs, and passive constructions, which are other linguistic elements expressing the author’s stance. The corpus, Corpus of Attitude in Academic Writing (CAAW), comprises a collection of 21 articles, collected from six journals indexed in JCR. These articles were originally written in English by a single native-speaker author from the UK or USA and were published between 2022 and 2023. The total number of words in the corpus is approximately 222,400, with 106,422 from theoretical linguistics (Lingua, Linguistic Inquiry and Journal of Linguistics) and 116,022 from sociolinguistics journals (International Journal of the Sociology of Language, Language in Society and Journal of Sociolinguistics). Together with the corpus, we present the tool created for the creation and storage of the corpus, along with a tool for automatic annotation. The steps followed in the compilation of the corpus are as follows. First, the articles were selected according to the parameters explained above. Second, they were downloaded and converted to txt format. Finally, examples, direct quotes, section titles and references were eliminated, since they do not involve the author’s stance. The resulting texts were the input for the annotation of the linguistic features related to stance. As for the annotation, two articles (one from each subdiscipline) were annotated manually by the two researchers. An existing list was used as a baseline, and other attitude markers were identified, together with the other elements mentioned above. Once a consensus was reached, the rest of articles were annotated automatically using the tool created for this purpose. The annotated corpus will serve as a resource for scholars working in discourse analysis (both in linguistics and communication) and related fields, since it offers new insights into the expression of attitude. The tools created for the compilation and annotation of the corpus will be useful to study author’s attitude and stance in articles from any academic discipline: new data can be uploaded and the list of markers can be enlarged. Finally, the tool can be expanded to other languages, which will allow cross-linguistic studies of author’s stance.

Keywords: academic writing, attitude, corpus, english

Procedia PDF Downloads 74
35940 Finding Related Scientific Documents Using Formal Concept Analysis

Authors: Nadeem Akhtar, Hira Javed

Abstract:

An important aspect of research is literature survey. Availability of a large amount of literature across different domains triggers the need for optimized systems which provide relevant literature to researchers. We propose a search system based on keywords for text documents. This experimental approach provides a hierarchical structure to the document corpus. The documents are labelled with keywords using KEA (Keyword Extraction Algorithm) and are automatically organized in a lattice structure using Formal Concept Analysis (FCA). This groups the semantically related documents together. The hierarchical structure, based on keywords gives out only those documents which precisely contain them. This approach open doors for multi-domain research. The documents across multiple domains which are indexed by similar keywords are grouped together. A hierarchical relationship between keywords is obtained. To signify the effectiveness of the approach, we have carried out the experiment and evaluation on Semeval-2010 Dataset. Results depict that the presented method is considerably successful in indexing of scientific papers.

Keywords: formal concept analysis, keyword extraction algorithm, scientific documents, lattice

Procedia PDF Downloads 332
35939 Corpus-Based Description of Core English Nouns of Pakistani English, an EFL Learner Perspective at Secondary Level

Authors: Abrar Hussain Qureshi

Abstract:

Vocabulary has been highlighted as a key indicator in any foreign language learning program, especially English as a foreign language (EFL). It is often considered a potential tool in foreign language curriculum, and its deficiency impedes successful communication in the target language. The knowledge of the lexicon is very significant in getting communicative competence and performance. Nouns constitute a considerable bulk of English vocabulary. Rather, they are the bones of the English language and are the main semantic carrier in spoken and written discourse. As nouns dominate the bulk of the English lexicon, their role becomes all the more potential. The undertaken research is a systematic effort in this regard to work out a list of highly frequent list of Pakistani English nouns for the EFL learners at the secondary level. It will encourage autonomy for the EFL learners as well as will save their time. The corpus used for the research has been developed locally from leading English newspapers of Pakistan. Wordsmith Tools has been used to process the research data and to retrieve word list of frequent Pakistani English nouns. The retrieved list of core Pakistani English nouns is supposed to be useful for English language learners at the secondary level as it covers a wide range of speech events.

Keywords: corpus, EFL, frequency list, nouns

Procedia PDF Downloads 103
35938 Cognitive Translation and Conceptual Wine Tasting Metaphors: A Corpus-Based Research

Authors: Christine Demaecker

Abstract:

Many researchers have underlined the importance of metaphors in specialised language. Their use of specific domains helps us understand the conceptualisations used to communicate new ideas or difficult topics. Within the wide area of specialised discourse, wine tasting is a very specific example because it is almost exclusively metaphoric. Wine tasting metaphors express various conceptualisations. They are not linguistic but rather conceptual, as defined by Lakoff & Johnson. They correspond to the linguistic expression of a mental projection from a well-known or more concrete source domain onto the target domain, which is the taste of wine. But unlike most specialised terminologies, the vocabulary is never clearly defined. When metaphorical terms are listed in dictionaries, their definitions remain vague, unclear, and circular. They cannot be replaced by literal linguistic expressions. This makes it impossible to transfer them into another language with the traditional linguistic translation methods. Qualitative research investigates whether wine tasting metaphors could rather be translated with the cognitive translation process, as well described by Nili Mandelblit (1995). The research is based on a corpus compiled from two high-profile wine guides; the Parker’s Wine Buyer’s Guide and its translation into French and the Guide Hachette des Vins and its translation into English. In this small corpus with a total of 68,826 words, 170 metaphoric expressions have been identified in the original English text and 180 in the original French text. They have been selected with the MIPVU Metaphor Identification Procedure developed at the Vrije Universiteit Amsterdam. The selection demonstrates that both languages use the same set of conceptualisations, which are often combined in wine tasting notes, creating conceptual integrations or blends. The comparison of expressions in the source and target texts also demonstrates the use of the cognitive translation approach. In accordance with the principle of relevance, the translation always uses target language conceptualisations, but compared to the original, the highlighting of the projection is often different. Also, when original metaphors are complex with a combination of conceptualisations, at least one element of the original metaphor underlies the target expression. This approach perfectly integrates into Lederer’s interpretative model of translation (2006). In this triangular model, the transfer of conceptualisation could be included at the level of ‘deverbalisation/reverbalisation’, the crucial stage of the model, where the extraction of meaning combines with the encyclopedic background to generate the target text.

Keywords: cognitive translation, conceptual integration, conceptual metaphor, interpretative model of translation, wine tasting metaphor

Procedia PDF Downloads 131
35937 The Analysis of Deceptive and Truthful Speech: A Computational Linguistic Based Method

Authors: Seham El Kareh, Miramar Etman

Abstract:

Recently, detecting liars and extracting features which distinguish them from truth-tellers have been the focus of a wide range of disciplines. To the author’s best knowledge, most of the work has been done on facial expressions and body gestures but only few works have been done on the language used by both liars and truth-tellers. This paper sheds light on four axes. The first axis copes with building an audio corpus for deceptive and truthful speech for Egyptian Arabic speakers. The second axis focuses on examining the human perception of lies and proving our need for computational linguistic-based methods to extract features which characterize truthful and deceptive speech. The third axis is concerned with building a linguistic analysis program that could extract from the corpus the inter- and intra-linguistic cues for deceptive and truthful speech. The program built here is based on selected categories from the Linguistic Inquiry and Word Count program. Our results demonstrated that Egyptian Arabic speakers on one hand preferred to use first-person pronouns and present tense compared to the past tense when lying and their lies lacked of second-person pronouns, and on the other hand, when telling the truth, they preferred to use the verbs related to motion and the nouns related to time. The results also showed that there is a need for bigger data to prove the significance of words related to emotions and numbers.

Keywords: Egyptian Arabic corpus, computational analysis, deceptive features, forensic linguistics, human perception, truthful features

Procedia PDF Downloads 206
35936 A Comparison of the First Language Vocabulary Used by Indonesian Year 4 Students and the Vocabulary Taught to Them in English Language Textbooks

Authors: Fitria Ningsih

Abstract:

This study concerns on the process of making corpus obtained from Indonesian year 4 students’ free writing compared to the vocabulary taught in English language textbooks. 369 students’ sample writings from 19 public elementary schools in Malang, East Java, Indonesia and 5 selected English textbooks were analyzed through corpus in linguistics method using AdTAT -the Adelaide Text Analysis Tool- program. The findings produced wordlists of the top 100 words most frequently used by students and the top 100 words given in English textbooks. There was a 45% match between the two lists. Furthermore, the classifications of the top 100 most frequent words from the two corpora based on part of speech found that both the Indonesian and English languages employed a similar use of nouns, verbs, adjectives, and prepositions. Moreover, to see the contextualizing the vocabulary of learning materials towards the students’ need, a depth-analysis dealing with the content and the cultural views from the vocabulary taught in the textbooks was discussed through the criteria developed from the checklist. Lastly, further suggestions are addressed to language teachers to understand the students’ background such as recognizing the basic words students acquire before teaching them new vocabulary in order to achieve successful learning of the target language.

Keywords: corpus, frequency, English, Indonesian, linguistics, textbooks, vocabulary, wordlists, writing

Procedia PDF Downloads 187
35935 The Noun-Phrase Elements on the Usage of the Zero Article

Authors: Wen Zhen

Abstract:

Compared to content words, function words have been relatively overlooked by English learners especially articles. The article system, to a certain extent, becomes a resistance to know English better, driven by different elements. Three principal factors can be summarized in term of the nature of the articles when referring to the difficulty of the English article system. However, making the article system more complex are difficulties in the second acquisition process, for [-ART] learners have to create another category, causing even most non-native speakers at proficiency level to make errors. According to the sequences of acquisition of the English article, it is showed that the zero article is first acquired and in high inaccuracy. The zero article is often overused in the early stages of L2 acquisition. Although learners at the intermediate level move to underuse the zero article for they realize that the zero article does not cover any case, overproduction of the zero article even occurs among advanced L2 learners. The aim of the study is to investigate noun-phrase factors which give rise to incorrect usage or overuse of the zero article, thus providing suggestions for L2 English acquisition. Moreover, it enables teachers to carry out effective instruction that activate conscious learning of students. The research question will be answered through a corpus-based, data- driven approach to analyze the noun-phrase elements from the semantic context and countability of noun-phrases. Based on the analysis of the International Thurber Thesis corpus, the results show that: (1) Although context of [-definite,-specific] favored the zero article, both[-definite,+specific] and [+definite,-specific] showed less influence. When we reflect on the frequency order of the zero article , prototypicality plays a vital role in it .(2)EFL learners in this study have trouble classifying abstract nouns as countable. We can find that it will bring about overuse of the zero article when learners can not make clear judgements on countability altered from (+definite ) to (-definite).Once a noun is perceived as uncountable by learners, the choice would fall back on the zero article. These findings suggest that learners should be engaged in recognition of the countability of new vocabulary by explaining nouns in lexical phrases and explore more complex aspects such as analysis dependent on discourse.

Keywords: noun phrase, zero article, corpus, second language acquisition

Procedia PDF Downloads 253
35934 Ultrasonic Assessment of Corpora lutea and Plasma Progesterone Levels in Early Pregnant and Non Pregnant Cows

Authors: Abdurraouf O. Gaja, Salah Y. A. Al-Dahash, Guru Solmon Raju, Chikara Kubota

Abstract:

Corpus luteum cross sectional (by ultrasonography) and plasma progesterone (by DELFIA) were estimated in early pregnant and non pregnant cows on days 14th and 20th to 23rd post insemination. On day 14th, corpus luteum sectional area was 348.43 mm2 in pregnant and 387.84mm2 in non pregnant cows. Within days 20th to 23rd, corpus luteum sectional area ranged between 342.06 and 367.90 mm2 in pregnant and between 193.85 and 270.69 mm2 in non pregnant cows. Plasma progesterone level was 2.43 ng/ml in pregnant and 2.46 ng/ml in non pregnant cows on day 14th, while during days 20th to 23rd the level ranged between 2.47 and 2,84 ng/ml in pregnant and between 0.53 and 1.17 ng/ml in non pregnant cows. Results of both luteal tissue areas as well as plasma progesterone levels were highly significantly deferent (P<0.01) between pregnant and non pregnant cows during days 20th to 23rd, but there were no significant differences on day 14th. The correlation between CL cross-sectional area and plasma progesterone level was 0.4 in pregnant cows and 0.99 in non pregnant cow. It is clear, from this study, that ultrasonic assessment of corpora lutea is a viable alternative to determine plasma progesterone levels for early pregnancy diagnosis in cows.

Keywords: progesterone, ultrasonography, corpus luteum, pregnancy diagnosis, cow

Procedia PDF Downloads 308
35933 Exploring the Use of Adverbs in Two Young Learners Written Corpora

Authors: Chrysanthi S. Tiliakou, Katerina T. Frantzi

Abstract:

Writing has always been considered a most demanding skill for English as a Foreign Language learners as well as for native speakers. Novice foreign language writers are asked to handle a limited range of vocabulary to produce writing tasks at lower levels. Adverbs are the parts of speech that are not used extensively in the early stages of English as a Foreign Language writing. An additional problem with learning new adverbs is that, next to learning their meanings, learners are expected to acquire the proper placement of adverbs in a sentence. The use of adverbs is important as they enhance “expressive richness to one’s message”. By exploring the patterns of use of adverbs, researchers and educators can identify types of adverbs, which appear more taxing for young learners or that puzzle novice English as a Foreign Language writers with their placement, and focus on their teaching. To this end, the study examines the use of adverbs on two written Corpora of young learners of English of A1 – A2 levels and determines the types of adverbs used, their frequencies, problems in their use, and whether there is any differentiation between levels. The Antconc concordancing tool was used for the Greek Learner Corpus, and the Corpuscle concordancing tool for the Norwegian Corpus. The research found a similarity in the normalized frequencies of the adverbs used in the A1-A2 level Greek Learner Corpus with the frequencies of the same adverbs in the Norwegian Learner Corpus.

Keywords: learner corpora, young learners, writing, use of adverbs

Procedia PDF Downloads 92
35932 The Construction of Malaysian Airline Tragedies in Malaysian and British Online News: A Multidisciplinary Study

Authors: Theng Theng Ong

Abstract:

This study adopts a multidisciplinary method by combining the corpus-based discourse analysis study and language attitude study to explore the construction of Malaysia airline tragedies: MH370, MH17 and QZ8501 in the selected Malaysian and United Kingdom (UK) online news. The study aims to determine the ways in which Malaysian Airline tragedies MH370, MH17 and QZ8501 are linguistically defined and constructed in terms of keyword and collocation. The study also seeks to identify the types of discourse that are presented in the new articles. The differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media will also be examined. Finally, the language attitude study will be carried out to examine the Malaysia and UK university students’ attitudes toward the keywords, topics or issues covered by the selected Malaysian and UK news media pertaining to Malaysian Airline tragedies MH370, MH17 and QZ8501. The analysis is divided into two parts with the first part focusing on corpus-based discourse analysis on the media text. The second part of the study is to investigate Malaysians and UK news readers’ attitudes towards the online news being reported by the Malaysian and UK news media pertaining to the Airline tragedies. The main findings of corpus-based discourse analysis are essential in designing the questions in the questionnaires and interview and therefore led to the identification of the attitudes among Malaysian and UK news. This study adopts a multidisciplinary method by combining the corpus-based discourse analysis study and language attitude study to explore the construction of Malaysia airline tragedies: MH370, MH17 and QZ8501 in the selected Malaysian and United Kingdom (UK) online news. The study aims to determine the ways in which Malaysian Airline tragedies MH370, MH17 and QZ8501 are linguistically defined and constructed in terms of keyword and collocation. The study also seeks to identify the types of discourse that are presented in the new articles. The differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media will also be examined. Finally, the language attitude study will be carried out to examine the Malaysia and UK university students’ attitudes toward the keywords, topics or issues covered by the selected Malaysian and UK news media pertaining to Malaysian Airline tragedies MH370, MH17 and QZ8501. The analysis is divided into two parts with the first part focusing on corpus-based discourse analysis on the media text. The second part of the study is to investigate Malaysians and UK news readers’ attitudes towards the online news being reported by the Malaysian and UK news media pertaining to the Airline tragedies. The main findings of corpus-based discourse analysis are essential in designing the questions in the questionnaires and interview and therefore led to the identification of the attitudes among Malaysian and UK news.

Keywords: corpus linguistics, critical discourse analysis, news media, tragedies study

Procedia PDF Downloads 335
35931 Sentence Structure for Free Word Order Languages in Context with Anaphora Resolution: A Case Study of Hindi

Authors: Pardeep Singh, Kamlesh Dutta

Abstract:

Many languages have fixed sentence structure and others are free word order. The accuracy of anaphora resolution of syntax based algorithm depends on structure of the sentence. So, it is important to analyze the structure of any language before implementing these algorithms. In this study, we analyzed the sentence structure exploiting the case marker in Hindi as well as some special tag for subject and object. We also investigated the word order for Hindi. Word order typology refers to the study of the order of the syntactic constituents of a language. We analyzed 165 news items of Ranchi Express from EMILEE corpus of plain text. It consisted of 1745 sentences. Eight file of dialogue based from the same corpus has been analyzed which will have 1521 sentences. The percentages of subject object verb structure (SOV) and object subject verb (OSV) are 66.90 and 33.10, respectively.

Keywords: anaphora resolution, free word order languages, SOV, OSV

Procedia PDF Downloads 472
35930 Concepts of Creation and Destruction as Cognitive Instruments in World View Study

Authors: Perizat Balkhimbekova

Abstract:

Evolutionary changes in cognitive world view taking place in the last decades are followed by changes in perception of the key concepts which are related to the certain lingua-cultural sphere. Also, such concepts reflect the person’s attitude to essential processes in the sphere of concepts, e.g. the opposite operations like creation and destruction. These changes in people’s life and thinking are displayed in a language world view. In order to open the maintenance of mental structures and concepts we should use language means as observable results of people’s cognitive activity. Semantics of words, free phrases and idioms should be considered as an authoritative source of information concerning concepts. The regularized set of concepts in people consciousness forms the sphere of concepts. Cognitive linguistics widely discusses the sphere of concepts as its crucial category defining it as the field of knowledge which is made of concepts. It is considered that a sphere of concepts comprises the various types of association and forms conceptual fields. As a material for the given research, the data from Russian National Corpus and British National Corpus were used. In is necessary to point out that data provided by computational studies, are intrinsic and verifiable; so that we have used them in order to get the reliable results. The procedure of study was based on such techniques as extracting of the context containing concepts of creation|destruction from the Russian National Corpus (RNC), and British National Corpus (BNC); analyzing and interpreting of those context on the basis of cognitive approach; finding of correspondence between the given concepts in the Russian and English world view. The key problem of our study is to find the correspondence between the elements of world view represented by opposite concepts such as creation and destruction. Findings: The concept of "destruction" indicates a process which leads to full or partial destruction of an object. In other words, it is a loss of the object primary essence: structures, properties, distinctive signs and its initial integrity. The concept of "creation", on the contrary, comprises positive characteristics, represents the activity aimed at improvement of the certain object, at the creation of ideal models of the world. On the other hand, destruction is represented much more widely in RNC than creation (1254 cases of the first concept by comparison to 192 cases for the second one). Our hypothesis consists in the antinomy represented by the aforementioned concepts. Being opposite both in respect of semantics and pragmatics, and from the point of view of axiology, they are at the same time complementary and interrelated concepts.

Keywords: creation, destruction, concept, world view

Procedia PDF Downloads 345
35929 Number Variation of the Personal Pronoun we Used by Chinese English Learners

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage of language community, which might become the developmental trend of that language. However, language textbooks cannot keep up with these emergent usages. Most Chinese English learners nowadays are still exposed to traditional grammar prescribed in the textbook so that some variational usages cannot be acquired. The personal pronoun we is prescribed as a plural pronoun in the textbook grammar, but its number value is more flexible in actual use. Based on the Chinese Learner English Corpus (CLEC), and with the homemade Friends corpus as reference, the present research explores the number value of the first person pronoun we used by Chinese English learners. With consideration of the subjectivity of we, this paper annotated the number value of all the wes in “we+ PCU (Perception-cognation-utterance) verbs” collocations. Results show that though exposed to traditional textbooks which prescribe the plural reference of we, there still exists some unconventional usage (singular or vague in reference) in the writings of Chinese English learners, which is less frequent than that of the native speeches. Corpus data and results from manual semantic annotation show that this could be due to the impact of formulaic sequence on the learners and the positive transfer from their native language. An improved SLA model of native language, target language and interlanguage is put forward to recognize the existence of variation in second language acquisition, which should be given more attention during teaching.

Keywords: Chinese English learners, number, PCU verbs, Personal pronoun we

Procedia PDF Downloads 355
35928 Project and Module Based Teaching and Learning

Authors: Jingyu Hou

Abstract:

This paper proposes a new teaching and learning approach-project and Module Based Teaching and Learning (PMBTL). The PMBTL approach incorporates the merits of project/problem based and module based learning methods, and overcomes the limitations of these methods. The correlation between teaching, learning, practice, and assessment is emphasized in this approach, and new methods have been proposed accordingly. The distinct features of these new methods differentiate the PMBTL approach from conventional teaching approaches. Evaluation of this approach on practical teaching and learning activities demonstrates the effectiveness and stability of the approach in improving the performance and quality of teaching and learning. The approach proposed in this paper is also intuitive to the design of other teaching units.

Keywords: computer science education, project and module based, software engineering, module based teaching and learning

Procedia PDF Downloads 492
35927 Designing a Corpus Database to Enhance the Learning of Old English Language

Authors: Raquel Mateo Mendaza, Carmen Novo Urraca

Abstract:

The current paper presents the elaboration of a corpus database that aligns two different corpora in order to simplify the search of information both for researchers and students of Old English. This database comprises the information contained in two main reference corpora, namely the Dictionary of Old English Corpus (DOEC), compiled at the University of Toronto, and the York-Toronto-Helsinki Parsed Corpus of Old English (YCOE). The first one provides information on all surviving texts written in the Old English language. The latter offers the syntactical and morphological annotation of several texts included in the DOEC. Although both corpora are closely related, as the YCOE includes the DOE source text identifier, the main problem detected is that there is not an alignment of texts that allows for the search of whole fragments to be further analysed in terms of morphology and syntax. The database proposed in this paper gathers all this information and presents it in a simple, more accessible, visual, and educational way. The alignment of fragments has been done in an automatized way. However, some problems have emerged during the creating process particularly related to the lack of correspondence in the division of fragments. For this reason, it has been necessary to revise the whole entries manually to obtain a truthful high-quality product and to carefully indicate the gaps encountered in these corpora. All in all, this database contains more than 60,000 entries corresponding with the DOE fragments annotated by the YCOE. The main strength of the resulting product is its research and teaching implications in the study of Old English. The use of this database will help researchers and students in the study of different aspects of the language, such as inflectional morphology, syntactic behaviour of given words, or translation studies, among others. By means of the search of words or fragments, the annotated information on morphology and syntax will be automatically displayed, automatizing, and speeding up the search of data.

Keywords: alignment, corpus database, morphosyntactic analysis, Old English

Procedia PDF Downloads 134
35926 The Contribution of Corpora to the Investigation of Cross-Linguistic Equivalence in Phraseology: A Contrastive Analysis of Russian and Italian Idioms

Authors: Federica Floridi

Abstract:

The long tradition of contrastive idiom research has essentially been focusing on three domains: the comparison of structural types of idioms (e.g. verbal idioms, idioms with noun-phrase structure, etc.), the description of idioms belonging to the same thematic groups (Sachgruppen), the identification of different types of cross-linguistic equivalents (i.e. full equivalents, partial equivalents, phraseological parallels, non-equivalents). The diastratic, diachronic and diatopic aspects of the compared idioms, as well as their syntactic, pragmatic and semantic properties, have been rather ignored. Corpora (both monolingual and parallel) give the opportunity to investigate the actual use of correlating idioms in authentic texts of L1 and L2. Adopting the corpus-based approach, it is possible to draw attention to the frequency of occurrence of idioms, their syntactic embedding, their potential syntactic transformations (e.g., nominalization, passivization, relativization, etc.), their combinatorial possibilities, the variations of their lexical structure, their connotations in terms of stylistic markedness or register. This paper aims to present the results of a contrastive analysis of Russian and Italian idioms referring to the concepts of ‘beginning’ and ‘end’, that has been carried out by using the Russian National Corpus and the ‘La Repubblica’ corpus. Beyond the digital corpora, bilingual dictionaries, like Skvorcova - Majzel’, Dobrovol’skaja, Kovalev, Čerdanceva, as well as monolingual resources, have been consulted. The study has shown that many of the idioms that have been traditionally indicated as cross-linguistic equivalents on bilingual dictionaries cannot be considered correspondents. The findings demonstrate that even those idioms, that are formally identical in Russian and Italian and are presumably derived from the same source (e.g., conceptual metaphor, Bible, classical mythology, World literature), exhibit differences regarding usage. The ultimate purpose of this article is to highlight that it is necessary to review and improve the existing bilingual dictionaries considering the empirical data collected in corpora. The materials gathered in this research can contribute to this sense.

Keywords: corpora, cross-linguistic equivalence, idioms, Italian, Russian

Procedia PDF Downloads 147
35925 Learning Vocabulary with SkELL: Developing a Methodology with University Students in Japan Using Action Research

Authors: Henry R. Troy

Abstract:

Corpora are becoming more prevalent in the language classroom, especially in the development of dictionaries and course materials. Nevertheless, corpora are still perceived by many educators as difficult to use directly in the classroom, a process which is also known as “data-driven learning” (DDL). Action research has been identified as a method by which DDL’s efficiency can be increased, but it is also an approach few studies on DDL have employed. Studies into the effectiveness of DDL in language education in Japan are also rare, and investigations focused more on student and teacher reactions rather than pre and post-test scores are rarer still. This study investigates the student and teacher reactions to the use of SkELL, a free online corpus designed to be user-friendly, for vocabulary learning at a university in Japan. Action research is utilized to refine the teaching methodology, with changes to the method based on student and teacher feedback received via surveys submitted after each of the four implementations of DDL. After some training, the students used tablets to study the target vocabulary autonomously in pairs and groups, with the teacher acting as facilitator. The results show that the students enjoyed using SkELL and felt it was effective for vocabulary learning, while the teaching methodology grew in efficiency throughout the course. These findings suggest that action research can be a successful method for increasing the efficacy of DDL in the language classroom, especially with teachers and students who are new to the practice.

Keywords: action research, corpus linguistics, data-driven learning, vocabulary learning

Procedia PDF Downloads 246
35924 The Istrian Istrovenetian-Croatian Bilingual Corpus

Authors: Nada Poropat Jeletic, Gordana Hrzica

Abstract:

Bilingual conversational corpora represent a meaningful and the most comprehensive data source for investigating the genuine contact phenomena in non-monitored bi-lingual speech productions. They can be particularly useful for bilingual research since some features of bilingual interaction can hardly be accessed with more traditional methodologies (e.g., elicitation tasks). The method of language sampling provides the resources for describing language interaction in a bilingual community and/or in bilingual situations (e.g. code-switching, amount of languages used, number of languages used, etc.). To capture these phenomena in genuine communication situations, such sampling should be as close as possible to spontaneous communication. Bilingual spoken corpus design is methodologically demanding. Therefore this paper aims at describing the methodological challenges that apply to the corpus design of the conversational corpus design of the Istrian Istrovenetian-Croatian Bilingual Corpus. Croatian is the first official language of the Croatian-Italian officially bilingual Istria County, while Istrovenetian is a diatopic subvariety of Venetian, a longlasting lingua franca in the Istrian peninsula, the mother tongue of the members of the Italian National Community in Istria and the primary code of informal everyday communication among the Istrian Italophone population. Within the CLARIN infrastructure, TalkBank is being used, as it provides relevant procedures for designing and analyzing bilingual corpora. Furthermore, it allows public availability allows for easy replication of studies and cumulative progress as a research community builds up around the corpus, while the tools developed within the field of corpus linguistics enable easy retrieval and analysis of information. The method of language sampling employed is kept at the level of spontaneous communication, in order to maximise the naturalness of the collected conversational data. All speakers have provided written informed consent in which they agree to be recorded at a random point within the period of one month after signing the consent. Participants are administered a background questionnaire providing information about the socioeconomic status and the exposure and language usage in the participants social networks. Recording data are being transcribed, phonologically adapted within a standard-sized orthographic form, coded and segmented (speech streams are being segmented into communication units based on syntactic criteria) and are being marked following the CHAT transcription system and its associated CLAN suite of programmes within the TalkBank toolkit. The corpus consists of transcribed sound recordings of 36 bilingual speakers, while the target is to publish the whole corpus by the end of 2020, by sampling spontaneous conversations among approximately 100 speakers from all the bilingual areas of Istria for ensuring representativeness (the participants are being recruited across three generations of native bilingual speakers in all the bilingual areas of the peninsula). Conversational corpora are still rare in TalkBank, so the Corpus will contribute to BilingBank as a highly relevant and scientifically reliable resource for an internationally established and active research community. The impact of the research of communities with societal bilingualism will contribute to the growing body of research on bilingualism and multilingualism, especially regarding topics of language dominance, language attrition and loss, interference and code-switching etc.

Keywords: conversational corpora, bilingual corpora, code-switching, language sampling, corpus design methodology

Procedia PDF Downloads 145
35923 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of variation in the register in three major text domains, namely, social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. This present study uses a corpus of a moderate amount of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control their identity but also mark their uniqueness across the domains. At first, the subject domains of the texts are classified into two parameters namely, ‘Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties like both function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains are quite dicey in nature as their lexicological identity does not have any bearing in the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, lexical collocation. In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation with regard to the domains they should actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategy for language learners. The availability of huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational to specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 174
35922 The Diary of Dracula, by Marin Mincu: Inquiries into a Romanian 'Book of Wisdom' as a Fictional Counterpart for Corpus Hermeticum

Authors: Lucian Vasile Bagiu, Paraschiva Bagiu

Abstract:

The novel written in Italian and published in Italy in 1992 by the Romanian scholar Marin Mincu is meant for the foreign reader, aiming apparently at a better knowledge of the historical character of Vlad the Empalor (Vlad Dracul), within the European cultural, political and historical context of 1463. Throughout the very well written tome, one comes to realize that one of the underlining levels of the fiction is the exposing of various fundamental features of the Romanian culture and civilization. The author of the diary, Dracula, makes mention of Corpus Hermeticum no less than fifteen times, suggesting his own diary is some sort of a philosophical counterpart. The essay focuses on several ‘truths’ and ‘wisdom’ revealed in the fictional teachings of Dracula. The boycott of History by the Romanians is identified as an echo of the philosophical approach of the famous Romanian scholar and writer Lucian Blaga. The orality of the Romanian culture is a landmark opposed to written culture of the Western Europe. The religion of the ancient Dacian God Zalmoxis is seen as the basis for the Romanian existential and/or metaphysical ethnic philosophy (a feature tackled by the famous Romanian historian of religion Mircea Eliade), with a suggestion that Hermes Trismegistus may have written his Corpus Hermeticum being influenced by Zalmoxis. The historical figure of the last Dacian king Decebalus (death 106 AD) is a good pretext for a tantalizing Indo-European suggestion that the prehistoric Thraco-Dacian people may have been the ancestors of the first Romans settled in Latium. The lost diary of the Emperor Trajan The Bello Dacico may have proved that the unknown language of the Dacians was very much alike Latin language (a secret well hidden by the Vatican). The attitude towards death of the Dacians, as described by Herodotus, may have later inspired Pitagora, Socrates, the Eleusinian and Orphic Mysteries, etc. All of these within the Humanistic and Renascentist European context of the epoch, Dracula having a close relationship with scholars such as Nicolaus Cusanus, Cosimo de Medici, Marsilio Ficino, Pope Pius II, etc. Thus The Diary of Dracula turns out as exciting and stupefying as Corpus Hermeticum, a book impossible to assimilate entirely, yet a reference not wise to be ignored.

Keywords: Corpus Hermeticum, Dacians, Dracula, Zalmoxis

Procedia PDF Downloads 158
35921 The Development of Chinese-English Homophonic Word Pairs Databases for English Teaching and Learning

Authors: Yuh-Jen Wu, Chun-Min Lin

Abstract:

Homophonic words are common in Mandarin Chinese which belongs to the tonal language family. Using homophonic cues to study foreign languages is one of the learning techniques of mnemonics that can aid the retention and retrieval of information in the human memory. When learning difficult foreign words, some learners transpose them with words in a language they are familiar with to build an association and strengthen working memory. These phonological clues are beneficial means for novice language learners. In the classroom, if mnemonic skills are used at the appropriate time in the instructional sequence, it may achieve their maximum effectiveness. For Chinese-speaking students, proper use of Chinese-English homophonic word pairs may help them learn difficult vocabulary. In this study, a database program is developed by employing Visual Basic. The database contains two corpora, one with Chinese lexical items and the other with English ones. The Chinese corpus contains 59,053 Chinese words that were collected by a web crawler. The pronunciations of this group of words are compared with words in an English corpus based on WordNet, a lexical database for the English language. Words in both databases with similar pronunciation chunks and batches are detected. A total of approximately 1,000 Chinese lexical items are located in the preliminary comparison. These homophonic word pairs can serve as a valuable tool to assist Chinese-speaking students in learning and memorizing new English vocabulary.

Keywords: Chinese, corpus, English, homophonic words, vocabulary

Procedia PDF Downloads 182
35920 Query in Grammatical Forms and Corpus Error Analysis

Authors: Katerina Florou

Abstract:

Two decades after coined the term "learner corpora" as collections of texts created by foreign or second language learners across various language contexts, and some years following suggestion to incorporate "focusing on form" within a Task-Based Learning framework, this study aims to explore how learner corpora, whether annotated with errors or not, can facilitate a focus on form in an educational setting. Argues that analyzing linguistic form serves the purpose of enabling students to delve into language and gain an understanding of different facets of the foreign language. This same objective is applicable when analyzing learner corpora marked with errors or in their raw state, but in this scenario, the emphasis lies on identifying incorrect forms. Teachers should aim to address errors or gaps in the students' second language knowledge while they engage in a task. Building on this recommendation, we compared the written output of two student groups: the first group (G1) employed the focusing on form phase by studying a specific aspect of the Italian language, namely the past participle, through examples from native speakers and grammar rules; the second group (G2) focused on form by scrutinizing their own errors and comparing them with analogous examples from a native speaker corpus. In order to test our hypothesis, we created four learner corpora. The initial two were generated during the task phase, with one representing each group of students, while the remaining two were produced as a follow-up activity at the end of the lesson. The results of the first comparison indicated that students' exposure to their own errors can enhance their grasp of a grammatical element. The study is in its second stage and more results are to be announced.

Keywords: Corpus interlanguage analysis, task based learning, Italian language as F1, learner corpora

Procedia PDF Downloads 53