Search results for: corpora
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 93

93 Adjectives in Academic Discourse: A Comparative Study of Research Articles

Authors: Beata Grymska

Abstract:

Research on academic discourse has generally focused on lexical bundles, epistemic modality markers, or interactions between writers and readers. Following research into the written forms of the academic community, this study concentrates on adjectives in research articles. The study investigates the distribution of adjectives in research articles in two academic disciplines: linguistics and medicine. It is corpus-based in design and draws on 100 linguistics and 100 medical research articles, all written in English. The first aim of the study is to compare the distribution of adjectives between the two corpora and across the four main parts of articles: IMRD (Introduction, Methods, Results, and Discussion). The second aim is to see whether the two corpora share common core adjectives, e.g., different, important, specific, and whether there are discipline-specific adjectives. The paper then elaborates on adjective use in the corpora, with examples. The results indicate that the two corpora do not differ greatly in the distribution of adjectives. The occurrences of the most frequently used adjectives depend on the academic discipline of the research articles. The concluding part reflects on the role of adjectives in academic discourse and shows how corpora can be helpful in composing academic texts.

Keywords: academic discourse, academic texts, adjectives, corpus analysis, research articles

Procedia PDF Downloads 151
92 Using Corpora in Semantic Studies of English Adjectives

Authors: Oxana Lukoshus

Abstract:

The methods of corpus linguistics, a well-established field of research, are increasingly being applied in cognitive linguistics. Corpus data are especially useful for quantitative studies of grammatical and other aspects of language. The main objective of this paper is to demonstrate how present-day corpora can be applied in semantic studies in general and in semantic studies of adjectives in particular. Polysemantic adjectives have been the subject of numerous studies, but most of them have been carried out on dictionaries. Dictionaries are undoubtedly one of the basic data sources, but only at the initial steps of research. The author usually starts with an analysis of the lexicographic data, after which s/he comes up with a hypothesis. In the research conducted, three polysemantic synonyms, true, loyal, and faithful, were analyzed in terms of differences and similarities in their semantic structure. The corpus-based approach to the study of these adjectives involved the following. After the analysis of the dictionary data, two corpora were consulted to study the distributional patterns of the words: the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). These corpora are continually updated and contain thousands of examples of the words under research, which makes them a useful and convenient data source. For the purpose of this study, there were no special requirements regarding the genre, mode, or time of the texts included in the corpora. Out of the range of possibilities offered by corpus-analysis software (e.g., word lists, statistics of word frequencies, etc.), the most useful tool for the semantic analysis was extracting a list of co-occurrences for the given search words. Searching by lemmas, e.g., true, true to, and grouping the results by lemmas proved to be the most efficient corpus feature for the adjectives under study.
Following the search process, the corpora provided a list of co-occurrences, which were then analyzed and classified. Not every co-occurrence was relevant for the analysis. For example, phrases like An enormous sense of responsibility to protect the minds and hearts of the faithful from incursions by the state was perceived to be the basic duty of the church leaders or ‘True,’ said Phoebe, ‘but I'd probably get to be a Union Official immediately were left out, as in the first example the faithful is a substantivized adjective and in the second example true is used alone with no other parts of speech. The subsequent analysis of the corpus data provided the grounds for the distribution groups of the adjectives under study, which were then investigated with the help of a semantic experiment. To sum up, the corpus-based approach has proved to be a powerful, reliable, and convenient tool for obtaining data for further semantic study.
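
The co-occurrence extraction described in this abstract can be illustrated with a minimal sketch. It is not the BNC or COCA interface: the function name `cooccurrences`, the window size, and the sample sentences are all assumptions made for illustration, and the search matches word forms rather than lemmas.

```python
import re
from collections import Counter

def cooccurrences(sentences, node, window=3):
    """Count words that occur within `window` tokens of the search word `node`."""
    counts = Counter()
    for sentence in sentences:
        # Crude tokenizer: lowercase alphabetic strings and apostrophes only.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts

# Hypothetical sample sentences, not drawn from the BNC or COCA.
sample = [
    "She remained faithful to her friends.",
    "He stayed true to his word.",
    "A loyal friend is true to you.",
]
true_neighbours = cooccurrences(sample, "true", window=2)
print(true_neighbours.most_common(3))
```

A list like this would still need the manual filtering the abstract mentions, for example to discard substantivized uses such as the faithful.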

Keywords: corpora, corpus-based approach, polysemantic adjectives, semantic studies

Procedia PDF Downloads 287
91 Discourse Markers in Chinese University Students and Native English Speakers: A Corpus-Based Study

Authors: Dan Xie

Abstract:

The use of discourse markers (DMs) can play a crucial role in representing discourse interaction and pragmatic competence. Learners’ use of DMs and differences between native speakers (NSs) and non-native speakers (NNSs) in the use of various DMs have been the focus of considerable research attention. However, some commonly used DMs, such as you know, have not received as much attention in comparative studies, especially in the Chinese context. This study analyses data in two corpora (COLSEC and Spoken BNC 2014 (14-25)) to investigate how Chinese learners differ from NSs in their use of the DM you know and its functions in speech. The results show that there is a significant difference between the two corpora in terms of the frequency of use of you know. In terms of the functions of you know, the study shows that six functions can all be present in both corpora, although there are significant differences between the five functional dimensions, especially in introducing a claim linked to the prior discourse and highlighting particular points in the discourse. It is hoped that the study shows empirically how Chinese learners and NSs use DMs differently.

Keywords: you know, discourse marker, native speaker, Chinese learner

Procedia PDF Downloads 29
90 Query in Grammatical Forms and Corpus Error Analysis

Authors: Katerina Florou

Abstract:

Two decades after the term "learner corpora" was coined to describe collections of texts created by foreign or second language learners across various language contexts, and some years after the suggestion to incorporate "focusing on form" within a Task-Based Learning framework, this study aims to explore how learner corpora, whether annotated with errors or not, can facilitate a focus on form in an educational setting. It has been argued that analyzing linguistic form serves the purpose of enabling students to delve into language and gain an understanding of different facets of the foreign language. This same objective is applicable when analyzing learner corpora marked with errors or in their raw state, but in this scenario, the emphasis lies on identifying incorrect forms. Teachers should aim to address errors or gaps in the students' second language knowledge while they engage in a task. Building on this recommendation, we compared the written output of two student groups: the first group (G1) carried out the focusing on form phase by studying a specific aspect of the Italian language, namely the past participle, through examples from native speakers and grammar rules; the second group (G2) focused on form by scrutinizing their own errors and comparing them with analogous examples from a native speaker corpus. In order to test our hypothesis, we created four learner corpora. The initial two were generated during the task phase, with one representing each group of students, while the remaining two were produced as a follow-up activity at the end of the lesson. The results of the first comparison indicated that students' exposure to their own errors can enhance their grasp of a grammatical element. The study is in its second stage and more results are to be announced.

Keywords: Corpus interlanguage analysis, task based learning, Italian language as F1, learner corpora

Procedia PDF Downloads 15
89 Frequency of the English Phrasal Verbs Used by Iranian Learners as a Reference to the Style of Writing Adopted by the Learners

Authors: Hamzeh Mazaherylaghab, Mehrangiz Vahabian, Seyyedeh Zahra Asghari

Abstract:

The present study initially focused on the frequency of phrasal verbs used by Iranian learners of English. The results were then compared with the findings from native-speaker corpora. After the extraction of phrasal verbs from learner and native-speaker corpora, the findings were analysed. The results showed that Iranian learners avoided using phrasal verbs in many cases. Some of the findings proved to be significant. It was also found that in many cases the learners used the single-word counterparts of the avoided phrasal verbs to compensate for their lack of knowledge. Semantic complexity and the lack of an L1 counterpart may have been the main reasons for avoidance, but despite the avoidance phenomenon, the learners displayed a tendency to use many other phrasal verbs, which may have been due to the increase in the number of multi-word verbs in Persian. The overall scores confirmed that the language produced by the learners shows signs of a more formal style than that of native speakers of English, with fewer phrasal verbs and more formal single-word verbs instead.

Keywords: corpus, corpora, LOCNESS, phrasal verbs, single-word verb

Procedia PDF Downloads 161
88 The Contribution of Corpora to the Investigation of Cross-Linguistic Equivalence in Phraseology: A Contrastive Analysis of Russian and Italian Idioms

Authors: Federica Floridi

Abstract:

The long tradition of contrastive idiom research has essentially focused on three domains: the comparison of structural types of idioms (e.g., verbal idioms, idioms with noun-phrase structure, etc.), the description of idioms belonging to the same thematic groups (Sachgruppen), and the identification of different types of cross-linguistic equivalents (i.e., full equivalents, partial equivalents, phraseological parallels, non-equivalents). The diastratic, diachronic and diatopic aspects of the compared idioms, as well as their syntactic, pragmatic and semantic properties, have been largely ignored. Corpora (both monolingual and parallel) make it possible to investigate the actual use of correlating idioms in authentic texts of L1 and L2. Adopting the corpus-based approach, it is possible to draw attention to the frequency of occurrence of idioms, their syntactic embedding, their potential syntactic transformations (e.g., nominalization, passivization, relativization, etc.), their combinatorial possibilities, the variations of their lexical structure, and their connotations in terms of stylistic markedness or register. This paper presents the results of a contrastive analysis of Russian and Italian idioms referring to the concepts of ‘beginning’ and ‘end’, carried out using the Russian National Corpus and the ‘La Repubblica’ corpus. Beyond the digital corpora, bilingual dictionaries, such as Skvorcova - Majzel’, Dobrovol’skaja, Kovalev, Čerdanceva, as well as monolingual resources, have been consulted. The study has shown that many of the idioms traditionally indicated as cross-linguistic equivalents in bilingual dictionaries cannot be considered correspondents. The findings demonstrate that even idioms that are formally identical in Russian and Italian and are presumably derived from the same source (e.g., conceptual metaphor, the Bible, classical mythology, world literature) exhibit differences in usage.
The ultimate purpose of this article is to highlight the need to review and improve existing bilingual dictionaries in light of the empirical data collected in corpora. The materials gathered in this research can contribute to this end.

Keywords: corpora, cross-linguistic equivalence, idioms, Italian, Russian

Procedia PDF Downloads 109
87 Designing a Corpus Database to Enhance the Learning of Old English Language

Authors: Raquel Mateo Mendaza, Carmen Novo Urraca

Abstract:

The current paper presents the development of a corpus database that aligns two different corpora in order to simplify the search for information for both researchers and students of Old English. This database comprises the information contained in two main reference corpora, namely the Dictionary of Old English Corpus (DOEC), compiled at the University of Toronto, and the York-Toronto-Helsinki Parsed Corpus of Old English (YCOE). The former provides information on all surviving texts written in the Old English language. The latter offers syntactic and morphological annotation of several texts included in the DOEC. Although both corpora are closely related, as the YCOE includes the DOE source text identifier, the main problem detected is that there is no alignment of texts that allows whole fragments to be searched and further analysed in terms of morphology and syntax. The database proposed in this paper gathers all this information and presents it in a simple, more accessible, visual, and educational way. The alignment of fragments was carried out automatically. However, some problems emerged during the creation process, particularly related to the lack of correspondence in the division of fragments. For this reason, it was necessary to revise all the entries manually to obtain a reliable, high-quality product and to carefully indicate the gaps encountered in these corpora. All in all, this database contains more than 60,000 entries corresponding to the DOE fragments annotated by the YCOE. The main strength of the resulting product is its research and teaching implications for the study of Old English. The use of this database will help researchers and students in the study of different aspects of the language, such as inflectional morphology, the syntactic behaviour of given words, or translation studies, among others.
When words or fragments are searched, the annotated information on morphology and syntax is displayed automatically, which automates and speeds up the search for data.

Keywords: alignment, corpus database, morphosyntactic analysis, Old English

Procedia PDF Downloads 98
86 Corpora in Secondary Schools Training Courses for English as a Foreign Language Teachers

Authors: Francesca Perri

Abstract:

This paper describes a proposal for a teachers’ training course focused on the introduction of corpora into the EFL (English as a foreign language) didactics of some Italian secondary schools. The training course is conceived as part of a TEDD participant’s five-month internship. TEDD (Technologies for Education: diversity and devices) is an advanced course held by the Department of Engineering and Information Technology at the University of Trento, Italy. Its main aim is to train a selected, heterogeneous group of graduates to engage with the complex interdependence between education and technology in modern society. The educational approach draws on a plural coexistence of various theories, including socio-constructivism, constructionism, project-based learning and connectivism. The TEDD educational model stands as the main reference for the design of a training course for EFL teachers, drawing on the digitalization of didactics and the creation of interactive learning materials for intermediate L2 students. The training course lasts ten hours, organized into five sessions. In the first part (first and second sessions), a series of guided and semi-guided activities leads participants to familiarize themselves with corpora through the use of a digital tool kit. Then, during the second part, participants are specifically involved in the realization of an ML (Mistakes Laboratory), where they create, develop and share digital activities according to their teaching goals with the use of corpora, supported by the digital facilitator. The training course takes place in an ICT laboratory where the teachers work either individually or in pairs, with a computer connected to a wi-fi network, while the digital facilitator shares inputs, materials and digital assistance simultaneously on a whiteboard and on a digital platform where participants interact and work together both synchronously and asynchronously.
The adoption of good ICT practices is a fundamental step in promoting the introduction and use of Corpus Linguistics in EFL teaching and learning processes. In fact, dealing with corpora not only promotes L2 learners’ critical thinking and orientation, as opposed to wild browsing when they are looking for ready-made translations or language usage samples, but also entails becoming confident with digital tools and activities. The paper will explain the reasons, limits and resources of the pedagogical approach adopted to engage EFL teachers with the use of corpora in their didactics through the promotion of digital practices.

Keywords: digital didactics, education, language learning, teacher training

Procedia PDF Downloads 124
85 Exploring the Use of Adverbs in Two Young Learners Written Corpora

Authors: Chrysanthi S. Tiliakou, Katerina T. Frantzi

Abstract:

Writing has always been considered a most demanding skill for English as a Foreign Language learners as well as for native speakers. Novice foreign language writers are asked to handle a limited range of vocabulary to produce writing tasks at lower levels. Adverbs are parts of speech that are not used extensively in the early stages of English as a Foreign Language writing. An additional problem with learning new adverbs is that, in addition to learning their meanings, learners are expected to acquire the proper placement of adverbs in a sentence. The use of adverbs is important as they add “expressive richness to one’s message”. By exploring the patterns of use of adverbs, researchers and educators can identify types of adverbs that appear more taxing for young learners or whose placement puzzles novice English as a Foreign Language writers, and focus on teaching them. To this end, the study examines the use of adverbs in two written corpora of young learners of English at A1-A2 levels and determines the types of adverbs used, their frequencies, problems in their use, and whether there is any differentiation between levels. The AntConc concordancing tool was used for the Greek Learner Corpus, and the Corpuscle concordancing tool for the Norwegian Corpus. The research found that the normalized frequencies of the adverbs used in the A1-A2 level Greek Learner Corpus are similar to the frequencies of the same adverbs in the Norwegian Learner Corpus.
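
Normalized frequencies make counts from corpora of different sizes comparable by scaling each raw count to a common base. The sketch below is an illustration only: the function name `normalized_frequencies`, the base of 1,000 tokens, and the sample text are assumptions, not details reported by the study.

```python
import re
from collections import Counter

def normalized_frequencies(text, targets, per=1000):
    """Return occurrences of each target word per `per` tokens of `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    counts = Counter(tok for tok in tokens if tok in targets)
    if total == 0:
        return {word: 0.0 for word in targets}
    return {word: counts[word] * per / total for word in targets}

# Hypothetical learner sentence, not taken from either learner corpus.
learner_text = "I always play football and I never play tennis"
print(normalized_frequencies(learner_text, ["always", "never", "very"]))
```

Because the result is a rate and not a raw count, the same adverb can be compared across two corpora even when one is several times larger than the other.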

Keywords: learner corpora, young learners, writing, use of adverbs

Procedia PDF Downloads 55
84 A Corpus Output Error Analysis of Chinese L2 Learners From America, Myanmar, and Singapore

Authors: Qiao-Yu Warren Cai

Abstract:

Due to the rise of big data, building corpora and using them to analyze Chinese L2 learners’ language output has become a trend. Various empirical studies have been conducted using Chinese corpora built by different academic institutes. However, most of the research analyzed the data in the Chinese corpora using corpus-based qualitative content analysis with descriptive statistics. Descriptive statistics can be used to summarize the subjects or samples that the research has actually measured and to describe the numerical data, but the collected data cannot be generalized to the population. Comte, a French positivist, has argued since the 19th century that human beings’ knowledge, whether the discipline is humanistic and social science or natural science, should be verified in a scientific way to construct a universal theory to explain the truth and human beings’ behaviors. Inferential statistics, able to make judgments of the probability of a difference observed between groups being dependable or caused by chance (Free Geography Notes, 2015) and to infer from the subjects or examples what the population might think or how it might behave, is just the right method to support Comte’s argument in the field of TCSOL. Also, inferential statistics is a core of quantitative research, but little research has been conducted by combining corpora with inferential statistics. Little research has analyzed the differences in Chinese L2 learners’ corpus output errors using one-way ANOVA, so the findings of previous research are limited in what they can infer about the population's Chinese errors from the given samples’ Chinese corpora. To fill this knowledge gap in the professional development of Taiwanese TCSOL, the present study aims to utilize one-way ANOVA to analyze corpus output errors of Chinese L2 learners from America, Myanmar, and Singapore.
The results show that no significant difference exists in ‘shì (是) sentence’ and word order errors, but compared with Americans and Singaporeans, learners from Myanmar are significantly more likely to produce ‘sentence blends.’ Based on the above results, the present study provides an instructional approach and contributes to further exploration of how Chinese L2 learners can have (and use) learning strategies to lower errors.
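
One-way ANOVA compares the variance between group means with the variance within groups. The sketch below computes only the F statistic, using the standard library; the function name `one_way_anova_f` and the error counts are hypothetical and are not data from the study. Obtaining a p-value additionally requires the F distribution, which a statistics library such as SciPy provides (for instance through `scipy.stats.f_oneway`).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical per-learner error counts for three learner groups.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [3, 4, 5]
f_stat, df_b, df_w = one_way_anova_f([group_a, group_b, group_c])
print(f_stat, df_b, df_w)
```

A large F relative to the critical value for the given degrees of freedom suggests that at least one group mean differs, which is the kind of inference the abstract contrasts with purely descriptive statistics.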

Keywords: Chinese corpus, error analysis, one-way analysis of variance, Chinese L2 learners, Americans, Myanmar, Singaporeans

Procedia PDF Downloads 72
83 A Corpus Based Study of Eileen Chang’s Self-Translating Style: A Case Study on The Rice Sprout Song

Authors: Yi-Wei Huang

Abstract:

Eileen Chang is a well-known writer of modern Chinese literature. She is also a translator who published her own self-translation of The Rice Sprout Song. The purpose of the study is to identify the style of Eileen Chang’s self-translations using corpora, especially in the case of The Rice Sprout Song. The Rice Sprout Song was first written in English and then translated into Chinese by the author herself. The procedure of translation is complicated due to the bilingual transition by the same person. Therefore, the aim of the study is to identify Eileen Chang’s style in her self-translation by comparing her works The Old Man and the Sea, The Rice Sprout Song, and The Rouge of The North. The study uses computer-aided tools such as AntConc, Notepad++, StanfordCoreNLP, and Python to analyze the style of the works, focusing especially on reduplications and sentence composition. Reduplications are commonly seen in Eileen Chang’s works, and they often appear with colors or onomatopoeia. With these criteria, the style of self-translating can be detected and analyzed.
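
Since the abstract names Python among its tools, a short sketch may help show how reduplications can be located with regular expressions. The two patterns (AABB and ABB), the function name `find_reduplications`, and the sample sentence are assumptions for illustration; the study's actual procedure is not described, and the sentence is not quoted from Chang's work.

```python
import re

HAN = r"[\u4e00-\u9fff]"  # basic CJK Unified Ideographs block

# AABB: two different or identical characters, each doubled (e.g. 高高兴兴).
AABB = re.compile(rf"({HAN})\1({HAN})\2")
# ABB: one character followed by a different character that is doubled (e.g. 红彤彤).
ABB = re.compile(rf"({HAN})(?!\1)({HAN})\2")

def find_reduplications(text):
    """Return AABB and ABB reduplications found in `text`."""
    aabb = [m.group(0) for m in AABB.finditer(text)]
    # Blank out AABB matches first so their tails are not recounted as ABB.
    remainder = AABB.sub(" ", text)
    abb = [m.group(0) for m in ABB.finditer(remainder)]
    return {"AABB": aabb, "ABB": abb}

# Hypothetical sentence written for this example.
sentence = "她高高兴兴地走了，脸红彤彤的。"
print(find_reduplications(sentence))
```

Pattern matching of this kind over-generates on accidental repetitions, so the hits would still need manual checking before being counted as stylistic reduplications.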

Keywords: corpora, Eileen Chang, reduplications, self-translation

Procedia PDF Downloads 200
82 Learning to Translate by Learning to Communicate to an Entailment Classifier

Authors: Szymon Rutkowski, Tomasz Korbak

Abstract:

We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, its learning procedure lacks psychological plausibility: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for the classifier’s performance on determining entailment between sentences translated by the translator to disciple’s native language. The translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation have been proposed before, we introduce a number of improvements. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality.
It has been shown that models trained on recognizing textual entailment produce high-quality general-purpose sentence embeddings transferable to other tasks. We use the Stanford Natural Language Inference (SNLI) dataset as well as analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of the translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning-based zero-shot machine translation.
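
The reward signal in the communication game can be sketched in a few lines. Everything here is a stand-in: the binary reward, the word-for-word `toy_translate`, the subset-based `toy_classify`, and the tiny lexicon are assumptions made for illustration, not the authors' models or reward definition.

```python
def translation_reward(translate, classify, premise, hypothesis, gold_label):
    """Reward the translator when the classifier recovers the gold label."""
    predicted = classify(translate(premise), translate(hypothesis))
    return 1.0 if predicted == gold_label else 0.0

# Hypothetical word-for-word lexicon standing in for a learned translator.
LEXICON = {"kot": "cat", "pies": "dog", "biegnie": "runs"}

def toy_translate(sentence):
    return " ".join(LEXICON.get(word, word) for word in sentence.split())

def toy_classify(premise, hypothesis):
    # Stand-in for a pretrained entailment classifier in the target language.
    if set(hypothesis.split()) <= set(premise.split()):
        return "entailment"
    return "neutral"

reward = translation_reward(toy_translate, toy_classify,
                            "kot biegnie", "kot", "entailment")
print(reward)
```

In a policy-gradient setup, a scalar of this kind would weight the log-probability of the translator's sampled output; that training loop is omitted here.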

Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning

Procedia PDF Downloads 93
81 The Use of Corpora in Improving Modal Verb Treatment in English as Foreign Language Textbooks

Authors: Lexi Li, Vanessa H. K. Pang

Abstract:

This study aims to demonstrate how native and learner corpora can be used to enhance modal verb treatment in EFL textbooks in mainland China. It contributes to a corpus-informed and learner-centered design of grammar presentation in EFL textbooks that enhances the authenticity and appropriateness of textbook language for target learners. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because the pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of each of the nine modals was sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the 'secondary school' section were selected. A series of five secondary coursebooks comprises the textbook corpus. All the data in both the learner and the textbook corpora were retrieved through the concordance functions of WordSmith Tools (version 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNCS2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. Secondly, the learner corpus was analyzed in terms of the use (distributional features, semantic functions, and co-occurring constructions) and the misuse (syntactic errors, e.g., she can sings*.) of the nine modal verbs to uncover potential difficulties that confront learners. The analysis of distribution indicates several discrepancies between the textbook corpus and BNCS2014. The four most frequent modal verbs in BNCS2014 are can, would, will, could, while can, will, should, could are the top four in the textbooks. Most strikingly, there is an unusually high proportion of can (41.1%) in the textbooks.
The results on different meanings show that will, would and must are the most problematic. For example, for will, the textbooks contain 20% more occurrences of 'volition' and 20% fewer of 'prediction' than BNCS2014. Regarding co-occurring structures, the textbooks over-represent the structure 'modal + do' across the nine modal verbs. Another major finding is that the structure 'modal + have done', which frequently co-occurs with could, would, should, and must, is underused in textbooks. In addition, these four modal verbs are the most difficult for learners, as the error analysis shows. This study demonstrates how the synergy of native and learner corpora can be harnessed to improve EFL textbook presentation of modal verbs so that textbooks provide not only authentic language used in natural discourse but also an appropriate design tailored to the needs of target learners.
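
The distributional comparison in this abstract rests on each modal's share of all modal tokens in a corpus. A minimal sketch of that calculation follows; the function name `modal_distribution` and the sample text are assumptions, and plain string matching cannot separate modal can or may from the noun can or the month May, which is why a POS-tagged corpus is preferable.

```python
import re
from collections import Counter

MODALS = ("will", "would", "can", "could", "may", "might", "shall", "should", "must")

def modal_distribution(text):
    """Return each modal's share of all modal tokens found in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tok for tok in tokens if tok in MODALS)
    total = sum(counts.values())
    return {m: (counts[m] / total if total else 0.0) for m in MODALS}

# Hypothetical textbook-style sentences, not taken from the coursebooks.
textbook_sample = "You can go. She can stay. He must leave. I will help."
shares = modal_distribution(textbook_sample)
print(shares["can"], shares["must"], shares["will"])
```

Running the same function over a textbook corpus and a reference corpus gives two sets of proportions that can be placed side by side, in the spirit of the 41.1% figure reported for can.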

Keywords: English as Foreign Language, EFL textbooks, learner corpus, modal verbs, native corpus

Procedia PDF Downloads 110
80 Grammatical and Lexical Explorations on ‘Outer Circle’ Englishes and ‘Expanding Circle’ Englishes: A Corpus-Based Comparative Analysis

Authors: Orlyn Joyce D. Esquivel

Abstract:

This study analyzed 50 selected research papers from professional language and linguistic academic journals to portray the differences between Kachru’s (1994) outer circle and expanding circle Englishes. The selected outer circle Englishes include those of Bangladesh, Malaysia, the Philippines, India, and Singapore; and the selected expanding circle Englishes are those of China, Indonesia, Japan, Korea, and Thailand. The researcher built ten corpora (five research papers for each corpus) to represent each variety of Englishes. The corpora were examined under grammatical and lexical features using Modified English TreeTagger in Sketch Engine. Results revealed the distinct grammatical and lexical features through the table and textual analyses, illustrated from the most to least dominant linguistic elements. In addition, comparative analyses were done to distinguish the features of each of the selected Englishes. The Language Change Theory was used as a basis in the discussion. Hence, the findings suggest that the ‘outer circle’ Englishes and ‘expanding circle’ Englishes will continue to drift from International English.

Keywords: applied linguistics, English as a global language, expanding circle Englishes, global Englishes, outer circle Englishes

Procedia PDF Downloads 117
79 MicroRNA in Bovine Corpus Luteum during Early Pregnancy

Authors: Rreze Gecaj, Corina Schanzenbach, Benedikt Kirchner, Michael Pfaffl, Bajram Berisha

Abstract:

The maintenance of the corpus luteum (CL) during early pregnancy in cattle is a critical and multifarious process. A luteotrophic mechanism originating from the embryo is widely accepted as the triggering signal for CL maintenance. In cattle, it is interferon-tau (IFNT) secretion from the conceptus that prevents CL regression and ensures progesterone production for the establishment of pregnancy. In addition to endocrine and paracrine signals, microRNA (miRNA) can also support CL sustainability during early pregnancy. MiRNAs are small non-coding nucleic acids that regulate gene expression post-transcriptionally and have been shown to be involved in the modulation of CL function. However, the role of miRNAs in corpus luteum function during early pregnancy remains largely unexplored. This study aims to profile the expression of miRNA in the CL during early pregnancy in cattle by comparing it with the CL from the late cycle and with the regressed CL. Corpora lutea were assigned to two different groups during the cycle (C13 group, late CL: days 13-18, and C18 group, regressed CL: day >18) and one group during early pregnancy (group P: 1-2 months). The stage of the estrous cycle was determined by macroscopic examination, and crown-rump length measurement was used to age the fetus. A total of 9 corpora lutea from individual animals were included in the study, three corpora lutea for each group. The miRNA population was profiled using small RNA next-generation sequencing, and biologically significant miRNAs were evaluated for their differential expression using the DESeq2 methodology. We show that 6 differentially expressed miRNAs (bta-mir-2890, -2332, -2441-3p, -148b, -1248 and -29c) are common to both comparisons, P vs C13 and P vs C18. For each stage individually, we have also identified unique miRNAs differentially expressed only in the given comparison.
bta-miR-23a and -769 were unique miRNAs differentially expressed in P vs C13, whereas forty-four unique miRNAs were identified as differentially expressed in P vs C18. These data confirm that miRNAs are highly abundant in luteal tissue during early pregnancy and potentially regulate the CL maintenance at this stage of fetus development.

Keywords: bovine, corpus luteum, microRNA, pregnancy, RNA-Seq

Procedia PDF Downloads 222
78 The Value of Computerized Corpora in EFL Textbook Design: The Case of Modal Verbs

Authors: Lexi Li

Abstract:

This study aims to contribute to research on how computer technology can be exploited to enhance EFL textbook design. Specifically, the study demonstrates how computerized native and learner corpora can be used to enhance modal verb treatment in EFL textbooks. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because the pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of the occurrences of each of the nine modals were sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the “secondary school” section were selected. A series of five secondary coursebooks comprises the textbook corpus. All the data in both the learner and the textbook corpora were retrieved through the concordance functions of WordSmith Tools (version 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNC2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. The second part compared the learner corpus with the textbook corpus in terms of use (distributional features, semantic functions, and co-occurring constructions) in order to examine the degree of influence of the textbooks on learners’ use of modal verbs. Moreover, the learner corpus was analyzed for misuse (syntactic errors, e.g., *she can sings) of the nine modal verbs to uncover potential difficulties that confront learners. The results indicate discrepancies between the textbook presentation of modal verbs and authentic modal use in natural discourse in terms of distributions of frequencies, semantic functions, and co-occurring structures. 
Furthermore, there are consistent patterns of use between the learner corpus and the textbook corpus with respect to the three above-mentioned aspects, except for could, will and must, partially confirming the correlation between frequency effects and L2 grammar acquisition. Further analysis reveals that the exceptions are caused by both positive and negative L1 transfer, indicating that frequency effects can be intercepted by L1 interference. In addition, error analysis revealed that could, would, should and must are the most difficult for Chinese learners due to both inter-linguistic and intra-linguistic interference. The discrepancies between the textbook corpus and the native corpus point to a need to adjust the presentation of modal verbs in the textbooks in terms of frequencies, different meanings, and verb-phrase structures. Along with the adjustment of modal verb treatment based on authentic use, it is important for textbook writers to take into consideration L1 interference as well as learners’ difficulties in their use of modal verbs. The present study is a methodological showcase of the combination of both native and learner corpora in the enhancement of EFL textbook language authenticity and appropriateness for learners.

Keywords: EFL textbooks, learner corpus, modal verbs, native corpus

Procedia PDF Downloads 90
77 Ultrasonic Assessment of Corpora lutea and Plasma Progesterone Levels in Early Pregnant and Non Pregnant Cows

Authors: Abdurraouf O. Gaja, Salah Y. A. Al-Dahash, Guru Solmon Raju, Chikara Kubota

Abstract:

Corpus luteum cross-sectional area (by ultrasonography) and plasma progesterone level (by DELFIA) were estimated in early pregnant and non-pregnant cows on day 14 and on days 20 to 23 post insemination. On day 14, the corpus luteum cross-sectional area was 348.43 mm² in pregnant and 387.84 mm² in non-pregnant cows. On days 20 to 23, the corpus luteum cross-sectional area ranged between 342.06 and 367.90 mm² in pregnant and between 193.85 and 270.69 mm² in non-pregnant cows. The plasma progesterone level was 2.43 ng/ml in pregnant and 2.46 ng/ml in non-pregnant cows on day 14, while on days 20 to 23 the level ranged between 2.47 and 2.84 ng/ml in pregnant and between 0.53 and 1.17 ng/ml in non-pregnant cows. Both luteal tissue areas and plasma progesterone levels differed highly significantly (P<0.01) between pregnant and non-pregnant cows on days 20 to 23, but there were no significant differences on day 14. The correlation between CL cross-sectional area and plasma progesterone level was 0.4 in pregnant cows and 0.99 in non-pregnant cows. It is clear from this study that ultrasonic assessment of corpora lutea is a viable alternative to determining plasma progesterone levels for early pregnancy diagnosis in cows.

Keywords: progesterone, ultrasonography, corpus luteum, pregnancy diagnosis, cow

Procedia PDF Downloads 276
76 Corpus Stylistics and Multidimensional Analysis for English for Specific Purposes Teaching and Assessment

Authors: Svetlana Strinyuk, Viacheslav Lanin

Abstract:

Academic English has become the lingua franca of the international scientific community, which stimulates universities to introduce English for Specific Purposes (EAP) courses into the curriculum. Teaching L2 EAP students can be supported with corpus technologies and digital stylistics. Special software was developed to address the manifold task of teaching, assessing and researching the academic writing of L2 students on the basis of digital stylistics and multidimensional analysis. A set of annotations (style markers), namely the grammatical, lexical and syntactic features most characteristic of academic writing, was built. Contrastive comparison of two corpora, a “model corpus” of subject-domain-limited papers published by competent writers in leading academic journals and a “students’ corpus” of subject-domain-limited papers written by final-year students, yields data about the features of academic writing underused or overused by L2 EAP students. Both corpora are tagged with special software created in GATE Developer. Style markers within the framework of the research may be replaced depending on the relevance and validity of the results obtained from the research corpora. Thus, by selecting relevant (high-frequency) style markers and excluding less relevant, i.e. less frequent, annotations, high validity of the model is achieved. The software makes it possible to compare the data obtained from processing the model corpus with those from the students’ corpus and to generate reports which can be used in teaching and assessment. The less students’ writing deviates from the model corpus, the higher their academic writing skill acquisition. The research showed that several style markers (hedging devices) were underused by L2 EAP students, whereas lexical linking devices were used excessively. The software, implemented in the teaching of EAP courses, serves as a successful visual aid and makes assessment more valid; it is indicative of the degree of writing skill acquisition and provides data for further research.

Keywords: corpus technologies in EAP teaching, multidimensional analysis, GATE Developer, corpus stylistics

Procedia PDF Downloads 154
75 Testing the Simplification Hypothesis in Constrained Language Use: An Entropy-Based Approach

Authors: Jiaxin Chen

Abstract:

Translations have been labeled as more simplified than non-translations, featuring less diversified and more frequent lexical items and simpler syntactic structures. Such simplified linguistic features have been identified in other bilingualism-influenced language varieties, including non-native and learner language use. Therefore, it has been proposed that translation could be studied within a broader framework of constrained language, and simplification is one of the universal features shared by constrained language varieties due to similar cognitive-physiological and social-interactive constraints. Yet contradicting findings have also been presented. To address this issue, this study intends to adopt Shannon’s entropy-based measures to quantify complexity in language use. Entropy measures the level of uncertainty or unpredictability in message content, and it has been adapted in linguistic studies to quantify linguistic variance, including morphological diversity and lexical richness. In this study, the complexity of lexical and syntactic choices will be captured by word-form entropy and pos-form entropy, and a comparison will be made between constrained and non-constrained language use to test the simplification hypothesis. The entropy-based method is employed because it captures both the frequency of linguistic choices and their evenness of distribution, which are unavailable when using traditional indices. Another advantage of the entropy-based measure is that it is reasonably stable across languages and thus allows for a reliable comparison among studies on different language pairs. In terms of the data for the present study, one established (CLOB) and two self-compiled corpora will be used to represent native written English and two constrained varieties (L2 written English and translated English), respectively. Each corpus consists of around 200,000 tokens. Genre (press) and text length (around 2,000 words per text) are comparable across corpora. 
More specifically, word-form entropy and pos-form entropy will be calculated as indicators of lexical and syntactical complexity, and ANOVA tests will be conducted to explore if there is any corpora effect. It is hypothesized that both L2 written English and translated English have lower entropy compared to non-constrained written English. The similarities and divergences between the two constrained varieties may provide indications of the constraints shared by and peculiar to each variety.
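
As a rough illustration of the entropy measure described in this abstract, the sketch below computes Shannon entropy over a list of tokens. It is not the author's implementation: the function name `shannon_entropy` is hypothetical, and the input is assumed to be an already tokenized text. Passing word forms gives word-form entropy; passing POS tags instead would give the pos-form variant.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the frequency distribution of tokens.

    Hypothetical helper for illustration only. Higher values mean the
    choices are more varied and more evenly distributed.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty text carries no information
    # H = -sum(p * log2(p)) over the relative frequency p of each type
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Two types used equally often: exactly one bit of uncertainty.
print(shannon_entropy(["a", "b", "a", "b"]))
```

A text that repeats a single form has zero entropy, while a text that spreads its tokens evenly over many forms has high entropy, which is why the measure reflects both frequency and evenness of distribution.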

Keywords: constrained language use, entropy-based measures, lexical simplification, syntactical simplification

Procedia PDF Downloads 57
74 Learning Vocabulary with SkELL: Developing a Methodology with University Students in Japan Using Action Research

Authors: Henry R. Troy

Abstract:

Corpora are becoming more prevalent in the language classroom, especially in the development of dictionaries and course materials. Nevertheless, corpora are still perceived by many educators as difficult to use directly in the classroom, a process which is also known as “data-driven learning” (DDL). Action research has been identified as a method by which DDL’s efficiency can be increased, but it is also an approach few studies on DDL have employed. Studies into the effectiveness of DDL in language education in Japan are also rare, and investigations focused more on student and teacher reactions rather than pre and post-test scores are rarer still. This study investigates the student and teacher reactions to the use of SkELL, a free online corpus designed to be user-friendly, for vocabulary learning at a university in Japan. Action research is utilized to refine the teaching methodology, with changes to the method based on student and teacher feedback received via surveys submitted after each of the four implementations of DDL. After some training, the students used tablets to study the target vocabulary autonomously in pairs and groups, with the teacher acting as facilitator. The results show that the students enjoyed using SkELL and felt it was effective for vocabulary learning, while the teaching methodology grew in efficiency throughout the course. These findings suggest that action research can be a successful method for increasing the efficacy of DDL in the language classroom, especially with teachers and students who are new to the practice.

Keywords: action research, corpus linguistics, data-driven learning, vocabulary learning

Procedia PDF Downloads 197
73 The Mirage of Progress? a Longitudinal Study of Japanese Students’ L2 Oral Grammar

Authors: Robert Long, Hiroaki Watanabe

Abstract:

This longitudinal study examines the grammatical errors of Japanese university students’ dialogues with a native speaker over an academic year. The L2 interactions of 15 Japanese speakers were taken from the JUSFC2018 corpus (April/May 2018) and the JUSFC2019 corpus (January/February). The corpora were based on a self-introduction monologue and a three-question dialogue; however, this study examines the grammatical accuracy found in the dialogues. Research questions focused on a possible significant difference in grammatical accuracy between the first interview session in 2018 and the second one the following year, specifically regarding errors in clauses per 100 words, global errors and local errors, and specific errors related to parts of speech. The investigation also focused on which forms showed the least improvement or had worsened. Descriptive statistics showed that error-free clauses/errors per 100 words decreased slightly while clauses with errors/100 words increased by one clause. Global errors showed a significant decline, while local errors increased from 97 to 158 errors. For errors related to parts of speech, a t-test confirmed there was a significant difference between the two speech corpora, with errors occurring more frequently in the 2019 corpus. These data highlight the difficulty of having students self-edit.

Keywords: clause analysis, global vs. local errors, grammatical accuracy, L2 output, longitudinal study

Procedia PDF Downloads 89
72 The Istrian Istrovenetian-Croatian Bilingual Corpus

Authors: Nada Poropat Jeletic, Gordana Hrzica

Abstract:

Bilingual conversational corpora represent a meaningful and the most comprehensive data source for investigating genuine contact phenomena in non-monitored bilingual speech production. They can be particularly useful for bilingual research since some features of bilingual interaction can hardly be accessed with more traditional methodologies (e.g., elicitation tasks). The method of language sampling provides the resources for describing language interaction in a bilingual community and/or in bilingual situations (e.g. code-switching, amount of languages used, number of languages used, etc.). To capture these phenomena in genuine communication situations, such sampling should be as close as possible to spontaneous communication. Bilingual spoken corpus design is methodologically demanding. Therefore, this paper aims at describing the methodological challenges that apply to the design of the conversational Istrian Istrovenetian-Croatian Bilingual Corpus. Croatian is the first official language of the Croatian-Italian officially bilingual Istria County, while Istrovenetian is a diatopic subvariety of Venetian, a long-lasting lingua franca in the Istrian peninsula, the mother tongue of the members of the Italian National Community in Istria and the primary code of informal everyday communication among the Istrian Italophone population. Within the CLARIN infrastructure, TalkBank is being used, as it provides relevant procedures for designing and analyzing bilingual corpora. Furthermore, public availability allows for easy replication of studies and cumulative progress as a research community builds up around the corpus, while the tools developed within the field of corpus linguistics enable easy retrieval and analysis of information. The method of language sampling employed is kept at the level of spontaneous communication, in order to maximise the naturalness of the collected conversational data. 
All speakers have provided written informed consent in which they agree to be recorded at a random point within the period of one month after signing the consent. Participants are administered a background questionnaire providing information about their socioeconomic status and about language exposure and usage in the participants’ social networks. Recording data are being transcribed, phonologically adapted within a standard-sized orthographic form, coded and segmented (speech streams are being segmented into communication units based on syntactic criteria) and are being marked following the CHAT transcription system and its associated CLAN suite of programmes within the TalkBank toolkit. The corpus currently consists of transcribed sound recordings of 36 bilingual speakers, while the target is to publish the whole corpus by the end of 2020, by sampling spontaneous conversations among approximately 100 speakers from all the bilingual areas of Istria to ensure representativeness (the participants are being recruited across three generations of native bilingual speakers in all the bilingual areas of the peninsula). Conversational corpora are still rare in TalkBank, so the Corpus will contribute to BilingBank as a highly relevant and scientifically reliable resource for an internationally established and active research community. Research on communities with societal bilingualism will contribute to the growing body of research on bilingualism and multilingualism, especially regarding topics of language dominance, language attrition and loss, interference and code-switching etc.

Keywords: conversational corpora, bilingual corpora, code-switching, language sampling, corpus design methodology

Procedia PDF Downloads 105
71 Dialogism in Research Article Introductions Written by Iranian Non-Native and English Native Speaking Writers

Authors: Moharram Sharifi

Abstract:

Despite a growing interest in the study of the introduction section of Research Articles (RA), few studies have investigated how academic writers engage with other voices and alternative positions in this academic genre. Therefore, the purpose of this study was to show how Native Speaker (NS) and Non-Native Speaker (NNS) writers take positions and stances in research article introductions. For this purpose, Engagement resources based on the appraisal framework were investigated in sixty articles written by English NS and Iranian NNS writers and published in applied linguistics journals. It was found that the mean occurrences of heteroglossic items in both corpora were larger than those of monoglossic items. However, a comparison of the means of monoglossic engagements between the two corpora revealed that the NS writers’ corpus had larger mean occurrences of monoglossic engagements than the NNS writers’ corpus, implying the native speakers’ stronger authorial stance in the texts. The results also revealed that there was no significant difference in the use of contractive and expansive engagements by NS writers (t(29) = -0.995, p>0.05), indicating a balanced use of the two options. However, the higher mean occurrences of expansive options compared with contractive options in the NNS corpus may suggest that NNS writers open up more dialogic room for alternative positions in RA introductions. The findings of this study may help writers to better perceive the creation of a strong authorial position using appropriate engagement resources in RA introductions.

Keywords: engagement, heteroglossic, monoglossic, introduction

Procedia PDF Downloads 9
70 A BERT-Based Model for Financial Social Media Sentiment Analysis

Authors: Josiel Delgadillo, Johnson Kinyua, Charles Mutigwe

Abstract:

The purpose of sentiment analysis is to determine the sentiment strength (e.g., positive, negative, neutral) from a textual source for good decision-making. Natural language processing in domains such as financial markets requires knowledge of domain ontology, and pre-trained language models, such as BERT, have made significant breakthroughs in various NLP tasks by training on large-scale un-labeled generic corpora such as Wikipedia. However, sentiment analysis is a strong domain-dependent task. The rapid growth of social media has given users a platform to share their experiences and views about products, services, and processes, including financial markets. StockTwits and Twitter are social networks that allow the public to express their sentiments in real time. Hence, leveraging the success of unsupervised pre-training and a large amount of financial text available on social media platforms could potentially benefit a wide range of financial applications. This work is focused on sentiment analysis using social media text on platforms such as StockTwits and Twitter. To meet this need, SkyBERT, a domain-specific language model pre-trained and fine-tuned on financial corpora, has been developed. The results show that SkyBERT outperforms current state-of-the-art models in financial sentiment analysis. Extensive experimental results demonstrate the effectiveness and robustness of SkyBERT.

Keywords: BERT, financial markets, Twitter, sentiment analysis

Procedia PDF Downloads 116
69 Lexical Collocations in Medical Articles of Non-Native vs Native English-Speaking Researchers

Authors: Waleed Mandour

Abstract:

This study presents a multidimensional scrutiny of Benson et al.’s seven-category taxonomy of lexical collocations as used by Egyptian medical authors and their native English-speaking peers. It investigates 212 medical papers, all published over a span of 6 years (from 2013 to 2018). The comparison is made against medical research articles submitted by native speakers of English (25,238 articles in total, with over 103 million words) as derived from the Directory of Open Access Journals (a 2.7 billion-word corpus). The compiled non-native speakers’ corpus was annotated and marked up manually by the researcher according to the standards of Weisser. For the statistical comparisons, conventional frequency-based analysis was deployed alongside other relevant criteria such as association measures (AMs), for which LogDice was used as per the recommendation of Kilgarriff et al. for comparing large corpora. Despite the terminological convergence in the subject corpora, the comparison results confirm the previous literature in that the non-native speakers’ compositions reveal limited ranges of lexical collocations in terms of their distribution. However, there is a ubiquitous tendency to overuse the NS high-frequency multi-word units in all lexical categories investigated. Furthermore, Egyptian authors, unlike their English-speaking peers, tend to embrace more collocations denoting quantitative rather than qualitative analyses in their papers. This empirical work contributes to English for Academic Purposes (EAP) and English as a Lingua Franca in Academic settings (ELFA). In addition, there are pedagogical implications that would promote a better quality of medical research papers published in Egyptian universities.
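
For readers unfamiliar with the LogDice association measure mentioned in this abstract, the sketch below shows its standard definition, 14 + log2(2·f(x,y) / (f(x) + f(y))). The function name `log_dice` and the example counts are hypothetical and are not taken from the study; the abstract does not say which tool computed the scores.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """LogDice score for a word pair (illustrative helper).

    f_xy: co-occurrence frequency of the two words
    f_x, f_y: individual frequencies of each word
    The score has a theoretical maximum of 14 and does not depend on
    total corpus size, which makes it convenient for comparing corpora
    of very different sizes.
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts: the pair co-occurs 5 times; each word occurs 10 times.
print(log_dice(5, 10, 10))
```

Because the formula uses only the three frequencies and not the corpus size, scores from a 212-paper corpus and a corpus of over 100 million words can be placed on the same scale.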

Keywords: corpus linguistics, EAP, ELFA, lexical collocations, medical discourse

Procedia PDF Downloads 95
68 Circadian Disruption in Polycystic Ovary Syndrome Model Rats

Authors: Fangfang Wang, Fan Qu

Abstract:

Polycystic ovary syndrome (PCOS), the most common endocrinopathy among women of reproductive age, is characterized by ovarian dysfunction, hyperandrogenism and reduced fecundity. The aim of this study is to investigate whether the circadian disruption is involved in pathogenesis of PCOS in androgen-induced animal model. We established a rat model of PCOS using single subcutaneous injection with testosterone propionate on the ninth day after birth, and confirmed their PCOS-like phenotypes with vaginal smears, ovarian hematoxylin and eosin (HE) staining and serum androgen measurement. The control group rats received the vehicle only. Gene expression was detected by real-time quantitative PCR. (1) Compared with control group, PCOS model rats of 10-week group showed persistently keratinized vaginal cells, while all the control rats showed at least two consecutive estrous cycles. (2) Ovarian HE staining and histological examination showed that PCOS model rats of 10-week group presented many cystic follicles with decreased numbers of granulosa cells and corpora lutea in their ovaries, while the control rats had follicles with normal layers of granulosa cells at various stages of development and several generations of corpora lutea. (3) In the 10-week group, serum free androgen index was notably higher in PCOS model rats than controls. (4) Disturbed mRNA expression patterns of core clock genes were found in ovaries of PCOS model rats of 10-week group. Abnormal expression of key genes associated with circadian rhythm in ovary may be one of the mechanisms for ovarian dysfunction in PCOS model rats induced by androgen.

Keywords: polycystic ovary syndrome, androgen, animal model, circadian disruption

Procedia PDF Downloads 195
67 The Omani Learner of English Corpus: Source and Tools

Authors: Anood Al-Shibli

Abstract:

Designing a learner corpus is not an easy task because learners’ language involves many variables which might affect the results of any study based on learners’ language production (spoken and written). It is also essential to design a learner corpus systematically, especially when it is intended to be a reference for language research. Therefore, the design of the Omani Learner Corpus (OLEC) has undergone many explicit and systematic considerations. These criteria can be regarded as the foundation for designing any learner corpus to be exploited effectively in language use and language learning studies. In addition, OLEC is a manually error-annotated corpus. Error annotation in learner corpora is essential; however, it is time-consuming and prone to errors. Consequently, a navigating tool was designed to help the annotators insert error codes in order to make the error-annotation process more efficient and consistent. To ensure accuracy, an error-annotation procedure was followed to annotate OLEC, and some preliminary findings are noted. One of the main results of this procedure is the creation of an error-annotation system based on Omani learners’ English language production. Because OLEC is still in its first stages, the primary findings relate to only one level of proficiency and one error type, namely verb-related errors. It was found that Omani learners in OLEC tend to make more errors in forming the verb, followed by problems in verb agreement. Comparing the results to other error-based studies indicates that the Omani learners tend to make basic verb errors which can be found at lower levels of proficiency. To this end, it is essential to acknowledge that examining learners’ errors can give insights into language acquisition and language learning, and that most errors do not happen randomly but occur systematically among language learners.

Keywords: error-annotation system, error-annotation manual, learner corpora, verbs related errors

Procedia PDF Downloads 104
66 Partial Triphallia: The First Case Report of External and Internal Penile Triplication in a Cadaver

Authors: Madeleine Gadd, Rose How, Edward Mathews, John Buchanan, Vicky Cottrell, Andre Coetzee, Karuna Katti

Abstract:

Introduction: Triphallia, a congenital anomaly describing the presence of three distinct penile shafts, has been reported only once in the literature. This case report describes the serendipitous discovery of the first reported human case of partial orthotopic triphallia during cadaveric dissection. Case Summary: Despite the normal appearance of external genitalia on examination, the dissection of a 78-year-old male revealed a remarkable anatomical variation: two small supernumerary penises situated in a transverse orientation postero inferiorly to the primary penis. The main and the larger supernumerary penile shafts displayed their own corpora cavernosa and glans penis, sharing a single urethra, which coursed through the secondary penis prior to its passage through the primary penis. The smallest of the supernumerary penises was similar in dimension to the secondary penis, at 3.7cm long and 1.2cm wide (compared to the secondary penis at 3.8cm long and 1.3cm wide). However, it lacked a urethra and a typical arrangement of the corpora cavernosa and spongiosum, making this a case of partial triphallia rather than true triphallia. Conclusion: This case report provides a comprehensive anatomical description of partial triphallia in a cadaver, shedding light on the morphology, embryology, and clinical implications of this anomaly. This case report underscores the importance of meticulous anatomical dissections, particularly since, without dissection, this anatomical variation would have remained undiscovered. Although we can only speculate the functional implications of this condition, understanding such anatomical variations contributes to both knowledge of human anatomy and clinical management, should the condition be encountered in living individuals.

Keywords: triphallia, diphallia, congenital abnormalities, genitourinary abnormalities, urology

Procedia PDF Downloads 32
65 Readability Facing the Irreducible Otherness: Translation as a Third Dimension toward a Multilingual Higher Education

Authors: Noury Bakrim

Abstract:

From the point of view of language morphodynamics, interpretative Readability of the text-result (the stasis) is not the external hermeneutics of its various potential reading events but the paradigmatic, semantic immanence of its dynamics. In other words, interpretative Readability articulates the potential tension between projection (intentionality of the discursive event) and the result (Readability within the syntagmatic stasis). We then consider that translation represents much more a metalinguistic conversion of neurocognitive bilingual sub-routines and modular relations than a semantic equivalence. Furthermore, the actualizing Readability (the process of rewriting a target text within a target language/genre) builds upon the descriptive level between the generative syntax/semantic from and its paradigmatic potential translatability. Translation corpora reveal the evidence of a certain focusing on the positivist stasis of the source text at the expense of its interpretative Readability. For instance, Fluchere's brilliant translation of Miller's Tropic of cancer into French realizes unconsciously an inversion of the hierarchical relations between Life Thought and Fable: From Life Thought (fable) into Fable (Life Thought). We could regard the translation of Bernard Kreiss basing on Canetti's work die englischen Jahre (les annees anglaises) as another inversion of the historical scale from individual history into Hegelian history. In order to describe and test both translation process and result, we focus on the pedagogical practice which enables various principles grounding in interpretative/actualizing Readability. Henceforth, establishing the analytical uttering dynamics of the source text could be widened by other practices. The reversibility test (target - source text) or the comparison with a second translation in a third language (tertium comparationis A/B and A/C) point out the evidence of an impossible event. 
Therefore, it doesn't imply an uttering idealistic/absolute source but the irreducible/non-reproducible intentionality of its production event within the experience of world/discourse. The aim of this paper is to conceptualize translation as the tension between interpretative and actualizing Readability in a new approach grounding in morphodynamics of language and Translatability (mainly into French) within literary and non-literary texts articulating theoretical and described pedagogical corpora.

Keywords: readability, translation as deverbalization, translation as conversion, Tertium Comparationis, uttering actualization, translation pedagogy

Procedia PDF Downloads 134
64 Variables, Annotation, and Metadata Schemas for Early Modern Greek

Authors: Eleni Karantzola, Athanasios Karasimos, Vasiliki Makri, Ioanna Skouvara

Abstract:

Historical linguistics unveils the historical depth of languages and traces variation and change by analyzing linguistic variables over time. This field of linguistics usually deals with a closed data set that can only be expanded by the (re)discovery of previously unknown manuscripts or editions. In some cases, it is possible to use (almost) the entire closed corpus of a language for research, as is the case with the Thesaurus Linguae Graecae digital library for Ancient Greek, which contains most of the extant ancient Greek literature. However, for ‘dynamic’ periods, when the production and circulation of texts in printed as well as manuscript form have not been fully mapped, representative samples and corpora of texts are needed. Such material and tools are entirely lacking for Early Modern Greek (16th-18th c.). This study presents the principles behind the creation of EMoGReC, a pilot representative corpus of Early Modern Greek (16th-18th c.). Its design follows the fundamental principles of historical corpora. The selection of texts aims to create a representative and balanced corpus that gives insight into diachronic, diatopic and diaphasic variation. The pilot sample includes data derived from fully machine-readable vernacular texts, which belong to four to five different textual genres and come from different geographical areas. We develop a hierarchical linguistic annotation scheme, further customized to fit the characteristics of our text corpus. Regarding variables and their variants, we take as a point of departure the bundle of twenty-four features (or categories of features) for prose demotic texts of the 16th c. Tags are introduced bearing the variants [+old/archaic] or [+novel/vernacular]. In addition, further phenomena that are underway (cf. The Cambridge Grammar of Medieval and Early Modern Greek) are selected for tagging.
The annotated texts are enriched with metalinguistic and sociolinguistic metadata to provide a testbed for the development of the first comprehensive set of tools for the Greek language of that period. Based on a relational management system with interconnection of data, annotations, and their metadata, the EMoGReC database aspires to join a state-of-the-art technological ecosystem for the research of observed language variation and change using advanced computational approaches.
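The interconnection of texts, annotations, and metadata described above can be illustrated with a small relational sketch. The abstract does not specify the actual EMoGReC schema, so everything here is an assumption made for illustration: the use of SQLite, the table and column names (`text`, `token`, `annotation`), the variable label `feature-01`, and the placeholder rows, which are not corpus data.

```python
import sqlite3

# Hypothetical schema: texts carry metadata, tokens belong to texts,
# and each annotation tags one token with a variable and a variant.
SCHEMA = """
CREATE TABLE text (
    text_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    century INTEGER NOT NULL CHECK (century BETWEEN 16 AND 18),
    genre   TEXT NOT NULL,
    region  TEXT NOT NULL
);
CREATE TABLE token (
    token_id INTEGER PRIMARY KEY,
    text_id  INTEGER NOT NULL REFERENCES text(text_id),
    position INTEGER NOT NULL,
    form     TEXT NOT NULL
);
CREATE TABLE annotation (
    annotation_id INTEGER PRIMARY KEY,
    token_id INTEGER NOT NULL REFERENCES token(token_id),
    variable TEXT NOT NULL,
    variant  TEXT NOT NULL
        CHECK (variant IN ('old/archaic', 'novel/vernacular'))
);
"""


def build_demo_db():
    """Create an in-memory database filled with invented placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO text VALUES (?, ?, ?, ?, ?)",
        [
            (1, "placeholder text A", 16, "genre-1", "region-1"),
            (2, "placeholder text B", 18, "genre-2", "region-2"),
        ],
    )
    conn.executemany(
        "INSERT INTO token VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, "tok-a"),
            (2, 1, 2, "tok-b"),
            (3, 1, 3, "tok-c"),
            (4, 2, 1, "tok-d"),
            (5, 2, 2, "tok-e"),
        ],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?, ?)",
        [
            (1, 1, "feature-01", "old/archaic"),
            (2, 2, "feature-01", "old/archaic"),
            (3, 3, "feature-01", "novel/vernacular"),
            (4, 4, "feature-01", "novel/vernacular"),
            (5, 5, "feature-01", "novel/vernacular"),
        ],
    )
    conn.commit()
    return conn


def variant_counts_by_century(conn, variable):
    """Count the variants of one variable per century (a diachronic view)."""
    query = """
        SELECT t.century, a.variant, COUNT(*)
        FROM annotation AS a
        JOIN token AS k ON k.token_id = a.token_id
        JOIN text  AS t ON t.text_id = k.text_id
        WHERE a.variable = ?
        GROUP BY t.century, a.variant
        ORDER BY t.century, a.variant
    """
    return conn.execute(query, (variable,)).fetchall()


if __name__ == "__main__":
    for row in variant_counts_by_century(build_demo_db(), "feature-01"):
        print(row)
```

Keeping the metadata on the `text` table and the variants on the `annotation` table means a diatopic or diaphasic view would only require grouping by `region` or `genre` instead of `century`.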

Keywords: Early Modern Greek, variation and change, representative corpus, diachronic variables

Procedia PDF Downloads 27