Search results for: spoken corpus
585 The Effect of Problem-Based Mobile-Assisted Tasks on Spoken Intelligibility of English as a Foreign Language Learners
Authors: Loghman Ansarian, Teoh Mei Lin
Abstract:
In an attempt to increase the oral proficiency of Iranian EFL learners, the researchers compared the effect of problem-based mobile-assisted language learning with the conventional language learning approach (Communicative Language Teaching) in Iran. The experimental group (n=37) went through PBL instruction and the control group (n=33) went through conventional instruction. The results of quantitative data analysis after 26 sessions of treatment revealed that PBL could positively affect participants' knowledge of grammar, vocabulary, spoken fluency, and pronunciation; however, in terms of task achievement, no significant effect was found. This study can have pedagogical implications for language teachers and materials developers.
Keywords: problem-based learning, spoken intelligibility, Iranian EFL context, cognitive learning
Procedia PDF Downloads 174
584 A Corpus-Based Analysis of "MeToo" Discourse in South Korea: Coverage Representation in Korean Newspapers
Authors: Sun-Hee Lee, Amanda Kraley
Abstract:
The “MeToo” movement is a social movement against sexual abuse and harassment. Though the hashtag went viral in 2017 following different cultural flashpoints in different countries, the initial response was quiet in South Korea. This radically changed in January 2018, when a high-ranking senior prosecutor, Seo Ji-hyun, gave a televised interview discussing being sexually assaulted by a colleague. Acknowledging public anger, particularly among women, over the long-standing problems of sexual harassment and abuse, the South Korean media have focused on several high-profile cases. Analyzing the media representation of these cases is a window into the evolving South Korean discourse around “MeToo.” This study presents a linguistic analysis of “MeToo” discourse in South Korea by utilizing a corpus-based approach. The term corpus (pl. corpora) is used to refer to electronic language data, that is, any collection of recorded instances of spoken or written language. To conduct this language analysis, a “MeToo” corpus has been collected by extracting newspaper articles containing the keyword “MeToo” from BIGKinds, a big data analysis service, and Nexis Uni, an online academic database search engine. The corpus analysis explores how Korean media represent accusers and the accused, victims and perpetrators. The extracted data includes 5,885 articles from four broadsheet newspapers (Chosun, JoongAng, Hangyore, and Kyunghyang) and 88 articles from two Korea-based English newspapers (Korea Times and Korea Herald) between January 2017 and November 2020. The analysis includes basic data analysis with respect to keyword frequency and network analysis and adds refined examinations of select corpus samples through naming strategies, semantic relations, and pragmatic properties.
Along with the exponential increase in the number of articles containing the keyword “MeToo” from 104 articles in 2017 to 3,546 articles in 2018, the network and keyword analysis highlights ‘US,’ ‘Harvey Weinstein,’ and ‘Hollywood’ as keywords for 2017, with articles in 2018 highlighting ‘Seo Ji-Hyun,’ ‘politics,’ ‘President Moon,’ ‘An Ui-Jeong,’ ‘Lee Yoon-taek’ (the names of perpetrators), and ‘(Korean) society.’ This outcome demonstrates the shift of media focus from international affairs to domestic cases. Another crucial finding is that the word ‘defamation’ is widely distributed in the “MeToo” corpus. This relates to the South Korean legal system, in which a person who defames another by publicly alleging information detrimental to their reputation, whether factual or fabricated, is punishable by law (Article 307 of the Criminal Act of Korea). If the defamation occurs on the internet, it is subject to aggravated punishment under the Act on Promotion of Information and Communications Network Utilization and Information Protection. These laws, in particular, have been used against accusers who have publicly come forward in the wake of “MeToo” in South Korea, adding an extra dimension of risk. This corpus analysis of “MeToo” newspaper articles contributes to the analysis of the media representation of the “MeToo” movement and sheds light on the shifting landscape of gender relations in the public sphere in South Korea.
Keywords: corpus linguistics, MeToo, newspapers, South Korea
Procedia PDF Downloads 223
583 Passive Voice in SLA: Armenian Learners’ Case Study
Authors: Emma Nemishalyan
Abstract:
It is believed that learners’ mother tongue (L1 hereafter) has a huge impact on their second language acquisition (L2 hereafter). This hypothesis has been exposed to both positive and negative criticism. Based on research results from a wide range of learner corpora (Chinese, Japanese, Spanish among others), the hypothesis has been either proved or disproved. However, no such study has been conducted on Armenian learners. The aim of this paper is to understand the implication of the hypothesis for the Armenian learners’ corpus in terms of the use of the passive voice. To this end, the method of Contrastive Interlanguage Analysis (hereafter CIA) has been used on a native speakers’ corpus (Louvain Corpus of Native English Essays (LOCNESS)) and an Armenian learners’ corpus, which has been compiled by me in compliance with International Corpus of Learner English (ICLE) guidelines. CIA compares the interlanguage (the language produced by learners) with the one produced by native speakers. With the help of this method, it is possible not only to highlight the mistakes that learners make, but also to identify underuse or overuse. The choice of the grammar issue (passive voice) is conditioned by the fact that typologically Armenian and English are drastically different, as they belong to different branches. Moreover, the passive voice is considered to be one of the most problematic grammar topics to be acquired by learners of the English language. Based on this difference, we hypothesized that Armenian learners would either overuse or underuse some types of the passive voice. With the help of Lancsbox software, we have identified the frequency rates of passive voice usage in LOCNESS and the Armenian learners’ corpus to understand whether the latter have the same usage pattern of the passive voice as the native speakers. Secondly, we have identified the types of the passive voice used by the Armenian learners, trying to trace the reasons to their mother tongue.
The results of the study showed that Armenian learners underused the passive voice in contrast to native speakers. Furthermore, the hypothesis that learners’ L1 has an impact on learners’ L2 acquisition and production was proved.
Keywords: corpus linguistics, applied linguistics, second language acquisition, corpus compilation
Procedia PDF Downloads 107
582 The Repetition of New Words and Information in Mandarin-Speaking Children: A Corpus-Based Study
Authors: Jian-Jun Gao
Abstract:
Repetition is used for a variety of functions in conversation. When young children first learn to speak, they often repeat words from the adult’s recent utterance for learning and social purposes. The objective of this study was to ascertain whether the repetitions are equivalent in indicating attention to new words and the initial repeat of information in conversation. Based on the observation of naturally occurring language use in the Taiwan Corpus of Child Mandarin (TCCM), the results of this study provide empirical support for the previous findings that children are more likely to repeat new words they are offered than to repeat new information. As children get older, the repetition of both new words and new information drops.
Keywords: acquisition, corpus, Mandarin, new words, new information, repetition
Procedia PDF Downloads 148
581 Chinese Students’ Use of Corpus Tools in an English for Academic Purposes Writing Course: Influence on Learning Behaviour, Performance Outcomes and Perceptions
Authors: Jingwen Ou
Abstract:
Writing for academic purposes in a second or foreign language poses a significant challenge for non-native speakers, particularly at the tertiary level, where English academic writing for L2 students is often hindered by difficulties in academic discourse, including vocabulary, academic register, and organization. The past two decades have witnessed a rising popularity in the application of the data-driven learning (DDL) approach in EAP writing instruction. In light of such a trend, this study aims to enhance the integration of DDL into English for academic purposes (EAP) writing classrooms by investigating the perception of Chinese college students regarding the use of corpus tools for improving EAP writing. Additionally, the research explores their corpus consultation behaviors during training to provide insights into corpus-assisted EAP instruction for DDL practitioners. Given the rising popularity of DDL, this research aims to investigate Chinese university students’ use of corpus tools with three main foci: 1) the influence of corpus tools on learning behaviours, 2) the influence of corpus tools on students’ academic writing performance outcomes, and 3) students’ perceptions and potential perceptional changes towards the use of such tools. Three corpus tools, CQPWeb, Sketch Engine, and LancsBox X, are selected for investigation due to the scarcity of empirical research on patterns of learners’ engagement with a combination of multiple corpora. The research adopts a pre-test / post-test design for the evaluation of students’ academic writing performance before and after the intervention. Twenty participants will be divided into two groups: an intervention and a non-intervention group. Three corpus training workshops will be delivered at the beginning, middle, and end of a semester.
An online survey and three separate focus group interviews are designed to investigate students’ perceptions of the use of corpus tools for improving academic writing skills, particularly the rhetorical functions in different essay sections. Insights from students’ consultation sessions indicated difficulties with DDL practice, including insufficient time to complete all tasks, struggles with technical set-up, unfamiliarity with the DDL approach, and difficulty with some advanced corpus functions. Findings from the main study aim to provide pedagogical insights and training resources for EAP practitioners and learners.
Keywords: corpus linguistics, data-driven learning, English for academic purposes, tertiary education in China
Procedia PDF Downloads 57
580 Corpus-Based Model of Key Concepts Selection for the Master English Language Course "Government Relations"
Authors: Elena Pozdnyakova
Abstract:
“Government Relations” is a field of knowledge presently taught at the majority of universities around the globe. English as the default language can become the language of teaching since the issues discussed are both global and national in character. However, for this field of knowledge, key concepts and their word representations in English often do not coincide with those in other languages. International master’s degree students abroad, as well as students taught the course in English at their national universities, face difficulties connected with correctly conceptualizing the terminology of GR in the British and American academic traditions. The study was carried out during the elaboration of the GR English language course (pilot research: 2013-2015) at Moscow State Institute of Foreign Relations (University), Russian Federation. Within this period, English language instructors designed and elaborated the three-semester course of GR. Methodologically, the course design was based on the elaboration model, with a special focus on the conceptual elaboration sequence and the theoretical elaboration sequence. The course designers faced difficulties in concept selection and the theoretical elaboration sequence. To improve the results and eliminate the problems with concept selection, a new, corpus-based approach was worked out. The computer-based tool WordSmith 6.0 was used to build a model of key concept selection. The corpus of GR English texts consisted of 1 million words (the study corpus). The approach was based on measuring effect size, i.e. the percent difference of the frequency of a word in the study corpus when compared to that in the reference corpus. The results obtained showed significant improvement in the process of concept selection. The corpus-based model also facilitated theoretical elaboration of teaching materials.
Keywords: corpus-based study, English as the default language, key concepts, measuring effect size, model of key concept selection
Procedia PDF Downloads 305
579 OPEN-EmoRec-II: A Multimodal Corpus of Human-Computer Interaction
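The effect-size measure mentioned in the abstract above, the percent difference of a word's frequency in the study corpus relative to the reference corpus, might be sketched as follows. This is only an illustration and not the authors' WordSmith procedure: the function name `percent_diff`, the per-million normalisation, and the sample counts are all assumptions made for the example.

```python
def percent_diff(freq_study, size_study, freq_ref, size_ref, per=1_000_000):
    """Percent difference between normalised frequencies (hypothetical helper).

    freq_* are raw counts of one word; size_* are corpus sizes in tokens.
    Normalising per million words is an assumption, not taken from the abstract.
    """
    if freq_ref == 0:
        # The measure is undefined when the word is absent from the reference corpus.
        raise ValueError("word does not occur in the reference corpus")
    norm_study = freq_study / size_study * per
    norm_ref = freq_ref / size_ref * per
    return (norm_study - norm_ref) / norm_ref * 100


# Invented counts: 500 hits in a 1M-word study corpus vs. 100 hits in a
# 1M-word reference corpus.
print(round(percent_diff(500, 1_000_000, 100, 1_000_000), 1))  # 400.0
```

A positive value indicates that the word is relatively more frequent in the study corpus; a value of 400 means its normalised frequency is five times that in the reference corpus.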
Authors: Stefanie Rukavina, Sascha Gruss, Steffen Walter, Holger Hoffmann, Harald C. Traue
Abstract:
OPEN-EmoRec-II is an open multimodal corpus with experimentally induced emotions. In the first half of the experiment, emotions were induced with standardized picture material and in the second half during a human-computer interaction (HCI), realized with a wizard-of-oz design. The induced emotions are based on the dimensional theory of emotions (valence, arousal and dominance). With these emotional sequences, recorded as multimodal data (mimic reactions, speech, audio and physiological reactions) in a naturalistic HCI environment, one can improve classification methods on a multimodal level. This database is the result of an HCI experiment, for which 30 subjects in total agreed to the publication of their data, including the video material, for research purposes. The now available open corpus contains sensor signals for video, audio, and physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major), as well as mimic annotations.
Keywords: open multimodal emotion corpus, annotated labels, intelligent interaction
Procedia PDF Downloads 414
578 Compilation and Statistical Analysis of an Arabic-English Legal Corpus in Sketch Engine
Authors: C. Brierley, H. El-Farahaty, A. Farhan
Abstract:
The Leeds Parallel Corpus of Arabic-English Constitutions is a parallel corpus for the Arabic legal domain. Analysis of legal language via Corpus Linguistics techniques is an important development. In legal proceedings, a corpus-based approach to disambiguating meaning is set to replace the dictionary as an interpretative tool, and legal scholarship in the States is now attuned to the potential for Text Analytics over vast quantities of text-based legal material, following the business and medical industries. This trend is reflected in Europe: the interdisciplinary research group in Computer Assisted Legal Linguistics mines big data collections of legal and non-legal texts to analyse legal interpretations, legal discourse, the comprehensibility of legal texts, conflict resolution, and linguistic human rights. This paper focuses on ‘dignity’ as an important aspect of the overarching concept of human rights in current constitutions across the Arab world. We have compiled a parallel, Arabic-English raw text corpus (169,861 Arabic words and 205,893 English words) from reputable websites such as the World Intellectual Property Organisation and CONSTITUTE, and uploaded and queried our corpus in Sketch Engine. Our most challenging task was sentence-level alignment of Arabic-English data. This entailed manual intervention to ensure correspondence on a one-to-many basis since Arabic sentences differ from English in length and punctuation. We have searched for morphological variants of ‘dignity’ (كرامة, karāma) in the Arabic data and inspected their English translation equivalents. The term occurs most frequently in the Sudanese constitution (10 instances), and not at all in the constitution of Palestine. Its most frequent collocate, determined via the logDice statistic in Sketch Engine, is ‘human’ as in ‘human dignity’.
Keywords: Arabic constitution, corpus-based legal linguistics, human rights, parallel Arabic-English legal corpora
Procedia PDF Downloads 181
577 Spatial Deictics in Face-to-Face Communication: Findings in Baltic Languages
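For readers unfamiliar with the logDice statistic named in the abstract above, a minimal sketch of its usual definition, 14 + log2(2·f(x,y) / (f(x) + f(y))), is given below. The function name `log_dice` and the frequency counts are invented for illustration; they are not figures from the paper, and Sketch Engine computes the score internally.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score for a node word x and a collocate y.

    f_xy: co-occurrence frequency of x and y (hypothetical counts)
    f_x, f_y: individual frequencies of x and y in the corpus
    """
    if f_xy <= 0:
        raise ValueError("the pair must co-occur at least once")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented example: 'dignity' occurs 40 times, 'human' 120 times,
# and they co-occur 30 times.
score = log_dice(30, 40, 120)
print(round(score, 2))
```

The score has a theoretical maximum of 14, reached when the two words only ever occur together, and it does not depend on the total corpus size, which makes scores comparable across corpora of different sizes.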
Authors: Gintare Judzentyte
Abstract:
The present research aims to discuss the semantics and pragmatics of spatial deictics (deictic adverbs of place and demonstrative pronouns) in the Baltic languages: in spoken Lithuanian and in spoken Latvian. The following objectives have been identified to achieve the aim: 1) to determine the usage of adverbs of place in spoken Lithuanian and Latvian and to verify their meanings in face-to-face communication; 2) to determine the usage of demonstrative pronouns in spoken Lithuanian and Latvian and to verify their meanings in face-to-face communication; 3) to compare the systems between the two spoken languages and to identify the main tendencies. As meanings of demonstratives (adverbs of place and demonstrative pronouns) are context-bound, it is necessary to verify their usage in spontaneous interaction. Besides, deictic gestures play a very important role in face-to-face communication. Therefore, an experimental method is necessary to collect the data. Video material representing spoken Lithuanian and spoken Latvian was recorded by means of a qualitative interview (a semi-structured interview: empirical research is all about asking the right questions). The collected material was transcribed and evaluated taking into account several approaches: 1) physical distance (location of the referent, visual accessibility of the referent); 2) deictic gestures (the combination of language and gesture is especially characteristic of the exophoric use); 3) representation of mental spaces in physical space (a speaker sometimes wishes to mark something that is physically close as psychologically distant and vice versa). The analysis of the collected data revealed that in face-to-face communication the participants choose deictic adverbs of place instead of demonstrative pronouns to locate/identify entities in situations where the demonstrative pronouns would be expected in spoken Lithuanian and in spoken Latvian.
The analysis showed that visual accessibility of the referent is very important in face-to-face communication, but the main criterion while localizing objects and entities is the need for contrast: lith. čia ‘here’, šis ‘this’, latv. šeit ‘here’, šis ‘this’ usually identify distant entities and are used instead of distal demonstratives (lith. ten ‘there’, tas ‘that’, latv. tur ‘there’, tas ‘that’), because the referred objects/subjects contrast with further entities. Furthermore, the interlocutors in examples from a spontaneously situated interaction usually extend their space and can refer to a ‘distal’ object/subject with a ‘proximal’ demonstrative based on the psychological choice. As the research on the spoken Baltic languages confirmed, the choice of spatial deictics in face-to-face communication is strongly affected by a complex of criteria. Although there are some main tendencies, the exact meaning of spatial deictics in the spoken Baltic languages is revealed and is relevant only in a certain context.
Keywords: Baltic languages, face-to-face communication, pragmatics, semantics, spatial deictics
Procedia PDF Downloads 289
576 The Application of Cognitive Linguistics to Teaching EFL Students to Understand Spoken Coinages: Based on an Experiment with Speakers of Russian
Authors: Ekaterina Lukianchenko
Abstract:
The present article addresses the nuances of teaching English vocabulary to Russian-speaking students. The experiment involving 39 participants aged 17 to 21 shows that the key to understanding spoken coinages is not only the knowledge of their constituents, but rather the understanding of the context and co-text. The volunteers who took part knew the constituents, but did not know the meaning of the words. The authors assume that the structure of the concept has a direct relation to the form of the particular vocabulary unit, but that its form is secondary to its meaning if the word is a spoken coinage. This is partly supported by the fact that in modern slang words have multiple meanings, and one notion can have various embodiments that have virtually nothing in common. The choice of vocabulary items that youngsters use is not exactly arbitrary. However, even complex nominals whose meaning seems clear, because it looks like a sum of their constituents’ meanings, are still impossible to understand without any context or co-text, as a lot of them are idiomatic and non-transparent. It is further explained what methods might be effective in teaching students how to deal with new words they encounter in real-life situations and how students’ knowledge of vocabulary might be enhanced.
Keywords: spoken language, cognitive linguistics, complex nominals, nominals with the incorporated object, concept, EFL, communicative language teaching
Procedia PDF Downloads 276
575 Corpus-Assisted Study of Gender Related Tiger Metaphors in the Chinese Context
Authors: Na Xiao
Abstract:
Animal metaphors have many different connotations, ranging from loving emotions to derogatory epithets, but gender expressions using animal metaphors are often imbalanced. Generally, animal metaphors related to females tend to be negative. Little is known about the reasons for the negative expressions of female animal metaphors in Chinese contexts, and they have not yet been quantified. The Modern Chinese Corpus at the Center for Chinese Linguistics at Peking University (CCL Corpus) provided the data for this research, which aims to identify the variables influencing gender differences in animal metaphors mapped onto humans in Chinese by observing the percentage of "tiger" metaphors, based on conceptual metaphor theory. A quantitative research method was used in this study to statistically examine the gender attitude percentage of the "tiger" metaphor using corpus data. This study has shown that the tiger metaphors associated with humans in the Chinese context tend to be negative. Importantly, this study has also shown that the high proportion of tiger metaphorical idioms is what causes the high proportion of negative tiger metaphors that are related to women. This finding can be used as crucial information for future studies on other gender-related animal metaphorical idioms and can offer additional insights for understanding trends in other animal metaphors.
Keywords: Chinese, CCL corpus, gender differences, metaphorical idioms, tigers
Procedia PDF Downloads 107
574 Redundancy in Malay Morphology: School Grammar versus Corpus Grammar
Authors: Zaharani Ahmad, Nor Hashimah Jalaluddin
Abstract:
The aim of this paper is to examine and identify the issue of linguistic redundancy in two competing grammars of Malay, namely the school grammar and the corpus grammar. The former is a normative grammar which is formally and prescriptively taught in the classroom, whereas the latter is a descriptive grammar that is informally acquired and mastered by the students as native speakers of the language outside the classroom. Corpus grammar is depicted based on its actual use in naturally occurring texts, as attested in the corpus. It is observed that the grammar taught in schools is incompatible with the grammar used in the corpus. For instance, a noun phrase containing a nominal reduplicated form which denotes plurality (i.e. murid-murid ‘students’, which is derived from murid ‘student’) and a modifier categorized as a quantifier (i.e. semua ‘all’, seluruh ‘entire’, and kebanyakan ‘most’) is not acceptable in the school grammar because the formation (i.e. semua murid-murid ‘all the students’, kebanyakan pelajar-pelajar ‘most of the students’) is claimed to be redundant, and redundancy is prohibited in the grammar. Redundancy is generally construed as the property of speech and language by which more information is provided than is precisely required for the message to be understood, so that, if some information is omitted, the remaining information will still be sufficient for the message to be comprehended. Thus, the correct construction to be used is strictly the reduplicated form (i.e. murid-murid ‘students’) or the quantifier plus the root (i.e. semua murid ‘all the students’), with the intention that the grammatical meaning of plural is not repeated. Nevertheless, the so-called redundant form (i.e. kebanyakan pelajar-pelajar ‘most of the students’) is frequently used in the corpus grammar. This study shows that a number of redundant forms occur in the morphology of the language, particularly in affixation, reduplication and the combination of both.
Apparently, the so-called redundancy has grammatical and socio-cultural functions in communication, namely to give emphasis and to stress the importance of the information delivered by the speakers or writers.
Keywords: corpus grammar, morphology, redundancy, school grammar
Procedia PDF Downloads 340
573 The Automatisation of Dictionary-Based Annotation in a Parallel Corpus of Old English
Authors: Ana Elvira Ojanguren Lopez, Javier Martin Arista
Abstract:
The aims of this paper are to present the automatisation procedure adopted in the implementation of a parallel corpus of Old English, as well as to assess the progress of automatisation with respect to tagging, annotation, and lemmatisation. The corpus consists of an aligned parallel text with word-for-word comparison Old English-English that provides the Old English segment with inflectional form tagging (gloss, lemma, category, and inflection) and lemma annotation (spelling, meaning, inflectional class, paradigm, word-formation and secondary sources). This parallel corpus is intended to fill a gap in the field of Old English, in which no parallel and/or lemmatised corpora are available, while the average amount of corpus annotation is low. With this background, this presentation has two main parts. The first part, which focuses on tagging and annotation, selects the layouts and fields of lexical databases that are relevant for these tasks. Most information used for the annotation of the corpus can be retrieved from the lexical and morphological database Nerthus and the database of secondary sources Freya. These are the sources of linguistic and metalinguistic information that will be used for the annotation of the lemmas of the corpus, including morphological and semantic aspects as well as the references to the secondary sources that deal with the lemmas in question. Although substantially adapted and re-interpreted, the lemmatised part of these databases draws on the standard dictionaries of Old English, including The Student's Dictionary of Anglo-Saxon, An Anglo-Saxon Dictionary, and A Concise Anglo-Saxon Dictionary. The second part of this paper deals with lemmatisation. It presents the lemmatiser Norna, which has been implemented on Filemaker software. It is based on a concordance and an index to the Dictionary of Old English Corpus, which comprises around three thousand texts and three million words.
In its present state, the lemmatiser Norna can assign a lemma to around 80% of textual forms on an automatic basis, by searching the index and the concordance for prefixes, stems and inflectional endings. The conclusions of this presentation stress the limits of the automatisation of dictionary-based annotation in a parallel corpus. While the tagging and annotation are largely automatic even at the present stage, the automatisation of alignment is left for future research. Lemmatisation and morphological tagging are expected to be fully automatic in the near future, once the database of secondary sources Freya and the lemmatiser Norna have been completed.
Keywords: corpus linguistics, historical linguistics, Old English, parallel corpus
Procedia PDF Downloads 211
572 Statistical Comparison of Machine and Manual Translation: A Corpus-Based Study of Gone with the Wind
Authors: Yanmeng Liu
Abstract:
This article analyzes and compares the linguistic differences between machine translation and manual translation, through a case study of the book Gone with the Wind. As an important carrier of human feeling and thinking, literary translation poses a huge difficulty for machine translation, which is expected to exhibit translation features distinct from those of manual translation. In order to display linguistic features objectively, computerized and statistical evidence has been tentatively applied to the systematic, quantitative investigation of large-scale translation corpora. This study compiles a bilingual corpus with four versions of Chinese translations of the book Gone with the Wind, namely, Piao by Chunhai Fan, Piao by Huairen Huang, and translations by Google Translation and Baidu Translation. After processing the corpus with the software Stanford Segmenter, Stanford Postagger, AntConc, etc., the study analyzes linguistic data and answers the following questions: 1. How does the machine translation differ from manual translation linguistically? 2. Why do these deviances happen? This paper combines translation study with the knowledge of corpus linguistics, and makes concrete the divergent linguistic dimensions in translated text analysis, in order to present linguistic deviances in manual and machine translation. Consequently, this study provides a more accurate and more fine-grained understanding of machine translation products, and it also proposes several suggestions for machine translation development in the future.
Keywords: corpus-based analysis, linguistic deviances, machine translation, statistical evidence
Procedia PDF Downloads 142
571 Theater Metaphor in Event Quantification: A Corpus Study
Authors: Zhuo Jing-Schmidt, Jun Lang
Abstract:
Numeral classifiers are common in Asian languages. Research on numeral classifiers primarily focuses on noun classifiers that quantify and individuate nominal referents. There is a scarcity of research on event quantification using verb classifiers. This study aims to understand the semantic and conceptual basis of event quantification in Chinese. From a usage-based Construction Grammar perspective, this study presents a corpus analysis of event quantification in Chinese. Drawing on a large balanced corpus of contemporary Chinese, we analyze 667 NOUN collexemes totaling 31,136 tokens of a productive numeral classifier construction in Chinese. Using collostructional analysis of the collexemes, the results show that the construction quantifies and classifies dramatic events using a theater-based conceptual metaphor. We argue that the usage patterns reflect the cultural entrenchment of theater in Chinese conceptualization and the construal of theatricality in linguistic expression. The study has implications for cognitive semantics and construction grammar.
Keywords: event quantification, classifier, corpus, metaphor
Procedia PDF Downloads 84
570 An Online Corpus-Based Bilingual Collocations Dictionary for Second/Foreign Language Learners
Authors: Adriane Orenha-Ottaiano
Abstract:
Collocations are conventionalized, recurrent and arbitrary lexical combinations. Because they are highly specific to a particular language and may be contextually restricted, collocations pose a problem to EFL/ESL learners with regard to production or encoding. Taking that into account, the compilation of monolingual and bilingual collocations dictionaries for this audience is crucial. Thus, the aim of this paper is to discuss the importance of the compilation of an Online Corpus-based Bilingual Collocations Dictionary, in the English-Portuguese and Portuguese-English directions. In a first phase, with the use of WordSmith Tools, the collocations were extracted from a Translation Learner Corpus (TLC), a parallel corpus made up of university students’ translations in the Portuguese-English direction, with approximately 100,000 words. In a second stage, based on the keywords analyzed from the TLC, more collocational patterns were extracted using the Sketch Engine. In order to include more collocations and to ensure dictionary users have access to more frequent and recurrent collocations, we also use the frequency list from The Corpus of Contemporary American English to extract more patterns. The dictionary covers all types of collocations (verbal, noun, adjectival and adverbial collocations), in order to help learners use them more accurately and productively. So far the dictionary has more than 330 entries and more than 3,500 extracted collocations. Offering the proposed dictionary in online format may allow more collocational information to be incorporated, both qualitatively and quantitatively. In addition, more examples may be included than in conventional printed collocations dictionaries.
As the first bilingual collocations dictionary in these directions, it is hoped that it will meet learners’ collocational needs, since the collocations have been selected according to learners’ difficulties with the use of collocations.
Keywords: Corpus-Based Collocations Dictionary, Collocations, Bilingual Collocations Dictionary, Collocational Patterns
Procedia PDF Downloads 309
569 Corpus Linguistics as a Tool for Translation Studies Analysis: A Bilingual Parallel Corpus of Students’ Translations
Authors: Juan-Pedro Rica-Peromingo
Abstract:
Nowadays, corpus linguistics has become a key research methodology for Translation Studies, which broadens the scope of cross-linguistic studies. In the study presented here, the approach focuses on learners with little or no experience in order to study, at an early stage, general mistakes and errors and the correct or incorrect use of translation strategies, and to improve the translational competence of the students. Led by Sylviane Granger and Marie-Aude Lefer of the Centre for English Corpus Linguistics of the University of Louvain, the MUST corpus (MUltilingual Student Translation Corpus) is an international project which brings together partners from Europe and worldwide universities and connects Learner Corpus Research (LCR) and Translation Studies (TS). It aims to build a corpus of translations carried out by students, including both direct (L2 > L1) and indirect (L1 > L2) translations, from a great variety of text types, genres, and registers in a wide variety of languages: audiovisual translations (including dubbing, subtitling for hearing population and for deaf population), scientific, humanistic, literary, economic and legal translation texts. This paper focuses on the work carried out by the Spanish team from the Complutense University (UCMA), which is part of the MUST project, and it describes the specific features of the corpus built by its members. All the texts used by UCMA are either direct or indirect translations between English and Spanish. Students’ profiles comprise translation trainees, foreign language students with a major in English, engineers studying EFL and MA students, all of them with different English levels (from B1 to C1); for some of the students, this would be their first experience with translation. The MUST corpus is searchable via Hypal4MUST, a web-based interface developed by Adam Obrusnik from Masaryk University (Czech Republic), which includes a translation-oriented annotation system (TAS).
A distinctive feature of the interface is that it allows source texts and target texts to be aligned, so that we can observe and compare both language structures in detail and study the translation strategies used by students. The initial data point to the kinds of difficulties encountered by the students and reveal the most frequent strategies implemented by the learners according to their level of English, their translation experience and the text genres. We have also identified common errors in the graduate and postgraduate university students’ translations: transfer errors, lexical errors, grammatical errors, text-specific translation errors, and culture-related errors. Analyzing all these parameters will provide more material for finding better ways to improve the quality of teaching and of the translations produced by the students.
Keywords: corpus studies, students’ corpus, the MUST corpus, translation studies
Procedia PDF Downloads 146
568 Corpus Stylistics and Multidimensional Analysis for English for Specific Purposes Teaching and Assessment
Authors: Svetlana Strinyuk, Viacheslav Lanin
Abstract:
Academic English has become the lingua franca of the international scientific community, which stimulates universities to introduce English for Specific Purposes (EAP) courses into the curriculum. Teaching L2 EAP students can be supported with corpus technologies and digital stylistics. Special software was created to address the manifold task of teaching, assessing and researching the academic writing of L2 students on the basis of digital stylistics and multidimensional analysis. A set of annotations (style markers) was built, covering the grammatical, lexical and syntactic features most characteristic of academic writing. A contrastive comparison of two corpora yields data about the features of academic writing underused or overused by L2 EAP students: the “model corpus” consists of subject-domain-limited papers published by competent writers in leading academic journals, and the “students’ corpus” consists of subject-domain-limited papers written by final-year students. Both corpora are tagged with special software created in GATE Developer. Style markers within the framework of the research may be replaced depending on the relevance and validity of the results obtained from the research corpora. Thus, by selecting relevant (high-frequency) style markers and excluding less relevant, i.e. less frequent, annotations, high validity of the model is achieved. The software makes it possible to compare the data obtained from processing the model corpus with those from the students’ corpus and to produce reports which can be used in teaching and assessment. The less students deviate from the model corpus in their writing, the higher their academic writing skill acquisition. The research showed that several style markers (hedging devices) were underused by L2 EAP students, whereas lexical linking devices were used excessively.
The software, implemented in the teaching of EAP courses, serves as a successful visual aid and makes assessment more valid; it is indicative of the degree of writing skill acquisition and provides data for further research.
Keywords: corpus technologies in EAP teaching, multidimensional analysis, GATE Developer, corpus stylistics
Procedia PDF Downloads 196
567 Construction and Analysis of Tamazight (Berber) Text Corpus
Authors: Zayd Khayi
Abstract:
This paper deals with the construction and analysis of a Tamazight text corpus. The grammatical structure of Tamazight remains poorly understood, and the lack of a comparative grammar leads to linguistic issues. To help fill this gap, even if only in a small way, we constructed a diachronic corpus of the Tamazight language and developed a program tool. The tool is designed to analyze different aspects of Tamazight and of its dialects used in North Africa, specifically in Morocco. The work focuses on three Moroccan dialects: Tamazight, Tarifiyt, and Tachlhit. The Latin-script version was a good choice because of the many sources available in it. The corpus is based on the grammatical parameters and features of the language. The text collection contains more than 500 texts that cover a long historical period. It is free, and it will be useful for further investigations. The texts were converted into XML for standardization. The corpus counts more than 200,000 words. Based on linguistic rules and statistical methods, the original user interface and software prototype were developed by combining web design technologies and Python. The corpus lets users distinguish easily between feminine and masculine nouns and verbs. The interface is available in three languages: TMZ, FR, and EN. The selected texts were not initially categorized; categorization was done manually. Within corpus linguistics, there is currently no commonly accepted approach to the classification of texts. Texts are divided into ten categories. To describe and represent the texts in the corpus, we elaborated the XML structure according to the TEI recommendations. The search function can return the types of words searched for, such as feminine/masculine nouns and verbs. Nouns are divided into two parts.
Gender in the corpus has two forms. The neutral form of the word corresponds to the masculine, while the feminine is indicated by a double t-t affix (the prefix t- and the suffix -t), e.g., Tarbat (girl), Tamtut (woman), Taxamt (tent), and Tislit (bride). However, there are some words whose feminine form contains only the prefix t- and the suffix -a, e.g., Tasa (liver), tawja (family), and tarwa (progenitors). Generally, Tamazight masculine words have prefixes that distinguish them from other words, for instance 'a', 'u', 'i', e.g., Asklu (tree), udi (cheese), ighef (head). Verbs in the corpus are in the first person singular and plural and have the suffixes 'agh', 'ex', 'egh', e.g., 'ghrex' (I study), 'fegh' (I go out), 'nadagh' (I call). The program tool supports the following operations on the corpus: listing all tokens; listing unique words; computing lexical diversity; and running different grammatical queries. To conclude, this corpus has so far focused on only a small group of parts of speech in Tamazight: verbs and nouns. Work is still ongoing on adjectives, pronouns, adverbs, and others.
Keywords: Tamazight (Berber) language, corpus linguistic, grammar rules, statistical methods
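The operations listed in the abstract above (token list, unique words, lexical diversity, and affix-based gender queries) can be illustrated with a minimal Python sketch. This is not the authors' tool: the function names, the tokenizer, and the type-token ratio used for lexical diversity are assumptions made for illustration, and the gender heuristic encodes only the affix patterns quoted in the abstract, so it will misclassify many real words.

```python
import re


def tokenize(text):
    """Split Latin-script text into lowercased word tokens (a simplification)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def lexical_diversity(tokens):
    """Type-token ratio: unique words divided by total tokens (assumed measure)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def guess_gender(noun):
    """Hypothetical heuristic built only from the affixes named in the abstract."""
    word = noun.lower()
    # Feminine: prefix t- combined with suffix -t or -a (Tarbat, Tasa).
    if word.startswith("t") and word.endswith(("t", "a")):
        return "feminine"
    # Masculine: initial a-, u- or i- (Asklu, udi, ighef).
    if word[:1] in ("a", "u", "i"):
        return "masculine"
    return "unknown"


sample = "Tarbat Tamtut Asklu udi ighef Tasa tarbat"
tokens = tokenize(sample)           # list of all tokens
unique_words = sorted(set(tokens))  # list of unique words
diversity = lexical_diversity(tokens)
genders = {word: guess_gender(word) for word in unique_words}
```

On the sample string, the sketch yields seven tokens and six unique words; a real query layer would need a lexicon or morphological analysis rather than surface affixes alone.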
Procedia PDF Downloads 62
566 Using A Corpus Approach To Investigate Positive University Images: A Comparison Between Chinese And ESC Universities
Authors: Han Hongmei
Abstract:
University image is receiving attention because of its key role in influencing student choice, faculty loyalty, and social recognition. Therefore, all universities strive to promote their positive images. However, for most people, the positive image of a university often comes from a fragmented perceptual understanding. Since universities’ official websites are important channels for image promotion, a corpus approach to the university profiles on their official websites can reveal holistic positive images of universities. This study aims to compare positive images of high-level universities in China and English-speaking countries based on a profile corpus of these universities. It is found that the positive images revealed in these university profiles are similar, with some minor differences. The similarities are reflected in the campus environment, historical achievements, comprehensive characteristics, scientific research institutions, and diversified faculty, while the differences are reflected in their unique characteristics. Furthermore, the findings also reveal a gap between Chinese universities and high-level universities in the English-speaking countries.
Keywords: university image, positive image, corpus of university profiles, comparative analysis, high-frequency words
Procedia PDF Downloads 106
565 Online Multilingual Dictionary Using Hamburg Notation for Avatar-Based Indian Sign Language Generation System
Authors: Sugandhi, Parteek Kumar, Sanmeet Kaur
Abstract:
Sign Language (SL) is used by deaf people and by others who can hear but cannot speak, or who have difficulty with spoken languages due to some disability. It is a visual gesture language that makes use of one or both hands, the arms, the face, and the body to convey meanings and thoughts. An SL automation system is an effective way of providing an interface for communicating with hearing people using a computer. In this paper, an avatar-based dictionary is proposed for a text to Indian Sign Language (ISL) generation system. This research work also presents a literature review of the SL corpora available for various SLs over the years. An ISL generation system requires a written form of SL, and certain techniques are available for writing SL. The system uses the Hamburg sign language Notation System (HamNoSys) and Signing Gesture Mark-up Language (SiGML) for ISL generation. It is developed in PHP using Web Graphics Library (WebGL) technology for 3D avatar animation. A multilingual ISL dictionary is developed using HamNoSys for both English and Hindi. This dictionary is used as a database to associate signs with words or phrases of a spoken language. It provides an admin panel interface for managing the dictionary, i.e., modification, addition, or deletion of a word. Through this interface, HamNoSys notations can be developed and stored in a database, and these notations can be converted into their corresponding SiGML files manually. The system takes a natural language input sentence in English or Hindi and generates a 3D sign animation using an avatar. SL generation systems have potential applications in many domains, such as the healthcare sector, media, educational institutes, commercial sectors, transportation services, etc. This research work will help researchers understand the various techniques used for writing SL and for building Sign Language generation systems.
Keywords: avatar, dictionary, HamNoSys, hearing impaired, Indian sign language (ISL), sign language
Procedia PDF Downloads 230
564 Words of Peace in the Speeches of the Egyptian President, Abdulfattah El-Sisi: A Corpus-Based Study
Authors: Mohamed S. Negm, Waleed S. Mandour
Abstract:
The present study aims primarily at investigating words of peace (lexemes of peace) in the formal speeches of the Egyptian president Abdulfattah El-Sisi over a two-year span, from 2018 to 2019. This paper attempts not only to shed light on the contextual use of the antonyms war and peace but also to underpin it with quantitative analysis through current methods of corpus linguistics. As such, the researchers have deployed a corpus-based approach in collecting, encoding, and processing 30 presidential speeches over the stated period (23,411 words and 25,541 tokens in total). Further, semantic fields and collocational networks are identified and compared statistically. Results have shown a significant propensity to adopt peace, including its relevant collocation network, textually and therefore ideationally, at the expense of the concept of war, which in most cases surfaces euphemistically through the noun conflict. The president has not justified the action of war with an honorable cause or a valid reason. Such results, so far, indicate a positive sociopolitical mindset on the part of the Egyptian president and, moreover, reveal national and international fair dealing on arising issues.
Keywords: CADS, collocation network, corpus linguistics, critical discourse analysis
Procedia PDF Downloads 153
563 Linguistic Accessibility and Audiovisual Translation: Corpus Linguistics as a Tool for Analysis
Authors: Juan-Pedro Rica-Peromingo
Abstract:
The important change taking place with respect to the media and the audiovisual world in Europe needs to benefit all populations, in particular those with special needs, such as the deaf and hard-of-hearing population (SDH) and blind and partially-sighted population (AD). This recent interest in the field of audiovisual translation (AVT) can be observed in the teaching and learning of the different modes of AVT in the degree and post-degree courses at Spanish universities, which expand the interest and practice of AVT linguistic accessibility. We present a research project led at the UCM which consists of the compilation of AVT activities for teaching purposes and tries to analyze the creation and reception of SDH and AD: the AVLA Project (Audiovisual Learning Archive), which includes audiovisual materials carried out by the university students on different AVT modes and evaluations from the blind and deaf informants. In this study, we present the materials created by the students. A group of the deaf and blind population has been in charge of testing the student's SDH and AD corpus of audiovisual materials through some questionnaires used to evaluate the students’ production. These questionnaires include information about the reception of the subtitles and the audio descriptions from linguistic and technical points of view. With all the materials compiled in the research project, a corpus with both the students’ production and the recipients’ evaluations is being compiled: the CALING (Corpus de Accesibilidad Lingüística) corpus. Preliminary results will be presented with respect to those aspects, difficulties, and deficiencies in the SDH and AD included in the corpus, specifically with respect to the length of subtitles, the position of the contextual information on the screen, and the text included in the audio descriptions and tone of voice used. These results may suggest some changes and improvements in the quality of the SDH and AD analyzed. 
Finally, we will point to the demand for teaching and learning AVT and linguistic accessibility at the university level and suggest some important changes in the norms which regulate SDH and AD nationally and internationally.
Keywords: audiovisual translation, corpus linguistics, linguistic accessibility, teaching
Procedia PDF Downloads 80
562 Developing Active Learners and Efficient Users: A Study on the Implementation of Spoken Interaction Skill in the Malay Language Curriculum in Singapore
Authors: Pairah Bte Satariman
Abstract:
This study was carried out to evaluate the Malay Language Curriculum for secondary schools in Singapore. The evaluation focuses on the implementation of the Spoken Interaction Skill, which was recommended by the Curriculum Review Committee in 2010. The committee's review had found that students face difficulty in communicating interactively with others in their daily activities. The purpose of the study is to evaluate the results (products) of the implementation of this skill since 2011. The research used a qualitative method which includes an oral test and interviews with students and with teachers teaching the subject. Preliminary findings show that, in general, the students are not able to communicate interactively and fluently in the oral test unless they are given enough prompts. The teachers feel that the implementation of the skill is timely, as students are more keen to use English in their daily communication, even in Malay Language classes. Teachers also mentioned challenges in the implementation, such as insufficient curriculum time and teaching materials.
Keywords: evaluation, Malay language curriculum, spoken interaction skills, communication, implementation
Procedia PDF Downloads 144
561 Healthcare in COVID-19 and Its Impact on Children with Cochlear Implants
Authors: Amirreza Razzaghipour, Mahdi Khalili
Abstract:
Recommendations from the World Health Organization and the Centers for Disease Control for slowing the spread of COVID-19 include social distancing, frequent handwashing, and covering one's mouth when around others. As hearing healthcare specialists, it was important for us to understand the influence that being forced to limit social interactions had on persons with hearing impairment. We found ourselves delaying cochlear implant (CI) surgeries. All children, and chiefly those with hearing loss, are susceptible to reductions in spoken communication. Hearing devices, such as cochlear implants, give children with hearing loss access to spoken communication and support language development. When provided early and used consistently, these devices help children with hearing loss to engage in spoken interactions. Cochlear implantation is a standard medical-surgical treatment for bilateral severe to profound hearing loss that gains no benefit from a hearing aid. Hearing is one of the most important senses in humans, and pediatric hearing loss constitutes one of the most important public health challenges. Children with hearing loss are identified early and habilitated via hearing aids or cochlear implants. Suitable care and maintenance, as well as continuous auditory verbal therapy (AVT), are also essential for successful language acquisition. Children with hearing loss pose important challenges to their parents, particularly when access to their hearing care providers is limited. The disruption of their routine hearing and therapy follow-up services has had substantial effects on the children as well as their parents.
Keywords: healthcare, covid-19, cochlear implants, spoken communication, hearing loss
Procedia PDF Downloads 165
560 Perceiving Casual Speech: A Gating Experiment with French Listeners of L2 English
Authors: Naouel Zoghlami
Abstract:
Spoken-word recognition involves the simultaneous activation of potential word candidates which compete with each other for final correct recognition. In continuous speech, the activation-competition process gets more complicated due to speech reductions at word boundaries. Lexical processing is more difficult in L2 than in L1 because L2 listeners often lack phonetic, lexico-semantic, syntactic, and prosodic knowledge in the target language. In this study, we investigate the on-line lexical segmentation hypotheses that French listeners of L2 English form and then revise as subsequent perceptual evidence is revealed. Our purpose is to shed further light on the processes of L2 spoken-word recognition in context and to better understand L2 listening difficulties through a comparison of skilled and unskilled listeners' reactions at the point where their working hypothesis is rejected. We use a variant of the gating experiment in which subjects transcribe an English sentence presented in increments of progressively greater duration. The spoken sentence was “And this amazing athlete has just broken another world record”, chosen mainly because it included common reductions and phonetic features in English, such as elision and assimilation. Our preliminary results show that there is an important difference in the manner in which proficient and less-proficient L2 listeners handle connected speech. Less-proficient listeners delay recognition of words as they wait for lexical and syntactic evidence to appear in the gates. Further statistical analyses are currently being undertaken.
Keywords: gating paradigm, spoken word recognition, online lexical segmentation, L2 listening
Procedia PDF Downloads 462
559 Attitude in Academic Writing (CAAW): Corpus Compilation and Annotation
Authors: Hortènsia Curell, Ana Fernández-Montraveta
Abstract:
This paper presents the creation, development, and analysis of a corpus designed to study the presence of attitude markers and author’s stance in research articles in two different areas of linguistics (theoretical linguistics and sociolinguistics). These two disciplines are expected to behave differently in this respect, given the disparity in their discursive conventions. Attitude markers in this work are understood as the linguistic elements (adjectives, nouns and verbs) used to convey the writer's stance towards the content presented in the article, and are crucial in understanding writer-reader interaction and the writer's position. These attitude markers are divided into three broad classes: assessment, significance, and emotion. In addition to them, we also consider first-person singular and plural pronouns and possessives, modal verbs, and passive constructions, which are other linguistic elements expressing the author’s stance. The corpus, Corpus of Attitude in Academic Writing (CAAW), comprises a collection of 21 articles, collected from six journals indexed in JCR. These articles were originally written in English by a single native-speaker author from the UK or USA and were published between 2022 and 2023. The total number of words in the corpus is approximately 222,400, with 106,422 from theoretical linguistics (Lingua, Linguistic Inquiry and Journal of Linguistics) and 116,022 from sociolinguistics journals (International Journal of the Sociology of Language, Language in Society and Journal of Sociolinguistics). Together with the corpus, we present the tool created for the creation and storage of the corpus, along with a tool for automatic annotation. The steps followed in the compilation of the corpus are as follows. First, the articles were selected according to the parameters explained above. Second, they were downloaded and converted to txt format. 
Finally, examples, direct quotes, section titles and references were eliminated, since they do not involve the author’s stance. The resulting texts were the input for the annotation of the linguistic features related to stance. As for the annotation, two articles (one from each subdiscipline) were annotated manually by the two researchers. An existing list was used as a baseline, and other attitude markers were identified, together with the other elements mentioned above. Once a consensus was reached, the rest of the articles were annotated automatically using the tool created for this purpose. The annotated corpus will serve as a resource for scholars working in discourse analysis (both in linguistics and communication) and related fields, since it offers new insights into the expression of attitude. The tools created for the compilation and annotation of the corpus will be useful for studying author’s attitude and stance in articles from any academic discipline: new data can be uploaded and the list of markers can be enlarged. Finally, the tool can be extended to other languages, which will allow cross-linguistic studies of author’s stance.
Keywords: academic writing, attitude, corpus, english
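As a rough illustration of list-based automatic annotation of the kind the abstract above describes, the following Python sketch counts two of the stance features it mentions: first-person pronouns and possessives, and modal verbs. It is not the authors' annotation tool. The word lists and the function name are assumptions, and pure string matching cannot separate, for example, modal "will" from the noun "will".

```python
import re
from collections import Counter

# Hypothetical marker lists; the actual CAAW baseline list is not given in the abstract.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}


def count_stance_features(text):
    """Count first-person forms and modal verbs in a plain-text article."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in FIRST_PERSON:
            counts["first_person"] += 1
        elif token in MODALS:
            counts["modal"] += 1
    counts["tokens"] = len(tokens)
    return counts


example = "We argue that this pattern may reflect our earlier claim."
result = count_stance_features(example)
```

Enlarging the list of markers, as the abstract suggests, would amount to adding further sets (for instance assessment, significance, and emotion markers) and one counter key per set.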
Procedia PDF Downloads 72
558 Discourse Markers in Chinese University Students and Native English Speakers: A Corpus-Based Study
Authors: Dan Xie
Abstract:
The use of discourse markers (DMs) can play a crucial role in representing discourse interaction and pragmatic competence. Learners’ use of DMs and differences between native speakers (NSs) and non-native speakers (NNSs) in the use of various DMs have been the focus of considerable research attention. However, some commonly used DMs, such as you know, have not received as much attention in comparative studies, especially in the Chinese context. This study analyses data in two corpora (COLSEC and Spoken BNC 2014 (14-25)) to investigate how Chinese learners differ from NSs in their use of the DM you know and its functions in speech. The results show that there is a significant difference between the two corpora in terms of the frequency of use of you know. In terms of the functions of you know, the study shows that all six functions are present in both corpora, although there are significant differences between the five functional dimensions, especially in introducing a claim linked to the prior discourse and highlighting particular points in the discourse. The study aims to show empirically how Chinese learners and NSs use DMs differently.
Keywords: you know, discourse marker, native speaker, Chinese learner
Procedia PDF Downloads 78
557 The Relation between Cognitive Fluency and Utterance Fluency in Second Language Spoken Fluency: Studying Fluency through a Psycholinguistic Lens
Authors: Tannistha Dasgupta
Abstract:
This study explores the aspects of second language (L2) spoken fluency that are related to L2 linguistic knowledge and processing skill. It draws on Levelt’s ‘blueprint’ of the L2 speaker, which discusses the cognitive issues underlying the act of speaking. However, L2 speaking assessments have largely neglected the underlying mechanisms involved in language production; emphasis is placed on the relationship between subjective ratings of L2 speech samples and objectively measured aspects of fluency. Hence, this study examines the relation between L2 linguistic knowledge and processing skill, i.e. Cognitive Fluency (CF), and objectively measurable aspects of L2 spoken fluency, i.e. Utterance Fluency (UF). The participants of the study are L2 learners of English studying at high school level in Hyderabad, India. Fifty participants with an intermediate level of proficiency in English performed several lexical retrieval tasks and attention-shifting tasks to measure CF, and 8 oral tasks to measure UF. Each aspect of UF (speed, pause, and repair) was measured against the CF scores to find out which aspects of UF are reliable indicators of CF. Quantitative analysis of the data shows that, among the three aspects of UF, speed is the best predictor of CF, and pause is weakly related to CF. The study suggests that including the speed aspect of UF could make L2 fluency assessment more reliable, valid, and objective. Thus, incorporating the assessment of psycholinguistic mechanisms into L2 spoken fluency testing could result in fairer evaluation.
Keywords: attention-shifting, cognitive fluency, lexical retrieval, utterance fluency
Procedia PDF Downloads 711
556 The Rendering of Sex-Related Expressions by Court Interpreters in Hong Kong: A Corpus-Based Approach
Authors: Yee Yan Crystal Kwong
Abstract:
The essence of rape is the absence of consent to sexual intercourse. Yet the definition of consent is not absolute and allows for subjectivity. In such cases, the accuracy of oral interpretation becomes very important, as the narratives of events and situations, as well as the register and style of speakers, can influence jurors' decision making. This paper first adopts a corpus-based approach to investigate how court interpreters in Hong Kong handle expressions that refer to sexual activities. The data of this study are drawn from the online corpus From legislation to translation, from translation to interpretation: The narrative of sexual offences. The corpus comprises the transcriptions of five separate rape trials, all of which were heard in the presence of an interpreter. Since witnesses and defendants used many sex-related expressions in the five cases, emphasis will be put on those which have an impact on the definition of rape. Through an in-depth analysis of the interpreted utterances, different interpreting approaches will be identified to observe how interpreters retain the intended meanings. Interviews with experienced court interpreters will also be conducted to revisit the validity of the traditional verbatim standard. At the end of this research, various interpreting approaches will be compared and evaluated. A redefinition of interpreters' institutional role, as well as recommendations for interpreting learners, will be provided.
Keywords: court interpreting, interpreters, legal translation, slangs
Procedia PDF Downloads 262