Search results for: corpus of university profiles
5713 Using A Corpus Approach To Investigate Positive University Images: A Comparison Between Chinese And ESC Universities
Authors: Han Hongmei
Abstract:
University image is receiving attention because of its key role in influencing student choice, faculty loyalty, and social recognition. Therefore, all universities strive to promote their positive images. However, for most people, the positive image of a university often comes from a fragmented perceptual understanding. Since universities’ official websites are important channels for image promotion, a corpus approach to university profiles on their official websites can reveal holistic positive images of universities. This study aims to compare positive images of high-level universities in China and English-speaking countries based on a profile corpus of these universities. It is found that the positive images revealed in these university profiles are similar, with some minor differences. The similarities are reflected in the campus environment, historical achievements, comprehensive characteristics, scientific research institutions, and diversified faculty, while the differences are reflected in their unique characteristics. Furthermore, the findings also reveal a gap between Chinese universities and high-level universities in the English-speaking countries.
Keywords: university image, positive image, corpus of university profiles, comparative analysis, high-frequency words
Procedia PDF Downloads 107
5712 Corpus Linguistics as a Tool for Translation Studies Analysis: A Bilingual Parallel Corpus of Students’ Translations
Authors: Juan-Pedro Rica-Peromingo
Abstract:
Nowadays, corpus linguistics has become a key research methodology for Translation Studies, which broadens the scope of cross-linguistic studies. In the case of the study presented here, the approach used focuses on learners with little or no experience to study, at an early stage, general mistakes and errors, the correct or incorrect use of translation strategies, and to improve the translational competence of the students. Led by Sylviane Granger and Marie-Aude Lefer of the Centre for English Corpus Linguistics of the University of Louvain, the MUST corpus (MUltilingual Student Translation Corpus) is an international project which brings together partners from Europe and worldwide universities and connects Learner Corpus Research (LCR) and Translation Studies (TS). It aims to build a corpus of translations carried out by students, including both direct (L2 > L1) and indirect (L1 > L2) translations, from a great variety of text types, genres, and registers in a wide variety of languages: audiovisual translations (including dubbing, subtitling for hearing population and for deaf population), scientific, humanistic, literary, economic and legal translation texts. This paper focuses on the work carried out by the Spanish team from the Complutense University (UCMA), which is part of the MUST project, and it describes the specific features of the corpus built by its members. All the texts used by UCMA are either direct or indirect translations between English and Spanish. Students’ profiles comprise translation trainees, foreign language students with a major in English, engineers studying EFL and MA students, all of them with different English levels (from B1 to C1); for some of the students, this would be their first experience with translation. The MUST corpus is searchable via Hypal4MUST, a web-based interface developed by Adam Obrusnik from Masaryk University (Czech Republic), which includes a translation-oriented annotation system (TAS).
A distinctive feature of the interface is that it allows source texts and target texts to be aligned, so that we can observe and compare both language structures in detail and study the translation strategies used by students. The initial data obtained point out the kind of difficulties encountered by the students and reveal the most frequent strategies implemented by the learners according to their level of English, their translation experience and the text genres. We have also found common errors in the graduate and postgraduate university students’ translations: transfer errors, lexical errors, grammatical errors, text-specific translation errors, and culture-related errors have been identified. Analyzing all these parameters will provide more material to bring better solutions to improve the quality of teaching and the translations produced by the students.
Keywords: corpus studies, students’ corpus, the MUST corpus, translation studies
Procedia PDF Downloads 147
5711 Linguistic Accessibility and Audiovisual Translation: Corpus Linguistics as a Tool for Analysis
Authors: Juan-Pedro Rica-Peromingo
Abstract:
The important change taking place with respect to the media and the audiovisual world in Europe needs to benefit all populations, in particular those with special needs, such as the deaf and hard-of-hearing population (SDH) and blind and partially-sighted population (AD). This recent interest in the field of audiovisual translation (AVT) can be observed in the teaching and learning of the different modes of AVT in the degree and post-degree courses at Spanish universities, which expand the interest and practice of AVT linguistic accessibility. We present a research project led at the UCM which consists of the compilation of AVT activities for teaching purposes and tries to analyze the creation and reception of SDH and AD: the AVLA Project (Audiovisual Learning Archive), which includes audiovisual materials carried out by the university students on different AVT modes and evaluations from the blind and deaf informants. In this study, we present the materials created by the students. A group of the deaf and blind population has been in charge of testing the students’ SDH and AD corpus of audiovisual materials through some questionnaires used to evaluate the students’ production. These questionnaires include information about the reception of the subtitles and the audio descriptions from linguistic and technical points of view. With all the materials compiled in the research project, a corpus with both the students’ production and the recipients’ evaluations is being compiled: the CALING (Corpus de Accesibilidad Lingüística) corpus. Preliminary results will be presented with respect to those aspects, difficulties, and deficiencies in the SDH and AD included in the corpus, specifically with respect to the length of subtitles, the position of the contextual information on the screen, and the text included in the audio descriptions and tone of voice used. These results may suggest some changes and improvements in the quality of the SDH and AD analyzed.
In the end, demand for the teaching and learning of AVT and linguistic accessibility at a university level and some important changes in the norms which regulate SDH and AD nationally and internationally will be suggested.
Keywords: audiovisual translation, corpus linguistics, linguistic accessibility, teaching
Procedia PDF Downloads 81
5710 A Self-Built Corpus-Based Study of Four-Word Lexical Bundles in Native English Teachers’ EFL Classroom Discourse in Northeast China: The Significance of Stance
Authors: Fang Tan
Abstract:
This research focuses on the appropriate use of lexical bundles in spoken discourse, particularly in English as a Foreign Language (EFL) classrooms in Northeast China. While previous studies have mainly examined lexical bundles in written discourse, there is a need to investigate their usage in spoken discourse due to the limited availability of spoken discourse corpora. English teachers’ use of lexical bundles is crucial for effective teaching and communication in the EFL classroom. The aim of this study is to investigate the functions of four-word lexical bundles in native English teachers’ EFL oral English classes in Northeast China. Specifically, the research focuses on the usage of stance bundles, which were found to be the most significant type of bundle in the analyzed corpus. By comparing the self-built university spoken English classroom discourse corpus with the other self-built university English for General Purposes (EGP) corpus, the study aims to highlight the difference in bundle usage between native and non-native teachers in EFL classrooms. The research employs a corpus-based study. The observed corpus consists of more than 300,000 tokens, in which the data has been collected in the past five years. The reference corpus is composed of over 800,000 tokens, in which the data has been collected over 12 years. All the primary data collection involved transcribing and annotating spoken English classes taught by native English teachers. The analysis procedures included identifying and categorizing four-word lexical bundles, with specific emphasis on stance bundles. Frequency counts, and comparisons with the Chinese English teachers’ corpus were conducted to identify patterns and differences in bundle usage. The research addresses the following questions: 1) What are the functions of four-word lexical bundles in native English teachers’ EFL oral English classes? 
2) How do stance bundles differ in usage between native and non-native English teachers’ classes? 3) What implications can be drawn for English teachers’ professional development based on the findings? In conclusion, this study provides valuable insights into the usage of four-word lexical bundles, particularly stance bundles, in native English teachers’ EFL oral English classes in Northeast China. The research highlights the difference in bundle usage between native and non-native English teachers’ classes and provides implications for English teachers’ professional development. The findings contribute to the understanding of lexical bundle usage in EFL classroom discourse and have theoretical importance for language teaching methodologies. The self-built university English classroom discourse corpus used in this research is a valuable resource for future studies in this field.
Keywords: EFL classroom discourse, four-word lexical bundles, stance, implication
Procedia PDF Downloads 65
5709 A Preliminary Study for Building an Arabic Corpus of Pair Questions-Texts from the Web: Aqa-Webcorp
Authors: Wided Bakari, Patrce Bellot, Mahmoud Neji
Abstract:
With the development of electronic media and the heterogeneity of Arabic data on the Web, the idea of building a clean corpus for certain applications of natural language processing, including machine translation, information retrieval, and question answering, becomes more and more pressing. In this manuscript, we seek to create and develop our own corpus of question-text pairs. This corpus will then provide a better base for our experimentation step. Thus, we try to model its construction with a method for Arabic that retrieves texts from the web that could prove to be answers to our factual questions. To do this, we had to develop a java script that can extract a list of HTML pages from a given query. We then clean these pages to obtain a database of texts and a corpus of question-text pairs. In addition, we give preliminary results of our proposed method. Some investigations into the construction of Arabic corpora are also presented in this document.
Keywords: Arabic, web, corpus, search engine, URL, question, corpus building, script, Google, html, txt
Procedia PDF Downloads 323
5708 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: ’Reddit’
Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell
Abstract:
Native language identification is one of the growing subfields in natural language processing (NLP). The task of native language identification (NLI) is mainly concerned with predicting the native language of an author’s writing in a second language. In this paper, we investigate the performance of two types of features, content-based features vs. content-independent features, when they are evaluated on a different corpus (using social media data “Reddit”). In this NLI task, the predefined models are trained on one corpus (TOEFL), and then the trained models are evaluated on different data using an external corpus (Reddit). Three classifiers are used in this task: the baseline, linear SVM, and logistic regression. Results show that content-based features are more accurate and robust than content-independent ones when tested both within the corpus and across corpora.
Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML
Procedia PDF Downloads 137
5707 Semantic Preference across Research Articles: A Corpus-Based Study of Adjectives in English
Authors: Valdênia Carvalho e Almeida
Abstract:
The goal of the present study is to investigate the semantic preference of the most frequent adjectives in research articles through a corpus-based analysis of texts published in journals in Applied Linguistics (AL). The corpus used in this study contains texts published in the period from 2014 to 2018 in three journals: Language Learning and Technology, English for Academic Purposes, and TESOL Quarterly, totaling more than one million words. A corpus-based analysis was carried out on the corpus to identify the most frequent adjectives that co-occurred in the three journals. By observing the concordance lines of the adjectives and analyzing the words they associated with, the semantic preferences of each adjective were determined. Later, the AL corpus analysis was compared to the investigation of the same adjectives in a corpus of Chemistry. This second part of the study aimed to identify possible differences and similarities between the two corpora in relation to the use of the adjectives in research articles from both areas. The results show that there are some preferences which seem to be closely related not only to the academic genre of the texts but also to the specific domain of the discipline and, to a lesser extent, to the context of research in each journal. This research illustrates a possible contribution of Corpus Linguistics to explore the concept of semantic preference in more detail, considering the complex nature of the phenomenon.
Keywords: applied linguistics, corpus linguistics, chemistry, research article, semantic preference
Procedia PDF Downloads 185
5706 Corpus-Assisted Study of Gender Related Tiger Metaphors in the Chinese Context
Authors: Na Xiao
Abstract:
Animal metaphors have many different connotations, ranging from loving emotions to derogatory epithets, but gender expressions using animal metaphors are often imbalanced. Generally, animal metaphors related to females tend to be negative. Little is known about the reasons for the negative expressions of female animal metaphors in Chinese contexts, and these reasons have still not been quantified. The Modern Chinese Corpus at the Center for Chinese Linguistics at Peking University (CCL Corpus) provided the data for this research, which aims to identify the influencing variables of gender differences in the description of animal metaphors mapping humans in Chinese by observing the percentage of "tiger" metaphors, based on conceptual metaphor theory. A quantitative research method was used in this study to statistically examine the gender attitude percentage of the "tiger" metaphor using corpus data. This study has shown that the tiger metaphors associated with humans in the Chinese context tend to be negative. Importantly, this study has also shown that the high proportion of tiger metaphorical idioms is what causes the high proportion of negative tiger metaphors that are related to women. This finding can be used as crucial information for future studies on other gender-related animal metaphorical idioms and can offer additional insights for understanding trends in other animal metaphors.
Keywords: Chinese, CCL corpus, gender differences, metaphorical idioms, tigers
Procedia PDF Downloads 108
5705 Corpus-Based Model of Key Concepts Selection for the Master English Language Course "Government Relations"
Authors: Elena Pozdnyakova
Abstract:
“Government Relations” is a field of knowledge presently taught at the majority of universities around the globe. English as the default language can become the language of teaching since the issues discussed are both global and national in character. However, for this field of knowledge, key concepts and their word representations in English often do not coincide with those in other languages. International master’s degree students abroad, as well as students taught the course in English at their national universities, face difficulties connected with the correct conceptualizing of GR terminology in British and American academic traditions. The study was carried out during the GR English language course elaboration (pilot research: 2013-2015) at Moscow State Institute of Foreign Relations (University), Russian Federation. Within this period, English language instructors designed and elaborated the three-semester course of GR. Methodologically, the course design was based on the elaboration model, with a special focus on conceptual elaboration sequence and theoretical elaboration sequence. The course designers faced difficulties in concept selection and theoretical elaboration sequence. To improve the results and eliminate the problems with concept selection, a new, corpus-based approach was worked out. The computer-based tool WordSmith 6.0 was used with the aim of building a model of key concept selection. The corpus of GR English texts consisted of 1 million words (the study corpus). The approach was based on measuring effect size, i.e. the percent difference of the frequency of a word in the study corpus when compared to that in the reference corpus. The results obtained showed significant improvement in the process of concept selection. The corpus-based model also facilitated theoretical elaboration of teaching materials.
Keywords: corpus-based study, English as the default language, key concepts, measuring effect size, model of key concept selection
Procedia PDF Downloads 306
5704 Specialized Translation Teaching Strategies: A Corpus-Based Approach
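The effect-size measure named in this abstract, the percent difference between a word's frequency in the study corpus and in the reference corpus, can be illustrated with a minimal Python sketch. It assumes frequencies are first normalised by corpus size; the function name `percent_diff`, its parameters, and the example counts are hypothetical and are not taken from the study or from WordSmith.

```python
def percent_diff(freq_study, size_study, freq_ref, size_ref, per=1_000_000):
    """Percent difference between the normalised frequency of a word in a
    study corpus and in a reference corpus (illustrative sketch only)."""
    if freq_ref == 0:
        # The measure is undefined when the word is absent from the reference corpus.
        raise ValueError("word does not occur in the reference corpus")
    # Normalise raw counts to occurrences per `per` tokens (assumption: per million).
    nf_study = freq_study / size_study * per
    nf_ref = freq_ref / size_ref * per
    # Positive values: the word is relatively more frequent in the study corpus.
    return (nf_study - nf_ref) / nf_ref * 100


# Hypothetical counts: 200 hits in a 1M-word study corpus vs. 100 hits in a 1M-word reference corpus.
print(percent_diff(200, 1_000_000, 100, 1_000_000))
```

A positive value marks a word as relatively over-represented in the study corpus, which is the sense in which such a measure can help rank candidate key concepts.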
Authors: Yingying Ding
Abstract:
This study presents a methodology of specialized translation with the objective of helping teachers to improve the strategies in teaching translation. In order to acquire the skills to translate specialized texts, students need to become familiar with the semantic and syntactic features of source texts and target texts. The aim of our study is to use a corpus-based approach in the teaching of specialized translation between Chinese and Italian. This study proposes to construct a specialized Chinese-Italian comparable corpus that consists of 50 economic contracts from the domain of food. With the help of AntConc, we propose to compile a comparable corpus for translation teaching purposes. This paper attempts to provide insight into how teachers could benefit from a comparable corpus in the teaching of specialized translation from Italian into Chinese and, through some examples of passive sentences, into how students could learn to apply different strategies for translating voice appropriately.
Keywords: contrastive studies, specialised translation, corpus-based approach, teaching
Procedia PDF Downloads 370
5703 Exploring Reading into Writing: A Corpus-Based Analysis of Postgraduate Students’ Literature Review Essays
Authors: Tanzeela Anbreen, Ammara Maqsood
Abstract:
Reading into writing is one of university students' most required academic skills. The current study explored postgraduate university students’ writing quality using a corpus-based approach. Twelve postgraduate students’ literature review essays were chosen for the corpus-based analysis. These essays were chosen because students had to incorporate multiple reading sources in these essays, which was a new writing exercise for them. The students were provided feedback at least two times, which comprised written comments by the tutor highlighting the areas of improvement and the use of the ‘track changes’ function. This exercise was repeated two times, and students submitted two drafts. This investigation included only the finally submitted work of the students. A corpus-based approach was adopted to analyse the essays because it promotes autonomous discovery and personalised learning. The aim of this analysis was to understand the existing level of students’ writing before the start of their postgraduate thesis. Text Inspector was used to analyse the quality of essays. With the help of the Text Inspector tool, the vocabulary used in the essays was compared to the English Vocabulary Profile (EVP), which describes what learners know and can do at each Common European Framework of Reference (CEFR) level. Writing quality was also measured with the Flesch reading ease score, which is a standard measure of how easy the writing content is to understand. The results reflected that students found writing essays using multiple sources challenging. In most essays, the vocabulary level achieved was between B1 and B2 on the CEFR scale. The study recommends that students need extensive training in developing academic writing skills, particularly in writing literature review type assignments, which require citations from multiple sources.
Keywords: literature review essays, postgraduate students, corpus-based analysis, vocabulary proficiency
Procedia PDF Downloads 73
5702 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development
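For readers unfamiliar with the Flesch reading ease score mentioned in this abstract, the standard formula can be sketched in a few lines of Python. This is only an illustration of the published formula: the function name and the example counts are hypothetical, the counts are assumed to be supplied by the caller, and nothing here describes how Text Inspector itself computes the score.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Standard Flesch reading ease formula; higher scores mean easier text."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical essay excerpt: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))
```

Longer sentences and more syllables per word both lower the score, which is why dense academic prose tends to score lower than everyday text.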
Authors: L. Kamandulytė-Merfeldienė
Abstract:
The paper deals with the main methodological issues of the Corpus of Spoken Lithuanian, whose development started in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, the transcription of the recorded data, and the grammatical annotation. Collecting the data was based on the principles of balance and naturalness. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to a constant increase in studies on spontaneous communication, and various papers have dealt with the distribution of parts of speech, the use of different grammatical forms, the variation of inflectional paradigms, the distribution of fillers, the syntactic functions of adjectives, and the mean length of utterances.
Keywords: CHILDES, corpus of spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian
Procedia PDF Downloads 236
5701 Chinese Students’ Use of Corpus Tools in an English for Academic Purposes Writing Course: Influence on Learning Behaviour, Performance Outcomes and Perceptions
Authors: Jingwen Ou
Abstract:
Writing for academic purposes in a second or foreign language poses a significant challenge for non-native speakers, particularly at the tertiary level, where English academic writing for L2 students is often hindered by difficulties in academic discourse, including vocabulary, academic register, and organization. The past two decades have witnessed a rising popularity in the application of the data-driven learning (DDL) approach in EAP writing instruction. In light of such a trend, this study aims to enhance the integration of DDL into English for academic purposes (EAP) writing classrooms by investigating the perception of Chinese college students regarding the use of corpus tools for improving EAP writing. Additionally, the research explores their corpus consultation behaviors during training to provide insights into corpus-assisted EAP instruction for DDL practitioners. Given the rising popularity of DDL, this research aims to investigate Chinese university students’ use of corpus tools with three main foci: 1) the influence of corpus tools on learning behaviours, 2) the influence of corpus tools on students’ academic writing performance outcomes, and 3) students’ perceptions and potential perceptional changes towards the use of such tools. Three corpus tools, CQPWeb, Sketch Engine, and LancsBox X, are selected for investigation due to the scarcity of empirical research on patterns of learners’ engagement with a combination of multiple corpora. The research adopts a pre-test / post-test design for the evaluation of students’ academic writing performance before and after the intervention. Twenty participants will be divided into two groups: an intervention and a non-intervention group. Three corpus training workshops will be delivered at the beginning, middle, and end of a semester.
An online survey and three separate focus group interviews are designed to investigate students’ perceptions of the use of corpus tools for improving academic writing skills, particularly the rhetorical functions in different essay sections. Insights from students’ consultation sessions indicated difficulties with DDL practice, including insufficiency of time to complete all tasks, struggle with technical set-up, unfamiliarity with the DDL approach and difficulty with some advanced corpus functions. Findings from the main study aim to provide pedagogical insights and training resources for EAP practitioners and learners.
Keywords: corpus linguistics, data-driven learning, English for academic purposes, tertiary education in China
Procedia PDF Downloads 60
5700 An Online Corpus-Based Bilingual Collocations Dictionary for Second/Foreign Language Learners
Authors: Adriane Orenha-Ottaiano
Abstract:
Collocations are conventionalized, recurrent and arbitrary lexical combinations. Due to the fact that they are highly specific for a particular language and may be contextually restricted, collocations pose a problem to EFL/ESL learners with regard to production or encoding. Taking that into account, the compilation of monolingual and bilingual collocations dictionaries for the referred audience is highly crucial and significant. Thus, the aim of this paper is to discuss the importance of the compilation of an Online Corpus-based Bilingual Collocations Dictionary, in the English-Portuguese and Portuguese-English directions. In a first phase, with the use of WordSmith Tools, the collocations were extracted from a Translation Learner Corpus (TLC), a parallel corpus made up of university students’ translations in the Portuguese-English direction, with approximately 100,000 words. In a second stage, based on the keywords analyzed from the TLC, more collocational patterns were extracted using the Sketch Engine. In order to include more collocations as well as to ensure dictionary users will have access to more frequent and recurrent collocations, we also use the frequency list from The Corpus of Contemporary American English, with the purpose of extracting more patterns. The dictionary focuses on all types of collocations (verbal, noun, adjectival and adverbial collocations), in order to help the referred audience use them more accurately and productively. So far, the dictionary has more than 330 entries and more than 3,500 extracted collocations. Having the proposed dictionary in online format may allow more collocational information to be incorporated, both qualitatively and quantitatively. Besides, more examples may be included than in conventional printed collocations dictionaries.
Being the first bilingual collocations dictionary in the aforementioned directions, it is hoped that it will meet the challenge of addressing learners’ collocational needs, as the collocations have been selected according to learners’ difficulties regarding the use of collocations.
Keywords: Corpus-Based Collocations Dictionary, Collocations, Bilingual Collocations Dictionary, Collocational Patterns
Procedia PDF Downloads 309
5699 Intellectual Capital Disclosure: Profiles of Spanish Public Universities
Authors: Yolanda Ramírez, Ángel Tejada, Agustín Baidez
Abstract:
In the higher education setting, there is a current trend in society toward greater openness and transparency. The economic, social and political changes that have occurred in recent years in public sector universities (particularly the New Public Management, the Bologna Process and the emergence of the “third mission”) call for a wider disclosure of value created by universities to support fundraising activities, to ensure accountability in the use of public funds and the outcomes of research and teaching, as well as close relationships with industries and territories. The paper has two purposes: 1) to explore the intellectual capital (IC) disclosure in Spanish universities through their websites, and 2) to identify university profiles. This study applies a content analysis to analyze the institutional websites of Spanish public universities and a cluster analysis. The analysis reveals that Spanish universities’ website content usually relates to human capital, while structural and relational capitals are less widely disclosed. Our research identifies three behavioral profiles of Spanish universities with regard to the online disclosure of IC (more proactive universities, less proactive universities, and universities that adopt a middle position in this regard). The results can serve as encouragement to university managers to enhance online IC disclosure to meet the information needs of university stakeholders.
Keywords: universities, intellectual capital, disclosure, internet
Procedia PDF Downloads 158
5698 Corporate Cautionary Statement: A Genre of Professional Communication
Authors: Chie Urawa
Abstract:
Cautionary statements or disclaimers in corporate annual reports need to be carefully designed because clear cautionary statements may protect a company in the case of legal disputes and may undermine positive impressions. This study compares the language of cautionary statements using two corpora, Sony’s cautionary statement corpus (S-corpus) and Panasonic’s cautionary statement corpus (P-corpus), illustrating the differences and similarities in relation to the use of meaningful cautionary statements and critically analyzing why practitioners write them the way they do. The findings describe the distinct differences between the two companies in the presentation of the risk factors and the way they make the statements. The word ability is used more for legal protection in S-corpus, whereas the word possibility is used more to convey a better impression in P-corpus. The main similarities are identified in the use of lexical words and pronouns, and in almost the same wordings for eight years. The findings show how they make the statements unique to the company in the presentation of risk factors, and the characteristics of a specific genre of professional communication. Important implications of this study are that a more comprehensive approach can be applied in other contexts and be used by companies to reflect upon their cautionary statements.
Keywords: cautionary statements, corporate annual reports, corpus, risk factors
Procedia PDF Downloads 171
5697 Tagging a corpus of Media Interviews with Diplomats: Challenges and Solutions
Authors: Roberta Facchinetti, Sara Corrizzato, Silvia Cavalieri
Abstract:
Increasing interconnection between data digitalization and linguistic investigation has given rise to unprecedented potentialities and challenges for corpus linguists, who need to master IT tools for data analysis and text processing, as well as to develop techniques for efficient and reliable annotation in specific mark-up languages that encode documents in a format that is both human- and machine-readable. In the present paper, the challenges emerging from the compilation of a linguistic corpus will be taken into consideration, focusing on the English language in particular. To do so, the case study of the InterDiplo corpus will be illustrated. The corpus, currently under development at the University of Verona (Italy), represents a novelty in terms both of the data included and of the tag set used for its annotation. The corpus covers media interviews and debates with diplomats and international operators conversing in English with journalists who do not share the same lingua-cultural background as their interviewees. To date, this appears to be the first tagged corpus of international institutional spoken discourse and will be an important database not only for linguists interested in corpus analysis but also for experts operating in international relations. In the present paper, special attention will be dedicated to the structural mark-up, part-of-speech annotation, and tagging of discursive traits, which are the innovative parts of the project and the result of a thorough study to find the solution best suited to the analytical needs of the data. Several aspects will be addressed, with special attention to the tagging of the speakers’ identity, the communicative events, and anthropophagic. Prominence will be given to the annotation of question/answer exchanges to investigate the interlocutors’ choices and how such choices impact communication.
Indeed, the automated identification of questions, in relation to the expected answers, helps to show how interviewers elicit information as well as how interviewees provide their answers to fulfill their respective communicative aims. A detailed description of the aforementioned elements will be given using the InterDiplo-Covid19 pilot corpus. The results of our preliminary analysis will highlight the viable solutions found in the construction of the corpus in terms of XML conversion, metadata definition, tagging system, and discursive-pragmatic annotation to be included via Oxygen.
Keywords: spoken corpus, diplomats’ interviews, tagging system, discursive-pragmatic annotation, English linguistics
Procedia PDF Downloads 185
5696 A Corpus-Based Study on the Styles of Three Translators
Authors: Wang Yunhong
Abstract:
The present paper is concerned with the different styles of three translators in translating the Chinese classical novel Shuihu Zhuan. Based on a parallel corpus, it adopts a target-oriented approach to examine whether the three translations reveal stylistic differences and shifts, and what these are. The findings show that the three translators demonstrate different styles in their word choices and sentence preferences, which implies that identifying recurrent textual patterns may be a basic step in investigating the style of a translator.
Keywords: corpus, lexical choices, sentence characteristics, style
Procedia PDF Downloads 268
5695 English for Academic and Specific Purposes: A Corpus-Informed Approach to Designing Vocabulary Teaching Materials
Authors: Said Ahmed Zohairy
Abstract:
Significant shifts in the theory and practice of teaching vocabulary affect teachers’ decisions about learning materials’ design. Relevant literature supports teaching specialised, authentic, and multi-word lexical items rather than focusing on single-word vocabulary lists. Corpora, collections of texts stored in a database, present a reliable source of teaching and learning materials. Although corpus-informed studies have provided guidance for teachers in identifying useful language chunks and phraseological units, little of the literature discusses the use of corpora in teaching English for academic and specific purposes (EASP). The aim of this study is to improve teaching practices and provide a description of the pedagogical choices and procedures of an EASP tutor in an attempt to offer guidance for novice corpus users. It draws on the researcher’s experience of utilising corpus linguistic tools to design vocabulary learning activities without focusing on students’ learning outcomes. Hence, it adopts a self-study research methodology which is based on five methodological components suggested by other self-study researchers. The findings indicate that designing specialised and corpus-informed vocabulary learning activities could be challenging for teachers, as they require technical knowledge of how to navigate corpora and utilise corpus analysis tools. Findings also include a description of the researcher’s approach to building and analysing a specialised corpus for the benefit of novice corpus users, who should then be able to start their own journey of designing corpus-based activities.
Keywords: corpora, corpus linguistics, corpus-informed, English for academic and specific purposes, agribusiness, vocabulary, phraseological units, materials design
Procedia PDF Downloads 24
5694 A Corpus-Assisted Discourse Analysis of Adjectival Collocation of the Word 'Education' in the American Context
Authors: Ngan Nguyen
Abstract:
The study analyses adjectives collocating with the word ‘education’ in the American English component of the Corpus of Global Web-based English, using a combination of corpus linguistic and discourse analytical methods to examine not only language patterns but also sociopolitical ideologies around the topic. Four significant conclusions are drawn: (1) a large number of adjectival collocates of the word education have been identified and classified into four categories representing four different aspects of education: level, quality, forms and types of education; (2) in combination with the first three categories, education means the act and process of teaching and learning, while with the last category it means a particular kind of teaching or training; (3) higher education is the topic of greatest concern to the American public; (4) the five most significant ideologies discovered in the corpus are: higher education is associated with financial affairs, higher education is an industry, government monetary policy on higher education, people require greater access to higher education, and people value higher education. The study contributes to the study of how word meanings develop through corpus analysis and to the field of discourse analysis.
Keywords: adjectival collocation, American context, corpus linguistics, discourse analysis, education
Procedia PDF Downloads 346
5693 Hematological Profiles of Visceral Leishmaniasis Patients before and after Treatment of Anti-Leishmanial Drugs at University of Gondar Leishmania Research and Treatment Center Northwest, Ethiopia
Authors: Fitsumbrhan Tajebe, Fadil Murad, Mitikie Tigabie, Mareye Abebaw, Tadele Alemu, Sefanit Abate, Rezika Mohammedw, Arega Yeshanew, Elias Shiferaw
Abstract:
Background: Visceral leishmaniasis is a parasitic disease characterized by a systemic infection of phagocytic cells. Hematological parameters of these patients may be affected by the progress of the disease or treatment. Thus, the current study aimed to assess the hematological profiles of visceral leishmaniasis patients before and after treatment. Method: An institution-based retrospective cohort study was conducted among visceral leishmaniasis patients at University of Gondar Comprehensive Specialized Referral Hospital Leishmaniasis Research and Treatment Center from 2013 to 2018. Hematological profiles before initiation and after completion of treatment were extracted from the registration book. Descriptive statistics were presented using frequency and percentage. Paired t-test and Wilcoxon signed-rank test were used to compare mean differences for normally and non-normally distributed data, respectively. Spearman and Pearson correlation analyses were used to describe the correlation of hematological parameters with different variables. A P value < 0.05 was considered statistically significant. Result: Except for absolute neutrophil count, post-treatment hematological parameters showed a significant increase compared to pretreatment values. The prevalence of anemia, leucopenia and thrombocytopenia was 85.5%, 83.4% and 75.8% prior to treatment and 58.3%, 38.2% and 19.2% after treatment, respectively. Moreover, parasite load showed a statistically significant negative correlation with hematological profiles, mainly with white blood cell and red blood cell counts. Conclusion: The majority of hematological profiles of patients with active VL were restored after treatment, which might be associated with the effect of treatment on parasite proliferation and on the concentration of parasites in visceral organs, which directly affect hematological profiles.
Keywords: visceral leishmaniasis, hematological profile, anti-leishmanial drug, Gondar
Procedia PDF Downloads 128
5692 Saudi Twitter Corpus for Sentiment Analysis
Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari
Abstract:
Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have directly applied SA to Arabic due to the lack of a publicly available dataset for this language. This paper partially bridges this gap by focusing on one Arabic dialect, the Saudi dialect. It presents an annotated data set of 4700 for Saudi dialect sentiment analysis (K = 0.807). Our next step is to extend this corpus and create a large-scale lexicon for the Saudi dialect from the corpus.
Keywords: Arabic, sentiment analysis, Twitter, annotation
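The abstract does not define the agreement figure K = 0.807. Assuming it is Cohen's kappa between two annotators, a minimal sketch of the computation could look like this; the function name and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # degenerate case: both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotation of four tweets.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(annotator_1, annotator_2))
```

Values close to 1 indicate agreement well above chance, which is how a figure such as 0.807 would normally be read under this assumption.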
Procedia PDF Downloads 629
5691 Designing a Corpus Database to Enhance the Learning of Old English Language
Authors: Raquel Mateo Mendaza, Carmen Novo Urraca
Abstract:
The current paper presents the elaboration of a corpus database that aligns two different corpora in order to simplify the search for information both for researchers and students of Old English. This database comprises the information contained in two main reference corpora, namely the Dictionary of Old English Corpus (DOEC), compiled at the University of Toronto, and the York-Toronto-Helsinki Parsed Corpus of Old English (YCOE). The first one provides information on all surviving texts written in the Old English language. The latter offers the syntactical and morphological annotation of several texts included in the DOEC. Although both corpora are closely related, as the YCOE includes the DOE source text identifier, the main problem detected is that there is no alignment of texts that allows for the search of whole fragments to be further analysed in terms of morphology and syntax. The database proposed in this paper gathers all this information and presents it in a simple, more accessible, visual, and educational way. The alignment of fragments has been done automatically. However, some problems emerged during the creation process, particularly related to the lack of correspondence in the division of fragments. For this reason, it has been necessary to revise all entries manually to obtain a reliable, high-quality product and to carefully indicate the gaps encountered in these corpora. All in all, this database contains more than 60,000 entries corresponding to the DOE fragments annotated by the YCOE. The main strength of the resulting product is its research and teaching implications in the study of Old English. The use of this database will help researchers and students in the study of different aspects of the language, such as inflectional morphology, syntactic behaviour of given words, or translation studies, among others.
When words or fragments are searched for, the annotated information on morphology and syntax is displayed automatically, automating and speeding up the search for data.
Keywords: alignment, corpus database, morphosyntactic analysis, Old English
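As a rough illustration of the alignment idea described in this abstract, the sketch below joins two collections on a shared fragment identifier and records the identifiers that have no counterpart. The identifiers, field names, and placeholder strings are hypothetical; the abstract does not describe the actual database structure.

```python
def align_fragments(doec_fragments, ycoe_annotations):
    """Join text fragments and annotations on a shared identifier.

    Both arguments are dicts keyed by fragment identifier (an assumed layout).
    Returns the aligned entries and the identifiers lacking an annotation.
    """
    aligned = []
    gaps = []
    for fragment_id, text in doec_fragments.items():
        annotation = ycoe_annotations.get(fragment_id)
        if annotation is None:
            # No annotated counterpart: keep the identifier for manual revision.
            gaps.append(fragment_id)
        else:
            aligned.append({"id": fragment_id, "text": text, "annotation": annotation})
    return aligned, gaps


# Placeholder data standing in for real corpus content.
doec = {"F001": "fragment text 1", "F002": "fragment text 2", "F003": "fragment text 3"}
ycoe = {"F001": "parse of fragment 1", "F003": "parse of fragment 3"}

aligned, gaps = align_fragments(doec, ycoe)
print(len(aligned), "aligned entries; unmatched:", gaps)
```

Keeping the unmatched identifiers in a separate list mirrors the manual revision step the abstract mentions, where gaps between the two corpora are indicated explicitly.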
Procedia PDF Downloads 134
5690 Corpus Linguistic Methods in a Theoretical Study of Quran Verb Tense and Aspect in Translations from Arabic to English
Authors: Jawharah Alasmari
Abstract:
In the inflectional morphology of the verb, tense and aspect indicate the time of an action (past, present, or future) and whether it is completed or not. The usage and meaning of tense and aspect differ in Arabic and English; therefore, there is no simple one-to-one mapping from an Arabic inflected verb form to an English one, and an appropriate English translation depends on a range of features, including the immediate and wider context of use. The Quranic Arabic Corpus includes seven alternative expertly crafted English translations of each Arabic verse, which provides a test dataset for the study of appropriate Arabic to English translations of verb tense and aspect. We applied corpus linguistics methods in a theoretical study of exemplary verbs to elicit candidate verbal contexts which influence the choice of English inflection for each verse.
Keywords: Corpus linguistics methods, Arabic verb, tense and aspect, English translations
Procedia PDF Downloads 391
5689 Combining Corpus Linguistics and Critical Discourse Analysis to Study Power Relations in Hindi Newspapers
Authors: Vandana Mishra, Niladri Sekhar Dash, Jayshree Charkraborty
Abstract:
The present paper focuses on the application of corpus linguistics techniques for critical discourse analysis (CDA) of Hindi newspapers. While corpus linguistics is the study of language as expressed in corpora (samples) of 'real world' text, CDA is an interdisciplinary approach to the study of discourse that views language as a form of social practice. CDA has mainly been studied from a qualitative perspective. However, recent studies have begun combining corpus linguistics with CDA in analyzing large volumes of text for the study of existing power relations in society. The corpus under study is also sizable (1 million words of Hindi newspaper texts), and its analysis requires an alternative analytical procedure. So, we have combined the quantitative approach, i.e. the use of corpus techniques, with CDA’s traditional qualitative analysis. In this context, we have focused on keyword analysis, sorting concordance lines of the selected keywords, and calculating collocates of the keywords. We have used WordSmith Tools for all these analyses. The analysis starts with identifying the keywords in the political news corpus when compared with the main news corpus. The keywords are extracted from the corpus based on their keyness, calculated through statistical tests such as the chi-squared test and the log-likelihood test on the frequent words of the corpus. Some of the top occurring keywords are मोदी (Modi), भाजपा (BJP), कांग्रेस (Congress), सरकार (Government) and पार्टी (Political party). This is followed by the concordance analysis of these keywords, which generates thousands of lines, of which we select a few and examine them based on our objective. We have also calculated the collocates of the keywords based on their Mutual Information (MI) score. Both concordance and collocation help to identify lexical patterns in the political texts.
Finally, all these quantitative results derived from the corpus techniques will be subjectively interpreted in accordance with CDA theory to examine the ways in which political news discourse produces social and political inequality, power abuse or domination.
Keywords: critical discourse analysis, corpus linguistics, Hindi newspapers, power relations
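The keyness and collocation statistics mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the WordSmith Tools implementation: it assumes the common two-corpus log-likelihood formula and the basic pointwise Mutual Information formula, and all function names and counts are hypothetical.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word in a study corpus vs. a reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def mutual_information(cooccurrences, freq_node, freq_collocate, corpus_size):
    """Basic MI score: log2 of observed over expected co-occurrence."""
    return math.log2(cooccurrences * corpus_size / (freq_node * freq_collocate))


# Hypothetical counts: a word occurring 30 times in a 1,000-word study corpus
# and 10 times in a 1,000-word reference corpus.
print(round(log_likelihood(30, 1000, 10, 1000), 3))
# Hypothetical collocation: 20 co-occurrences of two words that each occur 100 times.
print(mutual_information(20, 100, 100, 1000))
```

A higher log-likelihood value means the frequency difference between the two corpora is less likely to be due to chance; corpus tools typically rank keywords by this score and apply a significance threshold.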
Procedia PDF Downloads 224
5688 A Corpus-Based Discourse Analysis of the Disappearance of MH370 in Malaysia and United Kingdom Newspapers: A Pilot Study
Authors: Theng Theng Ong
Abstract:
This pilot study adopts a corpus-based discourse analysis to explore the construction of the Malaysia Airlines MH370 tragedy in selected Malaysian and United Kingdom (UK) newspapers. Fairclough’s three-dimensional model is adopted in the study to support the corpus-based analysis. The analysis aims to determine the ways in which the MH370 tragedy is linguistically defined and constructed in terms of keywords and collocation. The study also seeks to identify the types of discourse that are presented in the news articles. In addition, the differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media are examined.
Keywords: corpus, CDA, newspapers, airline tragedies
Procedia PDF Downloads 300
5687 Social Data Aggregator and Locator of Knowledge (STALK)
Authors: Rashmi Raghunandan, Sanjana Shankar, Rakshitha K. Bhat
Abstract:
Social media contributes a vast amount of data and information about individuals to the internet. This project will greatly reduce the need for manual analysis of large and diverse social media profiles by filtering out and combining the useful information from various social media profiles, eliminating irrelevant data. It differs from the existing social media aggregators in that it does not provide a consolidated view of various profiles. Instead, it provides consolidated INFORMATION derived from the subject’s posts and other activities. It also allows analysis and analytics across multiple profiles. We strive to provide a query system that gives a natural language answer to questions when a user does not wish to go through the entire profile. The information provided can be filtered according to the use case.
Keywords: social network, analysis, Facebook, LinkedIn, git, big data
Procedia PDF Downloads 444
5686 Passive Voice in SLA: Armenian Learners’ Case Study
Authors: Emma Nemishalyan
Abstract:
It is believed that learners’ mother tongue (L1 hereafter) has a huge impact on their second language acquisition (L2 hereafter). This hypothesis has received both positive and negative criticism. Based on research results from a wide range of learner corpora (Chinese, Japanese, and Spanish, among others), the hypothesis has been either proved or disproved. However, no such study has been conducted on Armenian learners. The aim of this paper is to understand the implications of the hypothesis for an Armenian learners’ corpus in terms of the use of the passive voice. To this end, the method of Contrastive Interlanguage Analysis (hereafter CIA) has been used on a native speakers’ corpus (Louvain Corpus of Native English Essays (LOCNESS)) and an Armenian learners’ corpus which has been compiled by me in compliance with International Corpus of Learner English (ICLE) guidelines. CIA compares the interlanguage (the language produced by learners) with the one produced by native speakers. With the help of this method, it is possible not only to highlight the mistakes that learners make, but also to identify underuse or overuse. The choice of the grammar issue (passive voice) is conditioned by the fact that typologically Armenian and English are drastically different, as they belong to different branches. Moreover, the passive voice is considered to be one of the most problematic grammar topics to be acquired by learners of the English language. Based on this difference, we hypothesized that Armenian learners would either overuse or underuse some types of the passive voice. With the help of LancsBox software, we have identified the frequency rates of passive voice usage in LOCNESS and the Armenian learners’ corpus to understand whether the latter have the same usage pattern of the passive voice as the native speakers. Secondly, we have identified the types of the passive voice used by the Armenian learners, trying to track down the reasons in their mother tongue.
The results of the study showed that Armenian learners underused the passive voice in contrast to native speakers. Furthermore, the hypothesis that learners’ L1 has an impact on learners’ L2 acquisition and production was proved.
Keywords: corpus linguistics, applied linguistics, second language acquisition, corpus compilation
Procedia PDF Downloads 108
5685 The Association of Anthropometric Measurements, Blood Pressure Measurements, and Lipid Profiles with Mental Health Symptoms in University Students
Authors: Ammaarah Gamieldien
Abstract:
Depression is a very common and serious mental illness that has a significant impact on both the social and economic aspects of sufferers worldwide. This study aimed to investigate the association between body mass index (BMI), blood pressure, and lipid profiles with mental health symptoms in university students. Secondary objectives included the associations between the variables (BMI, blood pressure, and lipids) with themselves, as they are key factors in cardiometabolic disease. Sixty-three (63) students participated in the study. Thirty-two (32) were assigned to the control group (minimal-mild depressive symptoms), while 31 were assigned to the depressive group (moderate to severe depressive symptoms). Montgomery-Asberg Depression Rating Scale (MADRS) and Beck Depression Inventory (BDI) were used to assess depressive scores. Anthropometric measurements such as weight (kg), height (m), waist circumference (WC), and hip circumference were measured. Body mass index (BMI) and ratios such as waist-to-hip ratio (WHR) and waist-to-height ratio (WtHR) were also calculated. Blood pressure was measured using an automated AfriMedics blood pressure machine, while lipids were measured using a CardioChek plus analyzer machine. Statistics were analyzed via the SPSS statistics program. There were no significant associations between anthropometric measurements and depressive scores (p > 0.05). There were no significant correlations between lipid profiles and depression when running a Spearman’s rho correlation (P > 0.05). However, total cholesterol and LDL-C were negatively associated with depression, and triglycerides were positively associated with depression after running a point-biserial correlation (P < 0.05). Overall, there were no significant associations between blood pressure measurements and depression (P > 0.05). However, there was a significant moderate positive correlation between systolic blood pressure and MADRS scores in males (P < 0.05). 
Depressive scores correlated positively and strongly with how long it took participants to fall asleep. There were also significant associations with regard to the secondary objectives. This study indicates the importance of determining the prevalence of depression among university students in South Africa. If the prevalence and factors associated with depression are addressed, depressive symptoms in university students may be improved.
Keywords: depression, blood pressure, body mass index, lipid profiles, mental health symptoms
Procedia PDF Downloads 62
5684 The Repetition of New Words and Information in Mandarin-Speaking Children: A Corpus-Based Study
Authors: Jian-Jun Gao
Abstract:
Repetition is used for a variety of functions in conversation. When young children first learn to speak, they often repeat words from the adult’s recent utterance for learning and social functions. The objective of this study was to ascertain whether the repetitions are equivalent in indicating attention to new words and the initial repeat of information in conversation. Based on the observation of naturally occurring language use in the Taiwan Corpus of Child Mandarin (TCCM), the results of this study provide empirical support for previous findings that children are more likely to repeat new words they are offered than to repeat new information. As children get older, the repetition of both new words and new information drops.
Keywords: acquisition, corpus, Mandarin, new words, new information, repetition
Procedia PDF Downloads 149