Search results for: representative corpus
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1206


1176 The Value of Computerized Corpora in EFL Textbook Design: The Case of Modal Verbs

Authors: Lexi Li

Abstract:

This study aims to contribute to research on how computer technology can be exploited to enhance EFL textbook design. Specifically, the study demonstrates how computerized native and learner corpora can be used to enhance modal verb treatment in EFL textbooks. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because the pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of each of the nine modals was sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the “secondary school” section were selected. A series of five secondary coursebooks constitutes the textbook corpus. All the data in both the learner and the textbook corpora were retrieved through the concordance functions of WordSmith Tools (version 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNC2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. In the second part, the learner corpus was compared with the textbook corpus in terms of use (distributional features, semantic functions, and co-occurring constructions) in order to examine the degree of influence of the textbook on learners’ use of modal verbs. Moreover, the learner corpus was analyzed for misuse (syntactic errors, e.g., she can sings*) of the nine modal verbs to uncover potential difficulties that confront learners. The results indicate discrepancies between the textbook presentation of modal verbs and authentic modal use in natural discourse in terms of distributions of frequencies, semantic functions, and co-occurring structures.
Furthermore, there are consistent patterns of use between the learner corpus and the textbook corpus with respect to the three above-mentioned aspects, except for could, will, and must, partially confirming the correlation between frequency effects and L2 grammar acquisition. Further analysis reveals that the exceptions are caused by both positive and negative L1 transfer, indicating that frequency effects can be intercepted by L1 interference. In addition, error analysis revealed that could, would, should, and must are the most difficult for Chinese learners due to both inter-linguistic and intra-linguistic interference. The discrepancies between the textbook corpus and the native corpus point to a need to adjust the presentation of modal verbs in the textbooks in terms of frequencies, different meanings, and verb-phrase structures. Along with adjusting modal verb treatment based on authentic use, it is important for textbook writers to take into consideration L1 interference as well as learners’ difficulties in their use of modal verbs. The present study is a methodological showcase of the combination of both native and learner corpora in the enhancement of EFL textbook language authenticity and appropriateness for learners.

Keywords: EFL textbooks, learner corpus, modal verbs, native corpus

Procedia PDF Downloads 96
1175 Using A Corpus Approach To Investigate Positive University Images: A Comparison Between Chinese And ESC Universities

Authors: Han Hongmei

Abstract:

University image is receiving attention because of its key role in influencing student choice, faculty loyalty, and social recognition. Therefore, all universities strive to promote their positive images. However, for most people, the positive image of a university often stems from fragmented perceptual understanding. Since universities’ official websites are important channels for image promotion, a corpus approach to university profiles on their official websites can reveal holistic positive images of universities. This study aims to compare positive images of high-level universities in China and English-speaking countries based on a profile corpus of these universities. It is found that the positive images revealed in these university profiles are similar, with some minor differences. The similarities are reflected in the campus environment, historical achievements, comprehensive characteristics, scientific research institutions, and diversified faculty, while the differences are reflected in their unique characteristics. Furthermore, the findings also reveal a gap between Chinese universities and high-level universities in the English-speaking countries.

Keywords: university image, positive image, corpus of university profiles, comparative analysis, high-frequency words

Procedia PDF Downloads 82
1174 Words of Peace in the Speeches of the Egyptian President, Abdulfattah El-Sisi: A Corpus-Based Study

Authors: Mohamed S. Negm, Waleed S. Mandour

Abstract:

The present study aims primarily at investigating words of peace (lexemes of peace) in the formal speeches of the Egyptian president Abdulfattah El-Sisi over a two-year span, from 2018 to 2019. This paper attempts not only to shed light on the contextual use of the antonyms war and peace but also to underpin that discussion with quantitative analysis using current methods of corpus linguistics. As such, the researchers have deployed a corpus-based approach in collecting, encoding, and processing 30 presidential speeches over the stated period (23,411 words and 25,541 tokens in total). Further, semantic fields and collocational networks are identified and compared statistically. Results have shown a significant propensity for adopting peace, including its relevant collocation network, textually and therefore ideationally, at the expense of the concept of war, which in most cases surfaces euphemistically through the noun conflict. The president has not justified the action of war with an honorable cause or a valid reason. Such results, so far, indicate a positive sociopolitical mindset on the part of the Egyptian president and, moreover, reveal fair dealing, nationally and internationally, on arising issues.

Keywords: CADS, collocation network, corpus linguistics, critical discourse analysis

Procedia PDF Downloads 119
1173 Linguistic Accessibility and Audiovisual Translation: Corpus Linguistics as a Tool for Analysis

Authors: Juan-Pedro Rica-Peromingo

Abstract:

The important change taking place with respect to the media and the audiovisual world in Europe needs to benefit all populations, in particular those with special needs, such as the deaf and hard-of-hearing population (SDH) and blind and partially-sighted population (AD). This recent interest in the field of audiovisual translation (AVT) can be observed in the teaching and learning of the different modes of AVT in the degree and post-degree courses at Spanish universities, which expand the interest and practice of AVT linguistic accessibility. We present a research project led at the UCM which consists of the compilation of AVT activities for teaching purposes and tries to analyze the creation and reception of SDH and AD: the AVLA Project (Audiovisual Learning Archive), which includes audiovisual materials carried out by the university students on different AVT modes and evaluations from the blind and deaf informants. In this study, we present the materials created by the students. A group of the deaf and blind population has been in charge of testing the student's SDH and AD corpus of audiovisual materials through some questionnaires used to evaluate the students’ production. These questionnaires include information about the reception of the subtitles and the audio descriptions from linguistic and technical points of view. With all the materials compiled in the research project, a corpus with both the students’ production and the recipients’ evaluations is being compiled: the CALING (Corpus de Accesibilidad Lingüística) corpus. Preliminary results will be presented with respect to those aspects, difficulties, and deficiencies in the SDH and AD included in the corpus, specifically with respect to the length of subtitles, the position of the contextual information on the screen, and the text included in the audio descriptions and tone of voice used. These results may suggest some changes and improvements in the quality of the SDH and AD analyzed. 
In the end, demand for the teaching and learning of AVT and linguistic accessibility at a university level and some important changes in the norms which regulate SDH and AD nationally and internationally will be suggested.

Keywords: audiovisual translation, corpus linguistics, linguistic accessibility, teaching

Procedia PDF Downloads 52
1172 The Use of Corpora in Improving Modal Verb Treatment in English as Foreign Language Textbooks

Authors: Lexi Li, Vanessa H. K. Pang

Abstract:

This study aims to demonstrate how native and learner corpora can be used to enhance modal verb treatment in EFL textbooks in mainland China. It contributes to a corpus-informed and learner-centered design of grammar presentation in EFL textbooks that enhances the authenticity and appropriateness of textbook language for target learners. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because the pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of each of the nine modals was sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the 'secondary school' section were selected. A series of five secondary coursebooks constitutes the textbook corpus. All the data in both the learner and the textbook corpora were retrieved through the concordance functions of WordSmith Tools (version 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNC2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. In the second part, the learner corpus was analyzed in terms of use (distributional features, semantic functions, and co-occurring constructions) and misuse (syntactic errors, e.g., she can sings*) of the nine modal verbs to uncover potential difficulties that confront learners. The analysis of distribution indicates several discrepancies between the textbook corpus and BNCS2014. The four most frequent modal verbs in BNCS2014 are can, would, will, could, while can, will, should, could are the top four in the textbooks. Most strikingly, there is an unusually high proportion of can (41.1%) in the textbooks.
The results on different meanings show that will, would, and must are the most problematic. For example, for will, the textbooks contain 20% more occurrences of 'volition' and 20% fewer of 'prediction' than BNCS2014. Regarding co-occurring structures, the textbooks over-represent the structure 'modal + do' across the nine modal verbs. Another major finding is that the structure 'modal + have done', which frequently co-occurs with could, would, should, and must, is underused in textbooks. In addition, these four modal verbs are the most difficult for learners, as the error analysis shows. This study demonstrates how the synergy of native and learner corpora can be harnessed to improve EFL textbook presentation of modal verbs so that textbooks provide not only authentic language used in natural discourse but also an appropriate design tailored to the needs of target learners.

Keywords: English as Foreign Language, EFL textbooks, learner corpus, modal verbs, native corpus

Procedia PDF Downloads 116
1171 Attitude in Academic Writing (CAAW): Corpus Compilation and Annotation

Authors: Hortènsia Curell, Ana Fernández-Montraveta

Abstract:

This paper presents the creation, development, and analysis of a corpus designed to study the presence of attitude markers and author’s stance in research articles in two different areas of linguistics (theoretical linguistics and sociolinguistics). These two disciplines are expected to behave differently in this respect, given the disparity in their discursive conventions. Attitude markers in this work are understood as the linguistic elements (adjectives, nouns and verbs) used to convey the writer's stance towards the content presented in the article, and are crucial in understanding writer-reader interaction and the writer's position. These attitude markers are divided into three broad classes: assessment, significance, and emotion. In addition to them, we also consider first-person singular and plural pronouns and possessives, modal verbs, and passive constructions, which are other linguistic elements expressing the author’s stance. The corpus, Corpus of Attitude in Academic Writing (CAAW), comprises a collection of 21 articles, collected from six journals indexed in JCR. These articles were originally written in English by a single native-speaker author from the UK or USA and were published between 2022 and 2023. The total number of words in the corpus is approximately 222,400, with 106,422 from theoretical linguistics (Lingua, Linguistic Inquiry and Journal of Linguistics) and 116,022 from sociolinguistics journals (International Journal of the Sociology of Language, Language in Society and Journal of Sociolinguistics). Together with the corpus, we present the tool created for the creation and storage of the corpus, along with a tool for automatic annotation. The steps followed in the compilation of the corpus are as follows. First, the articles were selected according to the parameters explained above. Second, they were downloaded and converted to txt format. 
Finally, examples, direct quotes, section titles and references were eliminated, since they do not involve the author’s stance. The resulting texts were the input for the annotation of the linguistic features related to stance. As for the annotation, two articles (one from each subdiscipline) were annotated manually by the two researchers. An existing list was used as a baseline, and other attitude markers were identified, together with the other elements mentioned above. Once a consensus was reached, the rest of articles were annotated automatically using the tool created for this purpose. The annotated corpus will serve as a resource for scholars working in discourse analysis (both in linguistics and communication) and related fields, since it offers new insights into the expression of attitude. The tools created for the compilation and annotation of the corpus will be useful to study author’s attitude and stance in articles from any academic discipline: new data can be uploaded and the list of markers can be enlarged. Finally, the tool can be expanded to other languages, which will allow cross-linguistic studies of author’s stance.

Keywords: academic writing, attitude, corpus, English

Procedia PDF Downloads 31
1170 The Omani Learner of English Corpus: Source and Tools

Authors: Anood Al-Shibli

Abstract:

Designing a learner corpus is not an easy task because learners’ language involves many variables that might affect the results of any study based on learners’ language production (spoken and written). It is also essential to design a learner corpus systematically, especially when it is intended as a reference for language research. Therefore, the design of the Omani Learner Corpus (OLEC) has followed many explicit and systematic considerations. These criteria can be regarded as a foundation for designing any learner corpus to be exploited effectively in studies of language use and language learning. In addition, OLEC is a manually error-annotated corpus. Error annotation in learner corpora is essential; however, it is time-consuming and prone to errors. Consequently, a navigating tool was designed to help the annotators insert error codes in order to make the error-annotation process more efficient and consistent. To ensure accuracy, an error-annotation procedure was followed to annotate OLEC, and some preliminary findings are noted. One of the main results of this procedure is an error-annotation system based on Omani learners’ English language production. Because OLEC is still in its first stages, the primary findings relate to only one level of proficiency and one error type, namely verb-related errors. It is found that Omani learners in OLEC tend to make more errors in forming the verb, followed by problems in verb agreement. Comparing the results with other error-based studies indicates that the Omani learners tend to make basic verb errors of the kind found at lower levels of proficiency. To this end, it is essential to acknowledge that examining learners’ errors can give insights into language acquisition and language learning, and that most errors do not happen randomly but occur systematically among language learners.

Keywords: error-annotation system, error-annotation manual, learner corpora, verbs related errors

Procedia PDF Downloads 114
1169 The Rendering of Sex-Related Expressions by Court Interpreters in Hong Kong: A Corpus-Based Approach

Authors: Yee Yan Crystal Kwong

Abstract:

The essence of rape is the absence of consent to sexual intercourse. Yet, the definition of consent is not absolute and allows for subjectivity. In this case, the accuracy of oral interpretation becomes very important, as the narratives of events and situations, as well as the register and style of speakers, would influence jurors' decision-making. This paper adopts a corpus-based approach to investigate how court interpreters in Hong Kong handle expressions that refer to sexual activities. The data of this study will be based on the online corpus "From legislation to translation, from translation to interpretation: The narrative of sexual offences". The corpus comprises the transcription of five separate rape trials, all of which were heard in the presence of an interpreter. Since there are plenty of sex-related expressions used by witnesses and defendants in the five cases, emphasis will be put on those which have an impact on the definition of rape. With an in-depth analysis of the interpreted utterances, different interpreting approaches will be identified to observe how interpreters retain the intended meanings. Interviews with experienced court interpreters will also be conducted to revisit the validity of the traditional verbatim standard. At the end of this research, various interpreting approaches will be compared and evaluated. A redefinition of interpreters' institutional role, as well as recommendations for interpreting learners, will be provided.

Keywords: court interpreting, interpreters, legal translation, slangs

Procedia PDF Downloads 239
1168 A Self-Built Corpus-Based Study of Four-Word Lexical Bundles in Native English Teachers’ EFL Classroom Discourse in Northeast China: The Significance of Stance

Authors: Fang Tan

Abstract:

This research focuses on the appropriate use of lexical bundles in spoken discourse, particularly in English as a Foreign Language (EFL) classrooms in Northeast China. While previous studies have mainly examined lexical bundles in written discourse, there is a need to investigate their usage in spoken discourse due to the limited availability of spoken discourse corpora. English teachers’ use of lexical bundles is crucial for effective teaching and communication in the EFL classroom. The aim of this study is to investigate the functions of four-word lexical bundles in native English teachers’ EFL oral English classes in Northeast China. Specifically, the research focuses on the usage of stance bundles, which were found to be the most significant type of bundle in the analyzed corpus. By comparing the self-built university spoken English classroom discourse corpus with the other self-built university English for General Purposes (EGP) corpus, the study aims to highlight the difference in bundle usage between native and non-native teachers in EFL classrooms. The research employs a corpus-based study. The observed corpus consists of more than 300,000 tokens, in which the data has been collected in the past five years. The reference corpus is composed of over 800,000 tokens, in which the data has been collected over 12 years. All the primary data collection involved transcribing and annotating spoken English classes taught by native English teachers. The analysis procedures included identifying and categorizing four-word lexical bundles, with specific emphasis on stance bundles. Frequency counts, and comparisons with the Chinese English teachers’ corpus were conducted to identify patterns and differences in bundle usage. The research addresses the following questions: 1) What are the functions of four-word lexical bundles in native English teachers’ EFL oral English classes? 
2) How do stance bundles differ in usage between native and non-native English teachers’ classes? 3) What implications can be drawn for English teachers’ professional development based on the findings? In conclusion, this study provides valuable insights into the usage of four-word lexical bundles, particularly stance bundles, in native English teachers’ EFL oral English classes in Northeast China. The research highlights the difference in bundle usage between native and non-native English teachers’ classes and provides implications for English teachers’ professional development. The findings contribute to the understanding of lexical bundle usage in EFL classroom discourse and have theoretical importance for language teaching methodologies. The self-built university English classroom discourse corpus used in this research is a valuable resource for future studies in this field.
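The identification step described in this abstract (finding four-word lexical bundles and counting their frequency) can be sketched roughly as below. This is a minimal illustration only, not the author's procedure: the function name `four_word_bundles`, the whitespace tokenisation, the `min_freq` cut-off and the sample utterances are all assumptions introduced here, and the abstract does not state which frequency or dispersion thresholds were applied.

```python
from collections import Counter

def four_word_bundles(utterances, min_freq=2):
    """Count contiguous four-word sequences and keep the recurrent ones.

    `min_freq` is a hypothetical cut-off; the abstract reports none.
    """
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()  # naive whitespace tokenisation
        # Bundles are counted within an utterance, never across a boundary.
        for i in range(len(tokens) - 3):
            counts[" ".join(tokens[i:i + 4])] += 1
    return {bundle: n for bundle, n in counts.items() if n >= min_freq}

# Invented classroom-style utterances, used only to exercise the function.
sample = [
    "i want you to read the first paragraph",
    "now i want you to work in pairs",
    "do you know what this word means",
]
bundles = four_word_bundles(sample, min_freq=2)
print(bundles)
```

Categorising the surviving bundles by function (for instance, as stance bundles) would still be a manual or dictionary-based step on top of this count.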

Keywords: EFL classroom discourse, four-word lexical bundles, stance, implication

Procedia PDF Downloads 33
1167 The Acquisition of Case in Biological Domain Based on Text Mining

Authors: Shen Jian, Hu Jie, Qi Jin, Liu Wei Jie, Chen Ji Yi, Peng Ying Hong

Abstract:

To address the problem of acquiring biological cases relevant to design problems, a biological instance acquisition method based on text mining is presented. Through the construction of a corpus text vector space and knowledge mining, feature selection, similarity measurement, and case retrieval methods for text in the field of biology are studied. First, we establish a vector space model of the corpus in the biological field and complete the preprocessing steps. Then, the corpus is searched using the vector space model combined with functional keywords to obtain the biological domain examples related to the design problems. Finally, we verify the validity of this method using a text example.
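As a rough illustration of retrieval with a vector space model and functional keywords, the sketch below ranks documents by cosine similarity between TF-IDF vectors. The TF-IDF weighting, the smoothing term, the function names and the three sample texts are all assumptions made for this example; the abstract does not specify the weighting scheme or similarity measure the authors used.

```python
import math
from collections import Counter

def build_idf(docs):
    """Inverse document frequency per term (plus 1 so no weight is zero)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}

def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: tf[t] * idf[t] for t in tf}

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query_tokens, docs):
    """Return (score, document index) pairs, best match first."""
    idf = build_idf(docs)
    query_vec = vectorize(query_tokens, idf)
    scored = [(cosine(query_vec, vectorize(d, idf)), i) for i, d in enumerate(docs)]
    return sorted(scored, reverse=True)

# Invented mini-corpus of biological descriptions, already tokenised.
corpus = [
    "gecko foot hairs adhere to smooth surfaces".split(),
    "lotus leaf surface repels water droplets".split(),
    "shark skin reduces drag in water".split(),
]
# Hypothetical functional keywords taken from a design problem.
ranked = retrieve(["repels", "water"], corpus)
print([index for _, index in ranked])
```

A real pipeline would add the preprocessing the abstract mentions (for example stop-word removal and stemming) before building the vectors.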

Keywords: text mining, vector space model, feature selection, biologically inspired design

Procedia PDF Downloads 228
1166 Differences in Assessing Hand-Written and Typed Student Exams: A Corpus-Linguistic Study

Authors: Jutta Ransmayr

Abstract:

The digital age has long arrived at Austrian schools, so both society and educationalists demand that digital means should be integrated accordingly to day-to-day school routines. Therefore, the Austrian school-leaving exam (A-levels) can now be written either by hand or by using a computer. However, the choice of writing medium (pen and paper or computer) for written examination papers, which are considered 'high-stakes' exams, raises a number of questions that have not yet been adequately investigated and answered until recently, such as: What effects do the different conditions of text production in the written German A-levels have on the component of normative linguistic accuracy? How do the spelling skills of German A-level papers written with a pen differ from those that the students wrote on the computer? And how is the teacher's assessment related to this? Which practical desiderata for German didactics can be derived from this? In a trilateral pilot project of the Austrian Center for Digital Humanities (ACDH) of the Austrian Academy of Sciences and the University of Vienna in cooperation with the Austrian Ministry of Education and the Council for German Orthography, these questions were investigated. A representative Austrian learner corpus, consisting of around 530 German A-level papers from all over Austria (pen and computer written), was set up in order to subject it to a quantitative (corpus-linguistic and statistical) and qualitative investigation with regard to the spelling and punctuation performance of the high school graduates and the differences between pen- and computer-written papers and their assessments. Relevant studies are currently available mainly from the Anglophone world. These have shown that writing on the computer increases the motivation to write, has positive effects on the length of the text, and, in some cases, also on the quality of the text. 
Depending on the writing situation and other technical aids, better results in terms of spelling and punctuation could also be found in the computer-written texts as compared to the handwritten ones. Studies also point towards a tendency among teachers to rate handwritten texts better than computer-written texts. In this paper, the first comparable results from the German-speaking area are to be presented. Research results have shown that, on the one hand, there are significant differences between handwritten and computer-written work with regard to performance in orthography and punctuation. On the other hand, the corpus linguistic investigation and the subsequent statistical analysis made it clear that not only the teachers' assessments of the students’ spelling performance vary enormously but also the overall assessments of the exam papers – the factor of the production medium (pen and paper or computer) also seems to play a decisive role.

Keywords: exam paper assessment, pen and paper or computer, learner corpora, linguistics

Procedia PDF Downloads 138
1165 Ultrasonic Assessment of Corpora lutea and Plasma Progesterone Levels in Early Pregnant and Non Pregnant Cows

Authors: Abdurraouf O. Gaja, Salah Y. A. Al-Dahash, Guru Solmon Raju, Chikara Kubota

Abstract:

Corpus luteum cross-sectional area (by ultrasonography) and plasma progesterone (by DELFIA) were estimated in early pregnant and non-pregnant cows on day 14 and days 20 to 23 post insemination. On day 14, corpus luteum sectional area was 348.43 mm² in pregnant and 387.84 mm² in non-pregnant cows. Within days 20 to 23, corpus luteum sectional area ranged between 342.06 and 367.90 mm² in pregnant and between 193.85 and 270.69 mm² in non-pregnant cows. Plasma progesterone level was 2.43 ng/ml in pregnant and 2.46 ng/ml in non-pregnant cows on day 14, while during days 20 to 23 the level ranged between 2.47 and 2.84 ng/ml in pregnant and between 0.53 and 1.17 ng/ml in non-pregnant cows. Both luteal tissue areas and plasma progesterone levels were highly significantly different (P<0.01) between pregnant and non-pregnant cows during days 20 to 23, but there were no significant differences on day 14. The correlation between CL cross-sectional area and plasma progesterone level was 0.4 in pregnant cows and 0.99 in non-pregnant cows. It is clear from this study that ultrasonic assessment of corpora lutea is a viable alternative to determining plasma progesterone levels for early pregnancy diagnosis in cows.

Keywords: progesterone, ultrasonography, corpus luteum, pregnancy diagnosis, cow

Procedia PDF Downloads 281
1164 Understanding the Top Questions Asked about Hong Kong by Travellers Worldwide through a Corpus-Based Discourse Analytic Approach

Authors: Phoenix W. Y. Lam

Abstract:

As one of the most important service-oriented industries in contemporary society, tourism has increasingly seen the influence of the Internet on all aspects of travelling. Travellers nowadays habitually research online before making travel-related decisions. One platform on which such research is conducted is destination forums. The emergence of such online destination forums in the last decade has allowed tourists to share their travel experiences quickly and easily with a large number of online users around the world. As such, these destination forums also provide invaluable data for tourism bodies to better understand travellers’ views on their destinations. Collecting posts from the Hong Kong travel forum on the world’s largest travel website TripAdvisor®, the present study identifies the top questions asked by TripAdvisor users about Hong Kong through a corpus-based discourse analytic approach. Based on questions posted on the forum and their associated meta-data gathered in a one-year period, the study examines the top questions asked by travellers around the world to identify the key geographical locations in which users have shown the greatest interest in the city. Questions raised by travellers from different geographical locations are also compared to see if traveller communities by location vary in terms of their areas of interest. This analysis involves the study of key words and concordance of frequently-occurring items and a close reading of representative examples in context. Findings from the present study show that travellers who asked the most questions about Hong Kong are from North America and Asia, and that travellers from different locations have different concerns and interests, which are clearly reflected in the language of the questions asked on the travel forum. 
These findings can therefore provide tourism organisations with useful information about the key markets that should be targeted for promotional purposes, and can also allow such organisations to design advertising campaigns which better address the specific needs of such markets. The present study thus demonstrates the value of applying linguistic knowledge and methodologies to the domain of tourism to address practical issues.

Keywords: corpus, Hong Kong, online travel forum, tourism, TripAdvisor

Procedia PDF Downloads 155
1163 Exploring the Use of Adverbs in Two Young Learners Written Corpora

Authors: Chrysanthi S. Tiliakou, Katerina T. Frantzi

Abstract:

Writing has always been considered a most demanding skill for English as a Foreign Language learners as well as for native speakers. Novice foreign language writers are asked to handle a limited range of vocabulary to produce writing tasks at lower levels. Adverbs are a part of speech that is not used extensively in the early stages of English as a Foreign Language writing. An additional problem with learning new adverbs is that, besides learning their meanings, learners are expected to acquire the proper placement of adverbs in a sentence. The use of adverbs is important as they add “expressive richness to one’s message”. By exploring the patterns of use of adverbs, researchers and educators can identify types of adverbs that appear more taxing for young learners, or whose placement puzzles novice English as a Foreign Language writers, and focus on their teaching. To this end, the study examines the use of adverbs in two written corpora of young learners of English at A1–A2 levels and determines the types of adverbs used, their frequencies, problems in their use, and whether there is any differentiation between levels. The AntConc concordancing tool was used for the Greek Learner Corpus, and the Corpuscle concordancing tool for the Norwegian Corpus. The research found that the normalized frequencies of the adverbs used in the A1–A2 level Greek Learner Corpus are similar to the frequencies of the same adverbs in the Norwegian Learner Corpus.
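Normalized frequencies make counts from corpora of different sizes comparable by scaling each raw count to a common base. A minimal sketch follows, assuming a base of 1,000 tokens; the abstract does not state which base the authors used, and the counts and corpus sizes below are invented purely for illustration.

```python
def normalized_frequency(raw_count, corpus_tokens, per=1000):
    """Scale a raw count to occurrences per `per` tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count / corpus_tokens * per

# Hypothetical counts of one adverb in two learner corpora of different sizes.
corpus_a = normalized_frequency(raw_count=30, corpus_tokens=12_000)
corpus_b = normalized_frequency(raw_count=45, corpus_tokens=18_000)
print(round(corpus_a, 2), round(corpus_b, 2))
```

In this made-up case the raw counts differ (30 against 45), yet the rate per 1,000 tokens is the same in both corpora, which is the kind of similarity the comparison above relies on.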

Keywords: learner corpora, young learners, writing, use of adverbs

Procedia PDF Downloads 61
1162 Verb Bias in Mandarin: The Corpus Based Study of Children

Authors: Jou-An Chung

Abstract:

The purpose of this study is to investigate the verb bias of Mandarin verbs in children’s reading materials and to provide criteria for categorization. Verb bias varies cross-linguistically. As Mandarin and English are typologically different, this study hopes to shed light on Mandarin verb bias through the use of a corpus and to provide thorough and detailed criteria for analysis. The study focuses on children’s reading materials because they are significant for understanding children’s sentence processing. A small corpus was built for this study, consisting of school textbooks and the Mandarin Daily News for children. The files were segmented and POS-tagged with JiebaR (Chinese segmentation with R). For ease of analysis, single-character verbs and intransitive verbs were excluded beforehand. A total of 20 high-frequency verbs were hand-coded and categorized into one of three types: DO type, SC type, and Other. If the proportion of Other-type uses of a verb exceeded the threshold of 25%, the verb was excluded from the study. The results show that 10 verbs are direct object (DO) bias verbs and six are sentential complement (SC) bias verbs. A paired t-test was run to assess statistical significance (p = 0.0001062 for DO bias verbs, p = 0.001149 for SC bias verbs). The results show that in children’s reading materials, DO bias verbs are used more than SC bias verbs, since the simplest sentence structure is easier for children to comprehend or process. In sum, this study not only discusses verb bias in children's reading materials but also provides basic coding criteria for verb bias analysis in Mandarin and underscores the role of context.
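
The coding rule described above can be sketched as follows. Only the 25% exclusion threshold for the Other category comes from the abstract; assigning the bias by simple majority between DO and SC uses is an assumption made for illustration, as are the function name and labels.

```python
def classify_verb_bias(do_count, sc_count, other_count, other_threshold=0.25):
    """Label a verb from its hand-coded DO / SC / Other counts.

    The exclusion rule (Other share above 25%) follows the abstract;
    the majority rule between DO and SC is an illustrative assumption.
    """
    total = do_count + sc_count + other_count
    if total == 0:
        raise ValueError("verb has no coded occurrences")
    if other_count / total > other_threshold:
        return "excluded"
    if do_count > sc_count:
        return "DO-bias"
    if sc_count > do_count:
        return "SC-bias"
    return "no bias"

# Hypothetical counts for three verbs
print(classify_verb_bias(70, 20, 10))  # DO uses dominate
print(classify_verb_bias(20, 70, 10))  # SC uses dominate
print(classify_verb_bias(10, 60, 30))  # Other share is 30%, above the threshold
```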

Keywords: corpus linguistics, verb bias, child language, psycholinguistics

Procedia PDF Downloads 249
1161 Designing a Corpus Database to Enhance the Learning of Old English Language

Authors: Raquel Mateo Mendaza, Carmen Novo Urraca

Abstract:

The current paper presents the development of a corpus database that aligns two different corpora in order to simplify the search for information for both researchers and students of Old English. This database comprises the information contained in two main reference corpora, namely the Dictionary of Old English Corpus (DOEC), compiled at the University of Toronto, and the York-Toronto-Helsinki Parsed Corpus of Old English (YCOE). The former provides information on all surviving texts written in the Old English language. The latter offers syntactic and morphological annotation of several texts included in the DOEC. Although both corpora are closely related, as the YCOE includes the DOE source text identifier, the main problem detected is that there is no alignment of texts that allows whole fragments to be retrieved and further analysed in terms of morphology and syntax. The database proposed in this paper gathers all this information and presents it in a simple, more accessible, visual, and educational way. The alignment of fragments has been done automatically. However, some problems emerged during the creation process, particularly related to the lack of correspondence in the division of fragments. For this reason, it has been necessary to revise all the entries manually to obtain a reliable, high-quality product and to carefully indicate the gaps encountered in these corpora. All in all, this database contains more than 60,000 entries corresponding to the DOE fragments annotated by the YCOE. The main strength of the resulting product is its research and teaching implications for the study of Old English. The use of this database will help researchers and students in the study of different aspects of the language, such as inflectional morphology, the syntactic behaviour of given words, or translation studies, among others.
When a word or fragment is searched for, the annotated information on morphology and syntax is displayed automatically, which automates and speeds up data retrieval.

Keywords: alignment, corpus database, morphosyntactic analysis, Old English

Procedia PDF Downloads 104
1160 Exploring Reading into Writing: A Corpus-Based Analysis of Postgraduate Students’ Literature Review Essays

Authors: Tanzeela Anbreen, Ammara Maqsood

Abstract:

Reading into writing is one of the academic skills most required of university students. The current study explored postgraduate university students’ writing quality using a corpus-based approach. Twelve postgraduate students’ literature review essays were chosen for the corpus-based analysis. These essays were chosen because students had to incorporate multiple reading sources in them, which was a new writing exercise for them. The students received feedback at least twice, consisting of written comments by the tutor highlighting areas for improvement as well as edits made with the ‘track changes’ function. This exercise was repeated two times, and students submitted two drafts. This investigation included only the students’ final submitted work. A corpus-based approach was adopted to analyse the essays because it promotes autonomous discovery and personalised learning. The aim of this analysis was to understand the existing level of students’ writing before the start of their postgraduate thesis. Text Inspector was used to analyse the quality of the essays. With the help of the Text Inspector tool, the vocabulary used in the essays was compared to the English Vocabulary Profile (EVP), which describes what learners know and can do at each Common European Framework of Reference (CEFR) level. Writing quality was also measured using the Flesch reading ease score, a standard measure of how easy a text is to understand. The results indicated that students found writing essays using multiple sources challenging. In most essays, the vocabulary level achieved was between B1 and B2 on the CEFR. The study recommends that students receive extensive training in developing academic writing skills, particularly in writing literature review assignments, which require citing multiple sources.
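
For readers unfamiliar with the Flesch reading ease score mentioned above, the standard formula can be sketched as below. This is an illustration only and does not reproduce how Text Inspector computes the score; the function name and the example counts are hypothetical, and the word, sentence, and syllable counts are assumed to be supplied by the caller.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Standard Flesch reading ease: higher scores mean easier text."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Hypothetical essay extract: 100 words, 5 sentences, 150 syllables
score = flesch_reading_ease(100, 5, 150)
print(round(score, 3))
```

Longer sentences and longer words both lower the score, which is why dense academic prose tends to score low on this scale.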

Keywords: literature review essays, postgraduate students, corpus-based analysis, vocabulary proficiency

Procedia PDF Downloads 39
1159 Conceptual Metaphors of Responsibility in Arabic to English Translation of Political Speeches: A Corpus-Based Study

Authors: Amr Anany

Abstract:

This study offers a corpus-based analysis of the conceptual metaphors of RESPONSIBILITY inherent in the Arabic political speeches of King Abdulla II and their English translations rendered by the translators of the Royal Hashemite Court ("RHC translators"). Drawing on Conceptual Metaphor Theory (CMT), the current study aims to uncover the extent to which the dominant ideology in the source Arabic speeches of King Abdulla II is conveyed in the target English translations. The study explores a bilingual corpus comprising eleven authentic Arabic speeches delivered by King Abdulla II and their English translations. The study finds that Arabic and English share several metaphorical expressions of RESPONSIBILITY that are based on bodily experience, such as RESPONSIBILITY IS UP, RESPONSIBILITY IS AN OBJECT, and RESPONSIBILITY IS AN HONOR. The study concludes that the RHC translators succeed in conveying the dominant ideology from the source Arabic speeches to the English ones by using specific translation strategies.

Keywords: cognitive linguistics, CDA, conceptual metaphor theory, ideology, responsibility

Procedia PDF Downloads 37
1158 Neologisms and Word-Formation Processes in Board Game Rulebook Corpus: Preliminary Results

Authors: Athanasios Karasimos, Vasiliki Makri

Abstract:

This research focuses on the design and development of the first text Corpus based on Board Game Rulebooks (BGRC), with direct application to the morphological analysis of neologisms and tendencies in word-formation processes. Corpus linguistics is a dynamic field that examines language through the lens of vast collections of texts. These corpora consist of diverse written and spoken materials, ranging from literature and newspapers to transcripts of everyday conversations. By morphologically analyzing these extensive datasets, morphologists can gain valuable insights into how language functions and evolves, as these extensive datasets can reflect the byproducts of inflection, derivation, blending, clipping, compounding, and neology. This entails scrutinizing how words are created, modified, and combined to convey meaning in a corpus of challenging, creative, and straightforward texts that include rules, examples, tutorials, and tips. Board games teach players how to strategize, consider alternatives, and think flexibly, which are critical elements in language learning. Their rulebooks reflect not only their weight (complexity) but also the language properties of each genre and subgenre of these games. Board games are a captivating realm where strategy, competition, and creativity converge. Beyond the excitement of gameplay, board games also spark the art of word creation. Word games, like Scrabble, Codenames, Bananagrams, Wordcraft, Alice in the Wordland, and Once Upon a Time, challenge players to construct words from a pool of letters, thus encouraging linguistic ingenuity and vocabulary expansion. These games foster a love for language, motivating players to unearth obscure words and devise clever combinations.
On the other hand, designers and creators produce rulebooks in which they share their joy of discovering the hidden potential of language, igniting the imagination, and playing with the beauty of words, making these games a delightful fusion of linguistic exploration and leisurely amusement. In this research, more than 150 rulebooks in English from all types of modern board games, either language-independent or language-dependent, are used to create the BGRC. A representative sample of each genre (family, party, worker placement, deckbuilding, dice, and chance games, strategy, eurogames, thematic, role-playing, among others) was selected based on the score from BoardGameGeek, the size of the texts, and the level of complexity (weight) of the game. A morphological model with morphological networks, multi-word expressions, and word-creation mechanics based on the complexity of the textual structure, difficulty, and board game category will be presented. By enabling the identification of patterns, trends, and variations in word formation and other morphological processes, this research aspires to make use of this creative yet strict text genre so as to (a) give invaluable insight into the morphological creativity and innovation that (re)shape the lexicon of the English language and (b) test morphological theories. Overall, it is shown that corpus linguistics empowers us to explore the intricate tapestry of language, and morphology in particular, revealing its richness, flexibility, and adaptability in the ever-evolving landscape of human expression.

Keywords: board game rulebooks, corpus design, morphological innovations, neologisms, word-formation processes

Procedia PDF Downloads 56
1157 Investigating (Im)Politeness Strategies in Email Communication: The Case Algerian PhD Supervisees and Irish Supervisors

Authors: Zehor Ktitni

Abstract:

In pragmatics, politeness is regarded as a feature of paramount importance to successful interpersonal relationships. On the other hand, emails have recently become one of the indispensable means of communication in educational settings. This research puts email communication at the core of the study and analyses it from a politeness perspective. More specifically, it endeavours to look closely at how the concept of (im)politeness is reflected through students’ emails. To this end, a corpus of Algerian supervisees’ email threads, exchanged with their Irish supervisors, was compiled. Leech’s model of politeness (2014) was selected as the main theoretical framework of this study, in addition to making reference to Brown and Levinson’s model (1987) as it is one of the most influential models in the area of pragmatic politeness. Further, some follow-up interviews are to be conducted with Algerian students to reinforce the results derived from the corpus. Initial findings suggest that Algerian Ph.D. students’ emails tend to include more politeness markers than impoliteness ones, they heavily make use of academic titles when addressing their supervisors (Dr. or Prof.), and they rely on hedging devices in order to sound polite.

Keywords: politeness, email communication, corpus pragmatics, Algerian PhD supervisees, Irish supervisors

Procedia PDF Downloads 34
1156 Introducing Data-Driven Learning into Chinese Higher Education EAP Writing Instructional Settings

Authors: Jingwen Ou

Abstract:

Writing for academic purposes in a second or foreign language is one of the most important and most demanding skills to be mastered by non-native speakers. Traditionally, EAP writing instruction at the tertiary level encompasses the teaching of academic genre knowledge, more specifically, disciplinary writing conventions, rhetorical functions, and specific linguistic features. However, one of the main sources of difficulty in English academic writing for L2 students at the tertiary level still lies in proficiency in academic discourse, especially vocabulary, academic register, and organization. Data-Driven Learning (DDL) is defined as “a pedagogical approach featuring direct learner engagement with corpus data”. Over the past two decades, the DDL approach has become increasingly popular in EAP writing teaching. This combination has not only transformed traditional pedagogy, aided by published DDL guidebooks for classroom use, but also triggered global research on corpus use in EAP classrooms. This study presents a systematic literature review of research at the intersection of DDL and EAP writing instruction, covering both indirect and direct DDL practice in EAP writing instructional settings in China. Furthermore, the review synthesizes significant findings from prior research concerning Chinese university students’ perception of DDL and the subsequent impact on their academic writing performance following corpus-based training. Research papers were selected from Scopus-indexed journals and core journals from two main Chinese academic databases (CNKI and Wanfang), published in both English and Chinese over the last ten years, based on keyword searches.
Results indicated an insufficiency of empirical DDL research despite a noticeable upward trend in corpus research on discourse analysis and indirect corpus applications for material design by language teachers. Research on the direct use of corpora and corpus tools in DDL, particularly in combination with genre-based EAP teaching, remains a relatively small fraction of the whole body of research in Chinese higher education settings. Such scarcity is highly related to the prevailing absence of systematic training in English academic writing registers within most Chinese universities' EAP syllabi due to the Chinese English Medium Instruction policy, where only English major students are mandated to submit English dissertations. Findings also revealed that Chinese learners still held mixed attitudes towards corpus tools influenced by learner differences, limited access to language corpora, and insufficient pre-training on corpus theoretical concepts, despite their improvements in final academic writing performance.

Keywords: corpus linguistics, data-driven learning, EAP, tertiary education in China

Procedia PDF Downloads 17
1155 A Weighted Group EI Incorporating Role Information for More Representative Group EI Measurement

Authors: Siyu Wang, Anthony Ward

Abstract:

Emotional intelligence (EI) is a well-established personal characteristic. It has been viewed as a critical factor which can influence an individual's academic achievement, ability to work and potential to succeed. When working in a group, EI is fundamentally connected to the group members' interaction and ability to work as a team. The ability of a group member to intelligently perceive and understand their own emotions (Intrapersonal EI), to intelligently perceive and understand other members' emotions (Interpersonal EI), and to intelligently perceive and understand emotions between different groups (Cross-boundary EI) can be considered as Group emotional intelligence (Group EI). In this research, a more representative Group EI measurement approach, which incorporates information on the composition of a group and an individual’s role in that group, is proposed. To support the claim that this approach is more representative, this study adopts a multi-method research design, combining qualitative and quantitative techniques to establish a metric of Group EI. From the results, it can be concluded that by introducing the weight coefficient of each group member on group work into the measurement of Group EI, Group EI will be more representative and more capable of capturing what happens during teamwork than previous approaches.

Keywords: case study, emotional intelligence, group EI, multi-method research

Procedia PDF Downloads 102
1154 Historical Development of Negative Emotive Intensifiers in Hungarian

Authors: Martina Katalin Szabó, Bernadett Lipóczi, Csenge Guba, István Uveges

Abstract:

In this study, an exhaustive analysis was carried out of the historical development of negative emotive intensifiers in the Hungarian language via NLP methods. Intensifiers are linguistic elements which modify or reinforce a variable character in the lexical unit they apply to. Therefore, intensifiers appear with other lexical items, such as adverbs, adjectives, verbs, and, infrequently, nouns. Due to the complexity of this phenomenon (a set of sociolinguistic, semantic, and historical aspects), there are many lexical items which can operate as intensifiers. Intensifiers are admittedly among the most rapidly changing elements in the language. From a linguistic point of view, a particularly interesting group is that of the so-called negative emotive intensifiers, which, on their own, without context, have semantic content that can be associated with negative emotion, but in particular cases may function as intensifiers (e.g. borzasztóan jó ’awfully good’, which means ’excellent’). Despite their special semantic features, negative emotive intensifiers have scarcely been examined in the literature on the basis of large historical corpora via NLP methods. In order to become better acquainted with trends over time concerning the intensifiers, the authors exhaustively analysed a specific historical corpus, namely the Magyar Történeti Szövegtár (Hungarian Historical Corpus). This corpus (containing 3 million text words) is a collection of texts of various genres and styles, produced between 1772 and 2010. Since the corpus consists of raw texts and does not contain any additional information about the language features of the data (such as stemming or morphological analysis), a large amount of manual work was required to process the data. Thus, based on a lexicon of negative emotive intensifiers compiled in a previous phase of the research, every occurrence of each intensifier was queried, and the results were stored in a separate data frame.
Then, basic linguistic processing (POS-tagging, lemmatization, etc.) was carried out automatically with the ‘magyarlanc’ NLP toolkit. Finally, the frequency and collocation features of all the negative emotive words were automatically analyzed in the corpus. The outcomes of the research revealed in detail how these words have undergone grammaticalization over time, i.e., they change from lexical elements to grammatical ones and slowly go through a delexicalization process (their negative content diminishes over time). What is more, the analysis also showed which negative emotive intensifiers are at the same stage of this process in the same time period. A closer look at the different domains of the analysed corpus also made it clear that during this process the importance of the pragmatic role increases: the newer use expresses the speaker's subjective, evaluative opinion at a certain level.

Keywords: historical corpus analysis, historical linguistics, negative emotive intensifiers, semantic changes over time

Procedia PDF Downloads 197
1153 Towards a Large Scale Deep Semantically Analyzed Corpus for Arabic: Annotation and Evaluation

Authors: S. Alansary, M. Nagi

Abstract:

This paper presents an approach to the semantic annotation of an Arabic corpus using the Universal Networking Language (UNL) framework. UNL is intended to be a promising strategy for providing a large collection of semantically annotated texts with formal, deep semantics rather than shallow ones. The result would constitute a semantic resource (semantic graphs) that is editable and that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles and rhetorical relations, into a single semantic formalism for knowledge representation. The paper will also present the Interactive Analysis tool for automatic semantic annotation (IAN). In addition, the disambiguation and transformation rules, which are the cornerstone of the proposed methodology, will be presented. Semantic annotation using UNL has been applied to a corpus of 20,000 Arabic sentences representing the most frequent structures in the Arabic Wikipedia. The representation at different linguistic levels is illustrated, starting from the morphological level and passing through the syntactic level until the semantic representation is reached. The output has been evaluated using the F-measure, which reached 90%. This demonstrates how powerful the formal environment is, as it enables intelligent text processing and search.
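
As a reminder of what the F-measure reported above summarizes, the sketch below computes it from counts of correct, spurious, and missed annotations. The abstract does not say how matches were counted or which weighting was used, so the balanced F1 default, the function name, and the example counts are assumptions for illustration only.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 90 correct annotations, 10 spurious, 10 missed
print(f_measure(90, 10, 10))
```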

Keywords: semantic analysis, semantic annotation, Arabic, universal networking language

Procedia PDF Downloads 561
1152 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation-labeled database for Hindi. Although no single standard for prosody labeling exists for Hindi, researchers have employed perceptual and statistical methods to draw inferences about the behavior of prosody patterns in the language. Based on this existing research and largely agreed-upon intonational theories of Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used in the future to train speech models for natural-sounding speech. 100 sentences (500 words) each for declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

Procedia PDF Downloads 160
1151 Corpus-Based Description of Core English Nouns of Pakistani English, an EFL Learner Perspective at Secondary Level

Authors: Abrar Hussain Qureshi

Abstract:

Vocabulary has been highlighted as a key indicator in any foreign language learning program, especially English as a foreign language (EFL). It is often considered a potent tool in the foreign language curriculum, and its deficiency impedes successful communication in the target language. Knowledge of the lexicon is very significant for achieving communicative competence and performance. Nouns constitute a considerable bulk of English vocabulary; they are the bones of the English language and the main semantic carriers in spoken and written discourse. As nouns dominate the English lexicon, their role becomes all the more important. The present research is a systematic effort to work out a list of highly frequent Pakistani English nouns for EFL learners at the secondary level. It will encourage learner autonomy and save learners' time. The corpus used for the research has been developed locally from leading English newspapers of Pakistan. WordSmith Tools has been used to process the research data and to retrieve a word list of frequent Pakistani English nouns. The retrieved list of core Pakistani English nouns is expected to be useful for English language learners at the secondary level, as it covers a wide range of speech events.

Keywords: corpus, EFL, frequency list, nouns

Procedia PDF Downloads 75
1150 A Corpus-Based Study of Subtitling Religious Words into Arabic

Authors: Yousef Sahari, Eisa Asiri

Abstract:

Hollywood films are produced in an open and liberal context, and when subtitling for a more conservative and closed society such as an Arabic society, religious words can pose a thorny challenge for subtitlers. Using a corpus of 90 Hollywood films released between 2000 and 2018 and applying insights from Descriptive Translation Studies (Toury, 1995, 2012) and the dichotomy of domestication and foreignization, this paper investigates three main research questions: (1) What are the dominant religious terms and functions in the English subtitles? (2) What are the dominant translation strategies used in the translation of religious words? (3) Do these strategies tend to be SL-oriented or TL-oriented (domesticating or foreignising)? To answer the research questions above, a quantitative and qualitative analysis of the corpus is conducted, in which the researcher adopts a self-designed, parallel, aligned corpus of ninety films and their Arabic subtitles. A quantitative analysis is performed to compare the frequencies and distribution of religious words, their functions, and the translation strategies employed by the subtitlers of ninety films, with the aim of identifying similarities or differences in addition to identifying the impact of functions of religious terms on the use of subtitling strategies. Based on the quantitative analysis, a qualitative analysis is performed to identify any translational patterns in Arabic translations of religious words and the possible reasons for subtitlers’ choices. The results show that the function of religious words has a strong influence on the choice of subtitling strategies. Also, it is found that foreignization strategies are applied in about two-thirds of the total occurrences of religious words.

Keywords: religious terms, subtitling, audiovisual translation, Modern Standard Arabic, subtitling strategies, English-Arabic subtitling

Procedia PDF Downloads 125
1149 Comparison of Verb Complementation Patterns in Selected Pakistani and British English Newspaper Social Columns: A Corpus-Based Study

Authors: Zafar Iqbal Bhatti

Abstract:

The present research aims to examine and evaluate the frequencies and uses of verb complementation patterns in English newspaper social columns published in Pakistan and Britain. The research will demonstrate that Pakistani English is a non-native variety of English with its own unique, regular, and logical characteristics at the syntactic level, shaped by the native languages and culture. It will also make users of the variety aware that systematic and regular differences from British English, American English, or another variety of English are typical characteristics of the variety rather than erroneous forms, even if they are unique. The first objective is to examine the verb complementation patterns that British and Pakistani social columnists use in relation to their syntactic categories. The second is to compare the verb complementation patterns used in Pakistani and British English newspaper social columns. This study will identify the various verb complementation patterns in Pakistani and British English newspaper social columns, together with their occurrence and distribution. Word classes express different functions of words, such as action, event, or state of being. This research aims to evaluate whether there are any appreciable differences in the verb complementation patterns used in Pakistani and British English newspaper social columns. The results will show the number of varieties of verb complementation patterns in the selected English newspaper social columns. This study will fill a gap left by previous studies in this field, which explore only a little of the differences between Pakistani and British English newspapers. It will also identify the varieties of language used in Pakistani and British English journals, as well as regional and cultural values and variations. The researcher will use AntConc software in this study to extract the data for analysis.
The researcher will use a concordance tool to identify verb complementation patterns in the selected data. The researcher will then categorize them manually, because the same type of adverb can sometimes be used for various purposes. A four-month written corpus of the social columns of PE and BE newspapers, covering 1st June 2022 to 30th Sep. 2022, will be collected and analyzed. For the analysis of the research questions, 50 social columns will be selected from Pakistani newspapers and 50 from British newspapers. The researcher will collect a representative sample of data from Pakistani and British English newspaper social columns. The researcher will manually analyze the complementation patterns of each verb in each sentence and then determine how frequently each pattern occurs. The syntactic characteristics of the verb complementation elements will follow the description by Downing and Locke (2006). The researcher will examine all of the verb complementation patterns in the data, and the frequency and distribution of each verb complementation pattern will be evaluated using the software. The researcher will explore every possible verb complementation pattern in Pakistani and British English before calculating the frequency and distribution of each pattern.

Keywords: verb complementation, syntactic categories, newspaper social columns, corpus

Procedia PDF Downloads 17
1148 A Comparison of the First Language Vocabulary Used by Indonesian Year 4 Students and the Vocabulary Taught to Them in English Language Textbooks

Authors: Fitria Ningsih

Abstract:

This study concerns the process of building a corpus from Indonesian year 4 students’ free writing and comparing it with the vocabulary taught in English language textbooks. Writing samples from 369 students at 19 public elementary schools in Malang, East Java, Indonesia, and 5 selected English textbooks were analyzed using corpus linguistics methods with the AdTAT (Adelaide Text Analysis Tool) program. The analysis produced wordlists of the top 100 words most frequently used by students and the top 100 words given in the English textbooks. There was a 45% match between the two lists. Furthermore, the classification of the top 100 most frequent words from the two corpora by part of speech found that both the Indonesian and English lists showed a similar use of nouns, verbs, adjectives, and prepositions. Moreover, to see how well the vocabulary of the learning materials is contextualized to the students’ needs, an in-depth analysis of the content and cultural views of the vocabulary taught in the textbooks was carried out using criteria developed from a checklist. Lastly, language teachers are advised to understand their students’ background, for example by recognizing the basic words students have already acquired before teaching them new vocabulary, in order to achieve successful learning of the target language.
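
The kind of wordlist comparison reported above (a 45% match between two top-100 lists) can be sketched as follows. This is not AdTAT and not the author's procedure: the function names and toy data are hypothetical, and the sketch assumes both lists have already been mapped to a common language so that entries are directly comparable.

```python
from collections import Counter

def top_words(tokens, n=100):
    """Return the n most frequent tokens, most frequent first."""
    return [word for word, _ in Counter(tokens).most_common(n)]

def overlap_percent(list_a, list_b):
    """Percentage of entries in list_a that also appear in list_b."""
    if not list_a:
        return 0.0
    shared = set(list_a) & set(list_b)
    return 100 * len(shared) / len(list_a)

# Toy data (hypothetical), using top-2 lists instead of top-100
students = "cat cat cat dog dog bird".split()
textbook = "cat cat fish fish fish dog".split()

student_top = top_words(students, n=2)
textbook_top = top_words(textbook, n=2)
match = overlap_percent(student_top, textbook_top)
print(student_top, textbook_top, match)
```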

Keywords: corpus, frequency, English, Indonesian, linguistics, textbooks, vocabulary, wordlists, writing

Procedia PDF Downloads 158
1147 Using Corpora in Semantic Studies of English Adjectives

Authors: Oxana Lukoshus

Abstract:

The methods of corpus linguistics, a well-established field of research, are being increasingly applied in cognitive linguistics. Corpus data are especially useful for quantitative studies of grammatical and other aspects of language. The main objective of this paper is to demonstrate how present-day corpora can be applied in semantic studies in general and in semantic studies of adjectives in particular. Polysemantic adjectives have been the subject of numerous studies, but most of these have been based on dictionaries. Dictionaries are undoubtedly one of the basic data sources, but only at the initial stages of research: the author usually starts with an analysis of the lexicographic data and then formulates a hypothesis. In the research reported here, three polysemantic synonyms, true, loyal, and faithful, were analyzed in terms of the differences and similarities in their semantic structure. The corpus-based approach to these adjectives involved the following. After the analysis of the dictionary data, the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA) were consulted to study the distributional patterns of the words under study. These corpora are continually updated and contain thousands of examples of the words under research, which makes them a useful and convenient data source. For the purpose of this study, there were no special requirements regarding the genre, mode, or time of the texts included in the corpora. Of the range of possibilities offered by corpus-analysis software (e.g., word lists, word-frequency statistics), the most useful tool for the semantic analysis was extracting a list of co-occurrences for the given search words. Searching by lemmas, e.g. true, true to, and grouping the results by lemmas proved to be the most efficient corpus feature for the adjectives under study.
Following the search process, the corpora provided a list of co-occurrences, which were then analyzed and classified. Not every co-occurrence was relevant for the analysis. For example, phrases like “An enormous sense of responsibility to protect the minds and hearts of the faithful from incursions by the state was perceived to be the basic duty of the church leaders” or “‘True,’ said Phoebe, ‘but I'd probably get to be a Union Official immediately” were left out: in the first example, the faithful is a substantivized adjective, and in the second, true is used alone with no other parts of speech. The subsequent analysis of the corpus data provided the grounds for the distribution groups of the adjectives under study, which were then investigated with the help of a semantic experiment. To sum up, the corpus-based approach has proved to be a powerful, reliable, and convenient tool for obtaining data for further semantic study.
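
To illustrate what a list of co-occurrences for a search word involves, the sketch below counts the words that appear within a small window around a given word. It is a simplified stand-in rather than the BNC or COCA query interface: the function name `cooccurrences`, the window size, and the tokenizer are assumptions, and it matches exact word forms instead of lemmas.

```python
import re
from collections import Counter


def cooccurrences(sentences, node, window=3):
    """Count words occurring within `window` tokens of `node` (exact form match)."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == node:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts
```

Calling `cooccurrences(sentences, "true").most_common()` on a list of sentence strings would give a frequency-ranked list that still needs the manual filtering described above, for instance to set aside cases where the word stands alone.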

Keywords: corpora, corpus-based approach, polysemantic adjectives, semantic studies

Procedia PDF Downloads 294