Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 13

Corpus Linguistics Related Abstracts

13 Critical Discourse Analysis of President Mamnoon Hussain Speech in the Joint Session of Parliament.

Authors: Saeed Qaisrani

Abstract:

This article briefly reviews the rise of Critical Discourse Analysis about the Pakistani President Mamnoon Hussain speech which delivered in the joint session of Parliament and teases out a detailed analysis of the various critiques that have been levelled at CDA and its practitioners over the last twenty years, both by scholars working within the “critical” paradigm and by other critics. A range of criticisms are discussed which target the underlying premises, the analytical methodology and the disputed areas of reader response and the integration of contextual factors. Controversial issues such as the predominantly negative focus of much CDA scholarship, and the status of CDA as an emergent “intellectual orthodoxy”, are also reviewed. The conclusions offer a summary of the principal criticisms that emerge from this overview, and suggest some ways in which these problems could be attenuated. It also focused on the different views about president speech and how it is presented in the Pakistani print and electronic media.

Keywords: Corpus Linguistics, Analytical Methodology, Contextualization, Critical Discourse Analysis, reader response theory, Critical paradigm

Procedia PDF Downloads 329
12 The Construction of Malaysian Airline Tragedies in Malaysian and British Online News: A Multidisciplinary Study

Authors: Theng Theng Ong

Abstract:

This study adopts a multidisciplinary method by combining the corpus-based discourse analysis study and language attitude study to explore the construction of Malaysia airline tragedies: MH370, MH17 and QZ8501 in the selected Malaysian and United Kingdom (UK) online news. The study aims to determine the ways in which Malaysian Airline tragedies MH370, MH17 and QZ8501 are linguistically defined and constructed in terms of keyword and collocation. The study also seeks to identify the types of discourse that are presented in the new articles. The differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media will also be examined. Finally, the language attitude study will be carried out to examine the Malaysia and UK university students’ attitudes toward the keywords, topics or issues covered by the selected Malaysian and UK news media pertaining to Malaysian Airline tragedies MH370, MH17 and QZ8501. The analysis is divided into two parts with the first part focusing on corpus-based discourse analysis on the media text. The second part of the study is to investigate Malaysians and UK news readers’ attitudes towards the online news being reported by the Malaysian and UK news media pertaining to the Airline tragedies. The main findings of corpus-based discourse analysis are essential in designing the questions in the questionnaires and interview and therefore led to the identification of the attitudes among Malaysian and UK news. This study adopts a multidisciplinary method by combining the corpus-based discourse analysis study and language attitude study to explore the construction of Malaysia airline tragedies: MH370, MH17 and QZ8501 in the selected Malaysian and United Kingdom (UK) online news. The study aims to determine the ways in which Malaysian Airline tragedies MH370, MH17 and QZ8501 are linguistically defined and constructed in terms of keyword and collocation. The study also seeks to identify the types of discourse that are presented in the new articles. The differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media will also be examined. Finally, the language attitude study will be carried out to examine the Malaysia and UK university students’ attitudes toward the keywords, topics or issues covered by the selected Malaysian and UK news media pertaining to Malaysian Airline tragedies MH370, MH17 and QZ8501. The analysis is divided into two parts with the first part focusing on corpus-based discourse analysis on the media text. The second part of the study is to investigate Malaysians and UK news readers’ attitudes towards the online news being reported by the Malaysian and UK news media pertaining to the Airline tragedies. The main findings of corpus-based discourse analysis are essential in designing the questions in the questionnaires and interview and therefore led to the identification of the attitudes among Malaysian and UK news.

Keywords: Corpus Linguistics, Critical Discourse Analysis, news media, tragedies study

Procedia PDF Downloads 236
11 The Women-In-Mining Discourse: A Study Combining Corpus Linguistics and Discourse Analysis

Authors: Ylva Fältholm, Cathrine Norberg

Abstract:

One of the major threats identified to successful future mining is that women do not find the industry attractive. Many attempts have been made, for example in Sweden and Australia, to create organizational structures and mining communities attractive to both genders. Despite such initiatives, many mining areas are developing into gender-segregated fly-in/fly out communities dominated by men with both social and economic consequences. One of the challenges facing many mining companies is thus to break traditional gender patterns and structures. To do this increased knowledge about gender in the context of mining is needed. Since language both constitutes and reproduces knowledge, increased knowledge can be gained through an exploration and description of the mining discourse from a gender perspective. The aim of this study is to explore what conceptual ideas are activated in connection to the physical/geographical mining area and to work within the mining industry. We use a combination of critical discourse analysis implying close reading of selected texts, such as policy documents, interview materials, applications and research and innovation agendas, and analyses of linguistic patterns found in large language corpora covering millions of words of contemporary language production. The quantitative corpus data serves as a point of departure for the qualitative analysis of the texts, that is, suggests what patterns to explore further. The study shows that despite technological and organizational development, one of the most persistent discourses about mining is the conception of dangerous and unfriendly areas infused with traditional notions of masculinity ideals and manual hard work. Although some of the texts analyzed highlight gender issues, and describe gender-equalizing initiatives, such as wage-mapping systems, female networks and recruitment efforts for women executives, and thereby render the discourse less straightforward, it is shown that these texts are not unambiguous examples of a counter-discourse. They rather illustrate that discourses are not stable but include opposing discourses, in dialogue with each other. For example, many texts highlight why and how women are important to mining, at the same time as they suggest that gender and diversity are all about women: why mining is a problem for them, how they should be, and what they should do to fit in. Drawing on a constitutive view of discourse, knowledge about such conflicting perceptions of women is a prerequisite for succeeding in attracting women to the mining industry and thereby contributing to the development of future mining.

Keywords: Gender, Mining, Corpus Linguistics, discourse

Procedia PDF Downloads 156
10 A Corpus-Assisted Discourse Analysis of Adjectival Collocation of the Word 'Education' in the American Context

Authors: Ngan Nguyen

Abstract:

The study analyses adjectives collocating with the word ‘education’ in the American language of the Corpus of Global Web-based English using a combination of corpus linguistic and discourse analytical methods to examine not only language patterns but also social political ideologies around the topic. Significant conclusions are deduced: (1) there are a large number of adjectival collocates of the word education which have been identified and classified into four categories representing four different aspects of education: level, quality, forms and types of education; (2) education, as in combination with three first categories, carries the meaning as the act and process of teaching and learning while with the last category having the meaning of a particular kind of teaching or training; (3) higher education is the topic that gains most concerns from the American public; (4) five most significant ideologies are discovered from the corpus: higher education associates with financial affairs, higher education is an industry, monetary policy of the government on higher education, people require greater accessibility to higher education and people value higher education. The study contributes to the field of developing meanings of words through corpus analysis and the field of discourse analysis.

Keywords: Education, Discourse Analysis, Corpus Linguistics, adjectival collocation, American context

Procedia PDF Downloads 172
9 Verb Bias in Mandarin: The Corpus Based Study of Children

Authors: Jou-An Chung

Abstract:

The purpose of this study is to investigate the verb bias of the Mandarin verbs in children’s reading materials and provide the criteria for categorization. Verb bias varies cross-linguistically. As Mandarin and English are typological different, this study hopes to shed light on Mandarin verb bias with the use of corpus and provide thorough and detailed criteria for analysis. Moreover, this study focuses on children’s reading materials since it is a significant issue in understanding children’s sentence processing. Therefore, investigating verb bias of Mandarin verbs in children’s reading materials is also an important issue and can provide further insights into children’s sentence processing. The small corpus is built up for this study. The corpus consists of the collection of school textbooks and Mandarin Daily News for children. The files are then segmented and POS tagged by JiebaR (Chinese segmentation with R). For the ease of analysis, the one-word character verbs and intransitive verbs are excluded beforehand. The total of 20 high frequency verbs are hand-coded and are further categorized into one of the three types, namely DO type, SC type and other category. If the frequency of taking Other Type exceeds the threshold of 25%, the verb is excluded from the study. The results show that 10 verbs are direct object bias verbs, and six verbs are sentential complement bias verbs. The paired T-test was done to assure the statistical significance (p = 0.0001062 for DO bias verb, p=0.001149 for SC bias verb). The result has shown that in children’s reading materials, the DO biased verbs are used more than the SC bias verbs since the simplest structure of sentences is easier for children’s sentence comprehension or processing. In sum, this study not only discussed verb bias in child's reading materials but also provided basic coding criteria for verb bias analysis in Mandarin and underscored the role of context. Sentences are easier for children’s sentence comprehension or processing. In sum, this study not only discussed verb bias in child corpus, but also provided basic coding criteria for verb bias analysis in Mandarin and underscored the role of context.

Keywords: psycholinguistics, Corpus Linguistics, child language, verb bias

Procedia PDF Downloads 105
8 Men Act, Women Are Acted Upon: Morphosyntactic Framing of the Sexual Intercourse in Online Pornography Titles

Authors: Aleksandra Tomic

Abstract:

According to reliable sources, 4% of all websites is devoted to pornographic material, yet these estimates are often reported to be much higher. The largest internet pornography streaming website reports 21.2 billion visits in 2015 only. Considering the ubiquity of online pornography and the frequency of use, it is necessary to examine its potential influence on the construal of the sexual act and the roles of participants. Apart from the verbal and physical interactions in the pornographic movies themselves, the language in the titles of movies has the power to frame the sexual intercourse. In this study, Critical Discourse Analysis and corpus linguistics approaches will be used to examine the way the sexual intercourse and the roles of the participants are ideologically construed and perpetuated in the Internet pornography discourse. To this end, the study will explore the association between the specific morphosyntactic aspects of the references to performers of both genders, the person and the thematic role, and the gender of referred performer in the corpus of online pornographic movie titles. Distinctive collexeme analysis will be conducted to uncover possible associations between for gender of the performer denoted by the linguistic expression, and the person and thematic role assigned to it in the titles of online pornography movies. Initial results of the chi-square procedure performed on a sample of 295 online pornography movie titles on the largest pornography streaming website ‘Pornhub’ yielded significant results. The use of the three person categories was not equally distributed between genders, X2 (2, N = 106) = 32.52, p < 0.001, with female performers being referred to in the third person in 71.7% of the instances, and speaking in the first person 20.8% of the time, whereas male performers spoke in the first person 68% of the time, and were referred to in the third person in 17% of the instances. Moreover, there was a gender disparity in the assignment of thematic roles, with linguistic expressions for women being assigned the Patient role and men the Agent role in 58.8% of the cases, whereas the roles were reversed in 41.2% of the instances, X2 (1, N = 262) = 8.07633, p < 0.005. The results are discussed in terms of the ideologies surrounding female and male sexuality in the pornography discourse. Potential patterns of power imbalance, objectification, and discrimination are highlighted. Finally, the evidence from psycholinguistic studies on the influence of the language structure on event construal is related to the results of the study.

Keywords: Gender studies, Corpus Linguistics, pornography, thematic roles

Procedia PDF Downloads 50
7 The Automatisation of Dictionary-Based Annotation in a Parallel Corpus of Old English

Authors: Ana Elvira Ojanguren Lopez, Javier Martin Arista

Abstract:

The aims of this paper are to present the automatisation procedure adopted in the implementation of a parallel corpus of Old English, as well as, to assess the progress of automatisation with respect to tagging, annotation, and lemmatisation. The corpus consists of an aligned parallel text with word-for-word comparison Old English-English that provides the Old English segment with inflectional form tagging (gloss, lemma, category, and inflection) and lemma annotation (spelling, meaning, inflectional class, paradigm, word-formation and secondary sources). This parallel corpus is intended to fill a gap in the field of Old English, in which no parallel and/or lemmatised corpora are available, while the average amount of corpus annotation is low. With this background, this presentation has two main parts. The first part, which focuses on tagging and annotation, selects the layouts and fields of lexical databases that are relevant for these tasks. Most information used for the annotation of the corpus can be retrieved from the lexical and morphological database Nerthus and the database of secondary sources Freya. These are the sources of linguistic and metalinguistic information that will be used for the annotation of the lemmas of the corpus, including morphological and semantic aspects as well as the references to the secondary sources that deal with the lemmas in question. Although substantially adapted and re-interpreted, the lemmatised part of these databases draws on the standard dictionaries of Old English, including The Student's Dictionary of Anglo-Saxon, An Anglo-Saxon Dictionary, and A Concise Anglo-Saxon Dictionary. The second part of this paper deals with lemmatisation. It presents the lemmatiser Norna, which has been implemented on Filemaker software. It is based on a concordance and an index to the Dictionary of Old English Corpus, which comprises around three thousand texts and three million words. In its present state, the lemmatiser Norna can assign lemma to around 80% of textual forms on an automatic basis, by searching the index and the concordance for prefixes, stems and inflectional endings. The conclusions of this presentation insist on the limits of the automatisation of dictionary-based annotation in a parallel corpus. While the tagging and annotation are largely automatic even at the present stage, the automatisation of alignment is pending for future research. Lemmatisation and morphological tagging are expected to be fully automatic in the near future, once the database of secondary sources Freya and the lemmatiser Norna have been completed.

Keywords: Corpus Linguistics, Historical Linguistics, parallel corpus, old English

Procedia PDF Downloads 47
6 Combining Corpus Linguistics and Critical Discourse Analysis to Study Power Relations in Hindi Newspapers

Authors: Niladri Sekhar Dash, Vandana Mishra, Jayshree Charkraborty

Abstract:

This present paper focuses on the application of corpus linguistics techniques for critical discourse analysis (CDA) of Hindi newspapers. While Corpus linguistics is the study of language as expressed in corpora (samples) of 'real world' text, CDA is an interdisciplinary approach to the study of discourse that views language as a form of social practice. CDA has mainly been studied from a qualitative perspective. However, we can say that recent studies have begun combining corpus linguistics with CDA in analyzing large volumes of text for the study of existing power relations in society. The corpus under our study is also of a sizable amount (1 million words of Hindi newspaper texts) and its analysis requires an alternative analytical procedure. So, we have combined both the quantitative approach i.e. the use of corpus techniques with CDA’s traditional qualitative analysis. In this context, we have focused on the Keyword Analysis Sorting Concordance Lines of the selected Keywords and calculating collocates of the keywords. We have made use of the Wordsmith Tool for all these analysis. The analysis starts with identifying the keywords in the political news corpus when compared with the main news corpus. The keywords are extracted from the corpus based on their keyness calculated through statistical tests like chi-squared test and log-likelihood test on the frequent words of the corpus. Some of the top occurring keywords are मोदी (Modi), भाजपा (BJP), कांग्रेस (Congress), सरकार (Government) and पार्टी (Political party). This is followed by the concordance analysis of these keywords which generates thousands of lines but we have to select few lines and examine them based on our objective. We have also calculated the collocates of the keywords based on their Mutual Information (MI) score. Both concordance and collocation help to identify lexical patterns in the political texts. Finally, all these quantitative results derived from the corpus techniques will be subjectively interpreted in accordance to the CDA’s theory to examine the ways in which political news discourse produces social and political inequality, power abuse or domination.

Keywords: Corpus Linguistics, Power Relations, Critical Discourse Analysis, Hindi newspapers

Procedia PDF Downloads 51
5 A Critical Discourse Analysis of the Construction of Artists' Reputation by Online Art Magazines

Authors: Thomas Soro, Tim Stott, Brendan O'Rourke

Abstract:

The construction of artistic reputation has been examined within sociology, philosophy, and economics but, baring a few noteworthy exceptions its discursive aspect has been largely ignored. This is particularly surprising given that contemporary artworks primarily rely on discourse to construct their ontological status. This paper contributes a discourse analytical perspective to the broad body of literature on artistic reputation by providing an understanding of how it is discursively constructed within the institutional context of online contemporary art magazines. This paper uses corpora compiled from the websites of e-flux and ARTnews, two leading online contemporary art magazines, to examine how these organisations discursively construct the reputation of artists. By constructing word-sketches of the term 'Artist', the paper identified the most significant modifiers attributed to artists and the most significant verbs which have 'artist' as an object or subject. The most significant results were analysed through concordances and demonstrated a somewhat surprising lack of evaluative representation. To examine this feature more closely, the paper then analysed three announcement texts from e-flux’s site and three review texts from ARTnews' site, comparing the use of modifiers and verbs in the representation of artists, artworks, and institutions. The results of this analysis support the corpus findings, suggesting that artists are rarely represented in evaluative terms. Based on the relatively high frequency of evaluation in the representation of artworks and institutions, these results suggest that there may be discursive norms at work in the field of online contemporary art magazines which regulate the use of verbs and modifiers in the evaluation of artists.

Keywords: Corpus Linguistics, contemporary art, Critical Discourse Analysis, symbolic capital

Procedia PDF Downloads 33
4 A Lexicographic Approach to Obstacles Identified in the Ontological Representation of the Tree of Life

Authors: Sandra Young

Abstract:

The biodiversity literature is vast and heterogeneous. In today’s data age, numbers of data integration and standardisation initiatives aim to facilitate simultaneous access to all the literature across biodiversity domains for research and forecasting purposes. Ontologies are being used increasingly to organise this information, but the rationalisation intrinsic to ontologies can hit obstacles when faced with the intrinsic fluidity and inconsistency found in the domains comprising biodiversity. Essentially the problem is a conceptual one: biological taxonomies are formed on the basis of specific, physical specimens yet nomenclatural rules are used to provide labels to describe these physical objects. These labels are ambiguous representations of the physical specimen. An example of this is with the genus Melpomene, the scientific nomenclatural representation of a genus of ferns, but also for a genus of spiders. The physical specimens for each of these are vastly different, but they have been assigned the same nomenclatural reference. While there is much research into the conceptual stability of the taxonomic concept versus the nomenclature used, to the best of our knowledge as yet no research has looked empirically at the literature to see the conceptual plurality or singularity of the use of these species’ names, the linguistic representation of a physical entity. Language itself uses words as symbols to represent real world concepts, whether physical entities or otherwise, and as such lexicography has a well-founded history in the conceptual mapping of words in context for dictionary making. This makes it an ideal candidate to explore this problem. The lexicographic approach uses corpus-based analysis to look at word use in context, with a specific focus on collocated word frequencies (the frequencies of words used in specific grammatical and collocational contexts). It allows for inconsistencies and contradictions in the source data and in fact includes these in the word characterisation so that 100% of the available evidence is counted. Corpus analysis is indeed suggested as one of the ways to identify concepts for ontology building, because of its ability to look empirically at data and show patterns in language usage, which can indicate conceptual ideas which go beyond words themselves. In this sense it could potentially be used to identify if the hierarchical structures present within the empirical body of literature match those which have been identified in ontologies created to represent them. The first stages of this research have revealed a hierarchical structure that becomes apparent in the biodiversity literature when annotating scientific species’ names, common names and more general names as classes, which will be the focus of this paper. The next step in the research is focusing on a larger corpus in which specific words can be analysed and then compared with existing ontological structures looking at the same material, to evaluate the methods by means of an alternative perspective. This research aims to provide evidence as to the validity of the current methods in knowledge representation for biological entities, and also shed light on the way that scientific nomenclature is used within the literature.

Keywords: Biodiversity, Ontology, Lexicography, Knowledge Representation, Corpus Linguistics

Procedia PDF Downloads 24
3 Semantic Preference across Research Articles: A Corpus-Based Study of Adjectives in English

Authors: Valdênia Carvalho e Almeida

Abstract:

The goal of the present study is to investigate the semantic preference of the most frequent adjectives in research articles through a corpus-based analysis of texts published in journals in Applied Linguistics (AL). The corpus used in this study contains texts published in the period from 2014 to 2018 in the three journals: Language Learning and Technology; English for Academic Purposes, and TESOL Quaterly, totaling more than one million words. A corpus-based analysis was carried out on the corpus to identify the most frequent adjectives that co-occurred in the three journals. By observing the concordance lines of the adjectives and analyzing the words they associated with, the semantic preferences of each adjective were determined. Later, the AL corpus analysis was compared to the investigation of the same adjectives in a corpus of Chemistry. This second part of the study aimed to identify possible differences and similarities between the two corpora in relation to the use of the adjectives in research articles from both areas. The results show that there are some preferences which seem to be closely related not only to the academic genre of the texts but also to the specific domain of the discipline and, to a lesser extent, to the context of research in each journal. This research illustrates a possible contribution of Corpus Linguistics to explore the concept of semantic preference in more detail, considering the complex nature of the phenomenon.

Keywords: Chemistry, Applied Linguistics, Corpus Linguistics, research article, semantic preference

Procedia PDF Downloads 11
2 Investigating Iraqi EFL University Students' Productive Knowledge of Grammatical Collocations in English

Authors: Adnan Z. Mkhelif

Abstract:

Grammatical collocations (GCs) are word combinations containing a preposition or a grammatical structure, such as an infinitive (e.g. smile at, interested in, easy to learn, etc.). Such collocations tend to be difficult for Iraqi EFL university students (IUS) to master. To help address this problem, it is important to identify the factors causing it. This study aims at investigating the effects of L2 proficiency, frequency of GCs and their transparency on IUSs’ productive knowledge of GCs. The study involves 112 undergraduate participants with different proficiency levels, learning English in formal contexts in Iraq. The data collection instruments include (but not limited to) a productive knowledge test (designed by the researcher using the British National Corpus (BNC)), as well as the grammar part of the Oxford Placement Test (OPT). The study findings have shown that all the above-mentioned factors have significant effects on IUSs’ productive knowledge of GCs. In addition to establishing evidence of which factors of L2 learning might be relevant to learning GCs, it is hoped that the findings of the present study will contribute to more effective methods of teaching that can better address and help overcome the problems IUSs encounter in learning GCs. The study is thus hoped to have significant theoretical and pedagogical implications for researchers, syllabus designers as well as teachers of English as a foreign/second language.

Keywords: Transparency, Corpus Linguistics, Frequency, Proficiency, L2 vocabulary learning, grammatical collocations, productive knowledge

Procedia PDF Downloads 117
1 Words of Peace in the Speeches of the Egyptian President, Abdulfattah El-Sisi: A Corpus-Based Study

Authors: Mohamed S. Negm, Waleed S. Mandour

Abstract:

The present study aims primarily at investigating words of peace (lexemes of peace) in the formal speeches of the Egyptian president Abdulfattah El-Sisi in a two-year span of time, from 2018 to 2019. This paper attempts to shed light not only on the contextual use of the antonyms, war and peace, but also it underpins quantitative analysis through the current methods of corpus linguistics. As such, the researchers have deployed a corpus-based approach in collecting, encoding, and processing 30 presidential speeches over the stated period (23,411 words and 25,541 tokens in total). Further, semantic fields and collocational networkzs are identified and compared statistically. Results have shown a significant propensity of adopting peace, including its relevant collocation network, textually and therefore, ideationally, at the expense of war concept which in most cases surfaces euphemistically through the noun conflict. The president has not justified the action of war with an honorable cause or a valid reason. Such results, so far, have indicated a positive sociopolitical mindset the Egyptian president possesses and moreover, reveal national and international fair dealing on arising issues.

Keywords: Corpus Linguistics, Critical Discourse Analysis, CADS, collocation network

Procedia PDF Downloads 2