Search results for: corpus studies
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8868

8808 The Advancements of Transformer Models in Part-of-Speech Tagging System for Low-Resource Tigrinya Language

Authors: Shamm Kidane, Ibrahim Abdella, Fitsum Gaim, Simon Mulugeta, Sirak Asmerom, Natnael Ambasager, Yoel Ghebrihiwot

Abstract:

The need for natural language processing (NLP) systems for low-resource languages has become more apparent than ever in the past few years, yet building such systems remains arduous. This paper presents an improved version of the Nagaoka Tigrinya Corpus for a part-of-speech (POS) classification system for the Tigrinya language. The initial Nagaoka dataset was extended, bringing the new tagged corpus to 118K tokens annotated with the same 12 basic POS tags used previously. The additional content was annotated manually in a stringent manner, following rules similar to those of the former dataset, and was formatted in CoNLL format. The system uses the monolingually pre-trained TiELECTRA, TiBERT and TiRoBERTa transformer models, a recent approach in NLP tasks. The highest score achieved is a weighted F1-score of 94.2%, which surpasses previous systems by a significant margin. The system should prove useful for advancing NLP tasks for Tigrinya and similar low-resource languages, with room for cross-referencing higher-resource languages.

Keywords: Tigrinya POS corpus, TiBERT, TiRoBERTa, conditional random fields

Procedia PDF Downloads 64
8807 Translating the Gendered Discourse: A Corpus-Based Study of the Chinese Science Fiction The Three Body Problem

Authors: Yi Gu

Abstract:

The Three-Body Problem by Cixin Liu has been a bestselling Chinese science fiction novel since 2008. The book was translated into English by Ken Liu in 2014 and won the prestigious Hugo Award for science fiction and fantasy writing in 2015, drawing greater attention from wider international communities. The story exposes the horrors of the Chinese Cultural Revolution in the 1960s in a narrative that has intrigued readers at home and abroad. However, without access to the source text, Western readers may not be aware that the original Chinese version of the book is rich in gender bias. Some Chinese scholars have previously applied feminist translation theories to their analysis of this book, but on the basis of isolated, cherry-picked examples. This paper therefore aims to obtain a more thorough picture of how translators can cope with gender discrimination and reshape the gendered discourse of the source text, by systematically investigating the lexical and syntactic patterns in the translation of Liu’s entire book of 400 pages. The source text and the translation were downloaded as digital files, automatically aligned at paragraph level and then manually post-edited. They were then compiled into a parallel corpus of 114,629 English words and 204,145 Chinese characters using Sketch Engine. Gender-discrimination markers, such as the overuse of ‘girl’ to describe an adult woman, were searched for in the source text, and the alignment made it possible to identify the strategies adopted by the translator to mitigate gender discrimination. The results provide a framework for translators to address gender bias. The study also shows how corpus methods can be used to further research in feminist translation and critical discourse analysis.

Keywords: corpus, discourse analysis, feminist translation, science fiction translation

Procedia PDF Downloads 233
8806 Prostitution in Colonial Bengal: Autobiographical Articulations and Fictional Representations

Authors: Aparna Bandyopadhyay

Abstract:

The proposed paper will examine how prostitution produced a vast corpus of literature in colonial Bengal. This corpus included autobiographical accounts by prostitutes themselves. While the authenticity of some of these has, at times, been doubted by contemporary observers, the sheer magnitude of such narrative prose demands critical attention. Many of these autobiographical narratives focused on the prostitute’s early life within respectable society and then proceeded to delineate the transgressions and the inescapable chain of circumstances that eventually rendered her a prostitute. Significantly, these serve to corroborate the findings of official investigations regarding the circumstances that led upper-caste Hindu women in Bengal to embrace prostitution in this period. The literary corpus that dwelt on prostitution also included a vast volume of fiction penned by celebrated writers. These foregrounded a prostitute as the central protagonist, telling the life-stories of prostitutes and the circumstances that made them what they were. Novels and short stories often represented the prostitute as an affective being – an individual capable of deep emotions despite her profession. She was seldom a person who had voluntarily embraced prostitution. She was always a figure of helplessness and suffering, a woman whose desire to love and be loved transcended the carnality of her livelihood. She was an outcast, but she experienced the entire repertoire of emotions experienced by her respectable counterparts. The proposed paper will examine the trends and characteristics of the available repertoire of prostitute-oriented literature in late colonial Bengal. It will begin by focusing on the existing perspectives on the origins of prostitution in late colonial Bengal. 
It will proceed to discuss the literary corpus supposedly penned by prostitutes themselves and then focus on the manner in which some of the stalwarts of high literature represented the prostitute in their literary creations.

Keywords: emotions, literature, prostitution, transgression

Procedia PDF Downloads 90
8805 A Corpus-Based Analysis of Japanese Learners' English Modal Auxiliary Verb Usage in Writing

Authors: S. Nakayama

Abstract:

For non-native English speakers, using English modal auxiliary verbs appropriately can be among the most challenging tasks. This research sought to identify differences in modal verb usage between Japanese non-native English speakers (JNNSs) and native speakers (NSs) from two perspectives: frequency of use and distribution of the verb phrase structures (VPSs) in which modal verbs occur. This study can contribute to the identification of JNNSs' interlanguage with regard to modal verbs; the main aim is to suggest improvements to teaching materials and to help language teachers teach modal verbs in a way that is helpful for learners. To address the primary question in this study, usage of nine central modals (‘can’, ‘could’, ‘may’, ‘might’, ‘shall’, ‘should’, ‘will’, ‘would’, and ‘must’) by JNNSs was compared with that by NSs in the International Corpus Network of Asian Learners of English (ICNALE). This corpus is one of the largest freely available corpora focusing on Asian English learners’ language use. The ICNALE corpus consists of four modules: ‘Spoken Monologue’, ‘Spoken Dialogue’, ‘Written Essays’, and ‘Edited Essays’. Among these, this research used the ‘Written Essays’ module only, which is a set of 200-300 word essays and contains approximately 1.3 million words in total. Frequency analysis revealed gaps as well as similarities in frequency order. Specifically, both JNNSs and NSs used ‘can’ most frequently, followed by ‘should’ and ‘will’; however, the frequency rank of every other modal except ‘shall’ differed between the two groups. A log-likelihood test uncovered JNNSs’ overuse of ‘can’ and ‘must’ as well as their underuse of ‘will’ and ‘would’. VPS analysis revealed that JNNSs used modal verbs in a relatively narrow range of VPSs compared to NSs.
Results showed that JNNSs used most of the modals with bare infinitives or the passive voice only whereas NSs used the modals in a wide range of VPSs including the progressive construction and the perfect aspect, both of which were the structures where JNNSs rarely used the modals. Results of frequency analysis suggest that language teachers or teaching materials should explain other modality items so that learners can avoid relying heavily on certain modals and have a wide range of lexical items to reflect their feelings more accurately. Besides, the underused modals should be more stressed in the classroom because they are members of epistemic modals, which allow us to not only interject our views into propositions but also build a relationship with readers. As for VPSs, teaching materials should present more examples of the modals occurring in a wide range of VPSs to help learners to be able to express their opinions from a variety of viewpoints.
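
As an illustration of the log-likelihood test used to detect overuse and underuse, here is a minimal sketch. It assumes the two-corpus log-likelihood (G2) statistic commonly used in corpus linguistics; the abstract does not give the exact formula or the counts, so the frequencies below are hypothetical.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (G2) for one word; assumed formulation."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the word were equally common in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:             # a zero count contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def direction(freq_a, size_a, freq_b, size_b):
    """Report whether corpus A overuses or underuses the word relative to B."""
    return "overuse" if freq_a / size_a > freq_b / size_b else "underuse"

# Hypothetical counts of one modal in a learner corpus (A) and a native corpus (B).
g2 = log_likelihood(30, 1000, 10, 1000)
print(round(g2, 2), direction(30, 1000, 10, 1000))
```

A G2 value above 3.84 corresponds to p < 0.05 with one degree of freedom, which is the threshold usually applied to such comparisons.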

Keywords: corpus linguistics, Japanese learners of English, modal auxiliary verbs, International Corpus Network of Asian Learners of English

Procedia PDF Downloads 110
8804 Track and Evaluate Cortical Responses Evoked by Electrical Stimulation

Authors: Kyosuke Kamada, Christoph Kapeller, Michael Jordan, Mostafa Mohammadpour, Christy Li, Christoph Guger

Abstract:

Cortico-cortical evoked potentials (CCEP) refer to responses generated by cortical electrical stimulation at distant brain sites. These responses provide insights into the functional networks associated with language or motor functions, and in the context of epilepsy, they can reveal pathological networks. Locating the origin and spread of seizures within the cortex is crucial for pre-surgical planning. This process can be enhanced by employing cortical stimulation at the seizure onset zone (SOZ), leading to the generation of CCEPs in remote brain regions that may be targeted for disconnection. In the case of a 24-year-old male patient suffering from intractable epilepsy, corpus callosotomy was performed as part of the treatment. DTI-MRI imaging, conducted using a 3T MRI scanner for fiber tracking, along with CCEP, is used as part of an assessment for surgical planning. Stimulation of the SOZ, with alternating monophasic pulses of 300µs duration and 15mA current intensity, resulted in CCEPs on the contralateral frontal cortex, reaching a peak amplitude of 206µV with a latency of 31ms, specifically in the left pars triangularis. The related fiber tracts were identified with a two-tensor unscented Kalman filter (UKF) technique, showing transversal fibers through the corpus callosum. The CCEPs were monitored through the progress of the surgery. Notably, the SOZ-associated CCEPs exhibited a reduction following the resection of the anterior portion of the corpus callosum, reaching the identified connecting fibers. This intervention demonstrated a potential strategy for mitigating the impact of intractable epilepsy through targeted disconnection of identified cortical regions.

Keywords: CCEP, SOZ, Corpus callosotomy, DTI

Procedia PDF Downloads 34
8803 The Development of Chinese-English Homophonic Word Pairs Databases for English Teaching and Learning

Authors: Yuh-Jen Wu, Chun-Min Lin

Abstract:

Homophonic words are common in Mandarin Chinese, which is a tonal language. Using homophonic cues to study foreign languages is a mnemonic learning technique that can aid the retention and retrieval of information in human memory. When learning difficult foreign words, some learners transpose them with words in a language they are familiar with to build an association and strengthen working memory. These phonological clues are beneficial for novice language learners. In the classroom, if mnemonic skills are used at the appropriate time in the instructional sequence, they may achieve their maximum effectiveness. For Chinese-speaking students, proper use of Chinese-English homophonic word pairs may help them learn difficult vocabulary. In this study, a database program is developed in Visual Basic. The database contains two corpora, one with Chinese lexical items and the other with English ones. The Chinese corpus contains 59,053 Chinese words that were collected by a web crawler. The pronunciations of these words are compared with those of words in an English corpus based on WordNet, a lexical database for the English language. Words in both databases with similar pronunciation chunks and batches are detected. A total of approximately 1,000 Chinese lexical items are located in the preliminary comparison. These homophonic word pairs can serve as a valuable tool to assist Chinese-speaking students in learning and memorizing new English vocabulary.
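
The pronunciation comparison described above could be sketched as follows. This is not the authors' Visual Basic program: it assumes pronunciations are available as simplified romanized strings and uses normalized edit distance as the similarity measure, both of which are illustrative choices. The word lists and the 0.75 threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical pronunciation strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest

def homophonic_pairs(chinese, english, threshold=0.75):
    """Pair words whose pronunciation strings are at least `threshold` similar."""
    return [(zh, en)
            for zh, zh_pron in chinese.items()
            for en, en_pron in english.items()
            if similarity(zh_pron, en_pron) >= threshold]

# Toy pronunciation tables (hypothetical simplified romanizations).
chinese = {"买": "mai", "爸": "ba"}
english = {"my": "mai", "bar": "bar", "tree": "tri"}
print(homophonic_pairs(chinese, english))
```

A real system would compare phoneme sequences rather than letters and would need to decide how to treat tone, which this sketch ignores.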

Keywords: Chinese, corpus, English, homophonic words, vocabulary

Procedia PDF Downloads 155
8802 The Mirage of Progress? A Longitudinal Study of Japanese Students’ L2 Oral Grammar

Authors: Robert Long, Hiroaki Watanabe

Abstract:

This longitudinal study examines the grammatical errors in Japanese university students’ dialogues with a native speaker over an academic year. The L2 interactions of 15 Japanese speakers were taken from the JUSFC2018 corpus (April/May 2018) and the JUSFC2019 corpus (January/February). The corpora were based on a self-introduction monologue and a three-question dialogue; however, this study examines the grammatical accuracy found in the dialogues. Research questions focused on whether grammatical accuracy differed significantly between the first interview session in 2018 and the second one the following year, specifically regarding errors in clauses per 100 words, global and local errors, and specific errors related to parts of speech. The investigation also asked which forms showed the least improvement or had worsened. Descriptive statistics showed that error-free clauses/errors per 100 words decreased slightly while clauses with errors/100 words increased by one clause. Global errors showed a significant decline, while local errors increased from 97 to 158 errors. For errors related to parts of speech, a t-test confirmed a significant difference between the two speech corpora, with errors occurring more frequently in the 2019 corpus. These data highlight the difficulty of having students self-edit.

Keywords: clause analysis, global vs. local errors, grammatical accuracy, L2 output, longitudinal study

Procedia PDF Downloads 106
8801 Tracing the Developmental Repertoire of the Progressive: Evidence from L2 Construction Learning

Authors: Tianqi Wu, Min Wang

Abstract:

Research investigating language acquisition from a constructionist perspective has demonstrated that language is learned as constructions at various linguistic levels, and that this learning is related to frequency, semantic prototypicality, and form-meaning contingency. However, previous research on construction learning has tended to focus on clause-level constructions such as verb argument constructions. Few attempts have been made to study morpheme-level constructions such as the progressive construction, which is regarded as a source of acquisition problems for English learners from diverse L1 backgrounds, especially for those whose L1, such as German or Chinese, does not have an equivalent construction. To trace the developmental trajectory of Chinese EFL learners’ use of the progressive with respect to verb frequency, verb-progressive contingency, and verbal prototypicality and generality, a learner corpus consisting of three sub-corpora representing three different English proficiency levels was extracted from the Chinese Learners of English Corpora (CLEC). As the reference point, a native speakers’ corpus extracted from the Louvain Corpus of Native English Essays was also established. All the texts were annotated with the C7 tagset using part-of-speech tagging software. After annotation, all valid progressive hits were retrieved with AntConc 3.4.3, followed by a manual check.
Frequency-related data showed that from the lowest to the highest proficiency level, (1) the type token ratio increased steadily from 23.5% to 35.6%, getting closer to 36.4% in the native speakers’ corpus, indicating a wider use of verbs in the progressive; (2) the normalized entropy value rose from 0.776 to 0.876, working towards the target score of 0.886 in native speakers’ corpus, revealing that upper-intermediate learners exhibited a more even distribution and more productive use of verbs in the progressive; (3) activity verbs (i.e., verbs with prototypical progressive meanings like running and singing) dropped from 59% to 34% but non-prototypical verbs such as state verbs (e.g., being and living) and achievement verbs (e.g., dying and finishing) were increasingly used in the progressive. Apart from raw frequency analyses, collostructional analyses were conducted to quantify verb-progressive contingency and to determine what verbs were distinctively associated with the progressive construction. Results were in line with raw frequency findings, which showed that contingency between the progressive and non-prototypical verbs represented by light verbs (e.g., going, doing, making, and coming) increased as English proficiency proceeded. These findings altogether suggested that beginning Chinese EFL learners were less productive in using the progressive construction: they were constrained by a small set of verbs which had concrete and typical progressive meanings (e.g., the activity verbs). But with English proficiency increasing, their use of the progressive began to spread to marginal members such as the light verbs.
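
The two frequency measures reported above can be sketched in a few lines. The abstract does not define them, so this assumes the type-token ratio is distinct verb types divided by verb tokens, and that normalized entropy is Shannon entropy divided by its maximum (the log of the number of types). The verb lists are toy data.

```python
import math
from collections import Counter

def type_token_ratio(tokens):
    """Distinct types divided by total tokens."""
    return len(set(tokens)) / len(tokens)

def normalized_entropy(tokens):
    """Shannon entropy of the type distribution, scaled to [0, 1] (assumed definition)."""
    counts = Counter(tokens)
    if len(counts) < 2:
        return 0.0                   # a single type has no uncertainty
    n = len(tokens)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(len(counts))

# Toy lists of verbs found in progressive hits (hypothetical data).
beginner = ["running", "running", "running", "singing"]
advanced = ["running", "living", "going", "finishing"]

print(type_token_ratio(beginner), round(normalized_entropy(beginner), 3))
print(type_token_ratio(advanced), round(normalized_entropy(advanced), 3))
```

A value near 1 means the progressive is spread evenly over many verbs, while a value near 0 means a few verbs account for almost all uses.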

Keywords: Construction learning, Corpus-based, Progressives, Prototype

Procedia PDF Downloads 108
8800 A Corpus-based Study of Adjuncts in Colombian English as a Second Language (ESL) Argumentative Essays

Authors: E. Velasco

Abstract:

Meeting high standards of writing in a Second Language (L2) is extremely important for many students who wish to undertake studies at universities in both English and non-English speaking countries. University lecturers in English speaking countries continue to express dissatisfaction with the apparent poor quality of essay writing skills displayed by English as a Second Language (ESL) students, whose essays are often criticised for their lack of cohesion and coherence. These critiques have extended to contexts such as Colombia, where many ESL students are criticised for their inability to write high-quality academic texts in L2-English, particularly at the tertiary level. If Colombian ESL students are expected to meet high standards of writing when studying locally and abroad, it makes sense to carry out specific research that can perhaps lead to recommendations to support their quest for improving argumentative strategies. Employing Corpus Linguistics methods within a Learner Corpus Research framework, and a combination of Log-Likelihood and Bayes Factor measures, this paper investigated argumentative essays written by Colombian ESL students. The study specifically aimed to analyse conjunctive adjuncts in argumentative essays to find out how Colombian ESL students connect their ideas in discourse. 
Results suggest that a) Colombian ESL learners need explicit instruction on specific areas of conjunctive adjuncts to counteract overuse, underuse and misuse; b) underuse of endophoric and evidential adjuncts highlights gaps between IELTS-like essays and good-quality tertiary-level essays and published papers, and these gaps are linked to prior knowledge brought into the writing task, rhetorical functions in writing, and research processes before writing takes place; c) both Colombian ESL learners and L1-English writers (in a reference corpus) overuse some adjuncts and underuse endophoric and evidential adjuncts when compared to skilled L1-English and L2-English writers, so differences in frequencies of adjuncts have little to do with the writers’ L1 and are instead linked to the types of essays writers produce (e.g., ESL vs. university essays). The pedagogical recommendations deriving from the study are that: a) Colombian ESL learners need to be shown that overuse is not the only way of giving cohesion to argumentative essays and that there are alternatives (e.g., implicit adjuncts, lexical chains and collocations); b) syllabi and classroom input need to raise awareness of gaps in writing skills between IELTS-like and tertiary-level argumentative essays, and of how endophoric and evidential adjuncts are used to refer to anaphoric and cataphoric sections of essays, and to other people’s work or ideas; c) syllabi and classroom input need to include essay-writing tasks based on previous research or reading that learners must incorporate into their arguments, and tasks that raise awareness of referencing systems (e.g., APA); d) classroom input needs to include explicit instruction on the punctuation, functions and syntax of specific conjunctive adjuncts such as ‘for example’, ‘for that reason’, ‘although’, ‘despite’ and ‘nevertheless’.

Keywords: argumentative essays, Colombian English as a Second Language (ESL) learners, conjunctive adjuncts, corpus linguistics

Procedia PDF Downloads 51
8799 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of register variation in three major text domains, namely social, cultural, and political texts, collected from a corpus of Bangla printed mass media texts. The study uses a moderately sized corpus of nearly one million words collected from different media sources such as newspapers, magazines, advertisements, and periodicals. The analysis of corpus data reveals that each text has certain lexical properties that not only control its identity but also mark its uniqueness across the domains. First, the subject domains of the texts are classified along two parameters, namely ‘Genre’ and ‘Text Type’. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties, covering both function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency counts to track how domain-specific words and terms may be marked as decisive indicators in specifying textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains is an unreliable indicator, as its lexicological identity has no bearing on specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in the Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, and lexical collocation.
In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation with regard to the domains they should actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategy for language learners. The availability of huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational to specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.
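
The idea of separating common lexical stock from domain-specific words through word frequency counts can be sketched as below. This is an illustrative simplification, not the authors' procedure: it flags a word as domain-specific when its relative frequency in one domain is at least `min_ratio` times its smoothed relative frequency in the other domains. The function name, the threshold, and the English placeholder tokens are all assumptions.

```python
from collections import Counter

def domain_specific_words(domain_tokens, min_ratio=2.0):
    """Return, per domain, the words that are markedly more frequent there."""
    counts = {d: Counter(toks) for d, toks in domain_tokens.items()}
    sizes = {d: len(toks) for d, toks in domain_tokens.items()}
    result = {}
    for domain, counter in counts.items():
        rest_size = sum(s for d, s in sizes.items() if d != domain)
        marked = []
        for word, freq in counter.items():
            rest_freq = sum(c[word] for d, c in counts.items() if d != domain)
            rel_here = freq / sizes[domain]
            # Add-one smoothing avoids division by zero for unseen words.
            rel_rest = (rest_freq + 1) / (rest_size + 1)
            if rel_here / rel_rest >= min_ratio:
                marked.append(word)
        result[domain] = sorted(marked)
    return result

# Toy token lists standing in for Bangla texts (hypothetical data).
texts = {
    "political": ["the", "vote", "party", "the", "vote", "minister"],
    "cultural": ["the", "festival", "music", "the", "festival", "song"],
    "social": ["the", "family", "school", "the", "family", "village"],
}
print(domain_specific_words(texts))
```

In this toy run the shared word "the" is never flagged, mirroring the observation that common lexical stock does not help identify a subject domain.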

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 155
8798 The Diary of Dracula, by Marin Mincu: Inquiries into a Romanian 'Book of Wisdom' as a Fictional Counterpart for Corpus Hermeticum

Authors: Lucian Vasile Bagiu, Paraschiva Bagiu

Abstract:

The novel, written in Italian and published in Italy in 1992 by the Romanian scholar Marin Mincu, is meant for the foreign reader and apparently aims at a better knowledge of the historical character of Vlad the Impaler (Vlad Dracul) within the European cultural, political and historical context of 1463. Throughout the very well written tome, one comes to realize that one of the underlying levels of the fiction is the exposition of various fundamental features of Romanian culture and civilization. The author of the diary, Dracula, mentions Corpus Hermeticum no fewer than fifteen times, suggesting that his own diary is some sort of philosophical counterpart. The essay focuses on several ‘truths’ and pieces of ‘wisdom’ revealed in the fictional teachings of Dracula. The boycott of History by the Romanians is identified as an echo of the philosophical approach of the famous Romanian scholar and writer Lucian Blaga. The orality of Romanian culture is a landmark opposed to the written culture of Western Europe. The religion of the ancient Dacian god Zalmoxis is seen as the basis for the Romanian existential and/or metaphysical ethnic philosophy (a feature tackled by the famous Romanian historian of religion Mircea Eliade), with a suggestion that Hermes Trismegistus may have been influenced by Zalmoxis when writing his Corpus Hermeticum. The historical figure of the last Dacian king Decebalus (died 106 AD) is a good pretext for a tantalizing Indo-European suggestion that the prehistoric Thraco-Dacian people may have been the ancestors of the first Romans settled in Latium. The lost diary of the Emperor Trajan, De Bello Dacico, may have proved that the unknown language of the Dacians was very much like Latin (a secret well hidden by the Vatican). The attitude of the Dacians towards death, as described by Herodotus, may have later inspired Pythagoras, Socrates, the Eleusinian and Orphic Mysteries, etc.
All of this unfolds within the Humanist and Renaissance European context of the epoch, with Dracula having a close relationship with scholars such as Nicolaus Cusanus, Cosimo de Medici, Marsilio Ficino, Pope Pius II, etc. Thus The Diary of Dracula turns out to be as exciting and stupefying as Corpus Hermeticum, a book impossible to assimilate entirely, yet a reference it would be unwise to ignore.

Keywords: Corpus Hermeticum, Dacians, Dracula, Zalmoxis

Procedia PDF Downloads 135
8797 Use of Ing-Formed and Derived Verbal Nominalization in American English: A Survey Applied to Native American English Speakers

Authors: Yujia Sun

Abstract:

Research on nominalizations in English can be traced back to at least the 1960s and remains central to the field today. At the very beginning, the discussion was about the relationship between verbs and nouns, but it then moved to the distinct senses embodied in different forms of nominals, namely, various types of nominalizations. This paper addresses the question of how speakers perceive different forms of verbal nouns and what might influence their perceptions. The data were collected through a self-designed questionnaire targeted at native speakers of American English and through the Corpus of Contemporary American English (COCA). The results show that semantic differences between different forms of nominals do play a role in people’s preference for one form over another. Further exploration is needed to see how frequency of usage relates to this issue.

Keywords: corpus of contemporary American English, derived nominalization, frequency of usage, ing-formed nominalization

Procedia PDF Downloads 154
8796 Direct Translation vs. Pivot Language Translation for Persian-Spanish Low-Resourced Statistical Machine Translation System

Authors: Benyamin Ahmadnia, Javier Serrano

Abstract:

In this paper we compare two different approaches for translating from Persian to Spanish, a language pair with scarce parallel corpora. The first approach involves direct transfer using a statistical machine translation system, which is available for this language pair. The second approach involves translation through English as a pivot language, which has more translation resources and more advanced translation systems available. The results show that translating through English as a pivot language achieves better translation quality than direct translation from Persian to Spanish. Our best result is the pivot system, which scores 1.12 BLEU points higher than direct translation.
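
The pivot idea can be shown with a deliberately tiny sketch: translate source to pivot, then pivot to target, by composing two translators. The word-level lookup tables below are hypothetical stand-ins for real statistical systems (romanized Persian is used for readability), so this illustrates only the cascading structure, not the authors' models.

```python
from typing import Callable, Dict, List

Translator = Callable[[List[str]], List[str]]

def make_lookup_translator(table: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator; unknown words pass through unchanged."""
    def translate(tokens: List[str]) -> List[str]:
        return [table.get(tok, tok) for tok in tokens]
    return translate

def pivot_translate(tokens: List[str],
                    src_to_pivot: Translator,
                    pivot_to_tgt: Translator) -> List[str]:
    """Cascade two systems: source -> pivot -> target."""
    return pivot_to_tgt(src_to_pivot(tokens))

# Hypothetical toy dictionaries: Persian -> English and English -> Spanish.
fa_en = make_lookup_translator({"ketab": "book", "ab": "water"})
en_es = make_lookup_translator({"book": "libro", "water": "agua"})

print(pivot_translate(["ketab", "ab"], fa_en, en_es))
```

In a cascade like this, errors made in the first step propagate to the second, which is the main trade-off against the larger resources available for each pivot pair.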

Keywords: statistical machine translation, direct translation approach, pivot language translation approach, parallel corpus

Procedia PDF Downloads 462
8795 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition

Authors: Qin Long, Li Xiaoge

Abstract:

Knowledge graphs have now become a new form of knowledge representation. However, there is no consensus on a plausible definition of entities and relationships in domain-specific knowledge graphs. Further, owing to several limitations and deficiencies, the various approaches to recognizing domain-specific entities and relationships are far from perfect. Specifically, named entity recognition in Chinese domains is a critical task for natural language processing applications. However, a bottleneck for Chinese named entity recognition in new domains is the lack of annotated data. To address this challenge, a domain-specific distant supervised named entity recognition framework is proposed. The framework is divided into two stages: first, the distant supervised corpus is generated with an entity linking model based on a graph attention neural network; second, the generated corpus is used as training input for the distant supervised named entity recognition model, which then extracts named entities. The linking model is verified on the CCKS 2019 entity linking corpus, and its F1 value is 2% higher than that of the benchmark method. A re-pre-trained BERT language model is added to the benchmark method, and the results show that it is more suitable for distant supervised named entity recognition tasks. Finally, the framework is applied to the computer field, and the results show that it can obtain domain named entities.

Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network

Procedia PDF Downloads 71
8794 An Automatic Speech Recognition Tool for the Filipino Language Using the HTK System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

This paper presents the development of a Filipino speech recognition tool using the HTK System. The system was trained on a subset of the Filipino Speech Corpus developed by the DSP Laboratory of the University of the Philippines-Diliman. The speech corpus was used in both training and testing the system by estimating the parameters for phonetic HMM-based (Hidden Markov Model) acoustic models. Experiments on different mixture weights were incorporated in the study. The phoneme-level word-based recognition of a 5-state HMM resulted in an average accuracy rate of 80.13% for a single-Gaussian mixture model, 81.13% after implementing phoneme alignment, and 87.19% for the increased Gaussian-mixture weight model. The highest accuracy rate of 88.70% was obtained from a 5-state model with 6 Gaussian mixtures.

Keywords: Filipino language, Hidden Markov Model, HTK system, speech recognition

Procedia PDF Downloads 445
8793 A Corpus-Based Study of Evaluative Language in Leading Articles in British Broadsheet and Tabloid Newspapers

Authors: Fatimah AlSaiari

Abstract:

In recent years, newspapers in the United Kingdom have no longer been just a means of sharing news about what happens in the world; they are also used to influence target readers by having them become more up-to-date, well-informed, entertained, exasperated, delighted, and infuriated. To achieve these objectives and maintain influence on public opinion, journalists use a particular language in which they can convey emotions and opinions, organize their discourse, and establish solidarity with their audience. This type of language has been widely analyzed under different labels, such as evaluation, appraisal, and stance. There is a considerable amount of linguistic and non-linguistic research devoted to analyzing this type of interpersonal language in journalistic discourse, and most of these studies were carried out to challenge the traditional assumptions of the objectivity and impartiality of news reporting. However, very little research has been undertaken on evaluative language in newspaper institutional editorials, and there is hardly any systematic or exhaustive analysis of this type of language in British tabloid and broadsheet newspapers. This study will attempt to provide new insights into the nature of authorial and non-authorial evaluation in leading articles in popular and quality British newspapers, along with their targets, sources, and discourse functions. The study will also attempt to develop a framework of evaluation that can be applied to evaluative lexical items in newspaper opinion texts. The framework is both theory-driven (i.e., it builds on and modifies previous frameworks of evaluation such as appraisal theory and the parameter-based approach) and data-driven (i.e., it elicits the evaluative categories from the analysis of the corpus, which helps in the development of the current framework). To achieve this aim, a corpus of 140 leading articles was selected. 
The findings revealed that the tabloids tended to express their stance through explicitness, dramatization, frequent reference to social actors’ emotions and beliefs, and exaggeration in negativity, while the broadsheets preferred to express their stance through mitigation, ambiguity, and implicitness. Conceptual themes and propositions were the preferred targets for expressing stance in the broadsheets, while human behavior and characters were the preferred targets in the tabloids.

Keywords: appraisal theory, evaluative language, British newspapers, broadsheets & tabloids, evaluative adjectives

Procedia PDF Downloads 269
8792 The Power of Words: A Corpus Analysis of Campaign Speeches of President Donald J. Trump

Authors: Aiza Dalman

Abstract:

Words are powerful when they are used wisely and strategically. In this study, twelve (12) campaign speeches of President Donald J. Trump were analyzed for frequently used words and for the use of ethos, pathos and logos. The speeches were read thoroughly, analyzed and interpreted. Using the Word Counter Tool and Text Analyzer software accessible online, it was found that the word ‘will’ has the highest frequency of 121, followed by Hillary (58), American (38), going (35), plan and Clinton (32), illegal (30), government (28), corruption (26) and criminal (24). When the speeches were analyzed for ethos, pathos and logos, it was found that all three were employed. The statements in these categories either argued against Hillary or spoke in his favor. President Donald J. Trump’s distinctive strategy in using frequent words and ethos, pathos and logos to persuade people perhaps led the way to his victory.
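The frequency counts above came from online word-counting tools. A comparable count can be sketched in a few lines, shown here under stated assumptions: the tokenizer is a simple lowercase regular expression, `word_frequencies` is a hypothetical helper name, and the sample sentence is invented for illustration rather than taken from the speeches.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Return the top_n most frequent lowercase word tokens with their counts."""
    # Keep runs of letters and apostrophes; everything else separates tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


# Invented sample text, not a quotation from any speech.
sample = "We will win. We will build, and we will not stop. Jobs will return."
for word, count in word_frequencies(sample, top_n=3):
    print(word, count)
```

Different tools tokenize and filter stop words differently, so counts from this sketch would not necessarily match those reported by the tools named in the abstract.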

Keywords: campaign speeches, corpus analysis, ethos, logos and pathos, power of words

Procedia PDF Downloads 251
8791 A Corpus Study of English Verbs in Chinese EFL Learners’ Academic Writing Abstracts

Authors: Shuaili Ji

Abstract:

The correct use of verbs is an important element of high-quality research articles, and thus it is important for Chinese EFL learners to master the characteristics of verbs and to use them precisely. However, some studies have shown that learners and native speakers differ in their use of verbs and that learners have difficulty using English verbs. This corpus-based quantitative research can enhance learners’ knowledge of English verbs and improve the quality of research article abstracts and even of academic writing as a whole. The aim of this study is to find the differences between learners’ and native speakers’ use of verbs and to study the factors that contribute to those differences. To this end, the research question is as follows: What are the differences between the verbs most frequently used by learners and those most frequently used by native speakers? The research question is answered through a study that uses a corpus-based, data-driven approach to analyze the verbs used by learners in their abstracts in terms of collocation, colligation and semantic prosody. The results show that: (1) EFL learners clearly overused ‘be, can, find, make’ and underused ‘investigate, examine, may’. As to modal verbs, learners clearly overused ‘can’ and underused ‘may’. (2) Learners clearly overused ‘we find + object clauses’ and underused ‘nouns (results, findings, data) + suggest/indicate/reveal + object clauses’ when expressing research results. (3) Learners tended to transfer the collocation, colligation and semantic prosody of shǐ and zuò to make. (4) Learners clearly overused ‘BE+V-ed’ and used BE as the main verb. They also clearly overused the basic forms of BE such as be, is, are, and clearly underused its past-tense forms (was, were). These results reveal learners’ lack of accuracy and idiomaticity in verb usage. Owing to conceptual transfer from Chinese, the verbs in learners’ abstracts showed clear transfer from the mother tongue. 
In addition, learners have not fully mastered the use of verbs and avoid complex colligations to prevent errors. Based on these findings, the present study has implications for English teaching, particularly for English academic abstract writing in China. Further research could examine the use of verbs across whole dissertations to find out whether the characteristics of the verbs in abstracts also apply there.

Keywords: academic writing abstracts, Chinese EFL learners, corpus-based, data-driven, verbs

Procedia PDF Downloads 307
8790 MicroRNA in Bovine Corpus Luteum during Early Pregnancy

Authors: Rreze Gecaj, Corina Schanzenbach, Benedikt Kirchner, Michael Pfaffl, Bajram Berisha

Abstract:

The maintenance of the corpus luteum (CL) during early pregnancy in cattle is a critical and multifarious process. A luteotrophic mechanism originating from the embryo is widely accepted as the triggering signal for CL maintenance. In cattle, it is interferon-tau (IFNT) secretion from the conceptus that prevents CL regression and ensures progesterone production for the establishment of pregnancy. In addition to endocrine and paracrine signals, microRNA (miRNA) can also support CL sustainability during early pregnancy. MiRNAs are small non-coding nucleic acids that regulate gene expression post-transcriptionally and have been shown to be involved in the modulation of CL function. However, the role of miRNAs in corpus luteum function during early pregnancy remains largely unexplored. This study aims at profiling the expression of miRNA in the CL during early pregnancy in cattle by comparing it with the CL from the late cycle and with the regressed CL. Corpora lutea were assigned to two groups during the cycle (C13, late CL: days 13-18; C18, regressed CL: day >18) and one group during early pregnancy (P: months 1-2). The stage of the estrous cycle was determined by macroscopic examination, and fetal age was estimated from crown-rump length measurements. A total of 9 corpora lutea from individual animals were included in the study, three for each group. The miRNA population was profiled using small RNA next-generation sequencing, and biologically significant miRNAs were evaluated for differential expression using the DESeq2 methodology. We show that 6 differentially expressed miRNAs (bta-mir-2890, -2332, -2441-3p, -148b, -1248 and -29c) are common to both comparisons, P vs C13 and P vs C18. In addition, for each comparison we identified unique miRNAs that were differentially expressed only in that comparison. 
bta-miR-23a and -769 were unique miRNAs differentially expressed in P vs C13, whereas forty-four unique miRNAs were identified as differentially expressed in P vs C18. These data confirm that miRNAs are highly abundant in luteal tissue during early pregnancy and potentially regulate the CL maintenance at this stage of fetus development.

Keywords: bovine, corpus luteum, microRNA, pregnancy, RNA-Seq

Procedia PDF Downloads 231
8789 The Analysis of Deceptive and Truthful Speech: A Computational Linguistic Based Method

Authors: Seham El Kareh, Miramar Etman

Abstract:

Recently, detecting liars and extracting features which distinguish them from truth-tellers have been the focus of a wide range of disciplines. To the best of the authors’ knowledge, most of the work has been done on facial expressions and body gestures, but only a few studies have examined the language used by liars and truth-tellers. This paper sheds light on four axes. The first axis deals with building an audio corpus of deceptive and truthful speech by Egyptian Arabic speakers. The second axis focuses on examining the human perception of lies and demonstrating the need for computational linguistic-based methods to extract features which characterize truthful and deceptive speech. The third axis is concerned with building a linguistic analysis program that can extract from the corpus the inter- and intra-linguistic cues for deceptive and truthful speech. The program built here is based on selected categories from the Linguistic Inquiry and Word Count program. Our results demonstrated that, when lying, Egyptian Arabic speakers preferred first-person pronouns and the present tense over the past tense, and their lies lacked second-person pronouns; when telling the truth, they preferred verbs related to motion and nouns related to time. The results also showed that more data are needed to establish the significance of words related to emotions and numbers.

Keywords: Egyptian Arabic corpus, computational analysis, deceptive features, forensic linguistics, human perception, truthful features

Procedia PDF Downloads 182
8788 Corpus-Based Neural Machine Translation: Empirical Study Multilingual Corpus for Machine Translation of Opaque Idioms - Cloud AutoML Platform

Authors: Khadija Refouh

Abstract:

Culture-bound expressions have been a bottleneck for Natural Language Processing (NLP) and comprehension, especially in the case of machine translation (MT). In the last decade, the field of machine translation has greatly advanced. Neural machine translation (NMT) has recently achieved considerable gains in translation quality, outperforming previous traditional translation systems in many language pairs. NMT applies Artificial Intelligence (AI) and deep neural networks to language processing. Despite this development, NMT still faces serious challenges when translating culture-bound expressions, especially for low-resource language pairs such as Arabic-English and Arabic-French, which is not the case with well-established language pairs such as English-French. Machine translation of opaque idioms from English into French is likely to be more accurate than from English into Arabic. For example, Google Translate Application translated the sentence “What a bad weather! It runs cats and dogs.” into the target language Arabic as “يا له من طقس سيء! تمطر القطط والكلاب”, which is an inaccurate literal translation. The translation of the same sentence into the target language French was “Quel mauvais temps! Il pleut des cordes.”, where Google Translate Application used the accurate corresponding French idiom. This paper aims to perform NMT experiments towards better translation of opaque idioms using a high-quality, clean multilingual corpus. This corpus will be collected analytically from human-generated idiom translations. AutoML Translation, a Google Neural Machine Translation platform, is used as a custom translation model to improve the translation of opaque idioms. The automatic evaluation of the custom model will be compared to that of Google NMT using the Bilingual Evaluation Understudy (BLEU) score. 
BLEU is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Human evaluation is integrated to test the reliability of the BLEU score. The researcher will examine syntactical, lexical, and semantic features using Halliday's functional theory.
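To make the BLEU description concrete, here is a minimal sentence-level sketch. It assumes a single reference, whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. It is not the scorer used by AutoML Translation or by the author, and the function names and sample sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference, in [0, 1]."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Modified precision: clip each candidate n-gram count by its
        # count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty punishes candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "it is raining very heavily outside today"
print(bleu("it is raining very heavily outside today", reference))  # identical
print(bleu("it is raining very heavily today", reference))          # partial match
print(bleu("cats and dogs fall from the sky", reference))           # no overlap
```

Because BLEU rewards surface n-gram overlap, an idiomatic translation that is worded differently from the reference can score low, which is one reason the abstract pairs it with human evaluation.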

Keywords: multilingual corpora, natural language processing (NLP), neural machine translation (NMT), opaque idioms

Procedia PDF Downloads 110
8787 A Corpus-Linguistic Analysis of Online Iranian News Coverage on Syrian Revolution

Authors: Amaal Ali Al-Gamde

Abstract:

The Syrian revolution is a major issue in the Middle East that has drawn in world powers and received great attention in international mass media since 2011. The heavy global reliance on cyber news and digital sources plays a key role in conveying a sense of bias to a wide range of online readers. Thus, based on the assumption that media discourse possesses ideological implications, this study investigates the representation of the Syrian revolution in online media. The paper explores the discursive constructions of anti- and pro-government powers in the Syrian revolution in a 1,000,000-word corpus of online reports from Fars (an Iranian news agency) issued between 2013 and 2015. Taking a corpus-assisted discourse analysis approach, the analysis investigates three types of lexicosemantic relations: the semantic macrostructures within which the two social actors are framed, the lexical collocations characterizing the news discourse, and the discourse prosodies they convey about the two sides of the conflict. The study uses computer-based tools, Sketch Engine and AntConc, to minimize the bias of subjective analysis. The analysis moves from the insights of lexical frequencies and keyness scores to an examination of themes and collocational patterns. The findings reveal the Fars agency’s ideological mode of representation in reporting events of the Syrian revolution in two ways. The first is by stereotyping the opposition groups under the umbrella of terrorism, using words such as law breakers, foreign-backed groups, militant groups, and terrorists to legitimize the atrocities of security forces against protesters and to spread fear among civilians. The second is through emphasizing the power of the government and depicting it as the defender of the Arab land by foregrounding the discourse of international conspiracy against Syria. 
The paper concludes by discussing the potential importance of triangulating corpus linguistic tools with critical discourse analysis to elucidate more about discourses and reality.

Keywords: discourse prosody, ideology, keyness, semantic macrostructure

Procedia PDF Downloads 110
8786 Query in Grammatical Forms and Corpus Error Analysis

Authors: Katerina Florou

Abstract:

Two decades after the term "learner corpora" was coined to describe collections of texts created by foreign or second language learners across various language contexts, and some years after the suggestion to incorporate "focusing on form" within a Task-Based Learning framework, this study aims to explore how learner corpora, whether annotated with errors or not, can facilitate a focus on form in an educational setting. It has been argued that analyzing linguistic form enables students to delve into language and gain an understanding of different facets of the foreign language. The same objective applies when analyzing learner corpora, whether marked with errors or in their raw state, but in this scenario the emphasis lies on identifying incorrect forms. Teachers should aim to address errors or gaps in the students' second language knowledge while they engage in a task. Building on this recommendation, we compared the written output of two student groups: the first group (G1) carried out the focusing on form phase by studying a specific aspect of the Italian language, namely the past participle, through examples from native speakers and grammar rules; the second group (G2) focused on form by scrutinizing their own errors and comparing them with analogous examples from a native speaker corpus. In order to test our hypothesis, we created four learner corpora. The first two were generated during the task phase, one for each group of students, while the remaining two were produced as a follow-up activity at the end of the lesson. The results of the first comparison indicated that students' exposure to their own errors can enhance their grasp of a grammatical element. The study is in its second stage and more results are to be announced.

Keywords: Corpus interlanguage analysis, task based learning, Italian language as F1, learner corpora

Procedia PDF Downloads 25
8785 Number Variation of the Personal Pronoun We in American Spoken English

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage in a language community, which might become the developmental trend of that language. The personal pronoun we is prescribed as a plural pronoun in grammar, but its number value is more flexible in actual use. Based on a self-built Friends corpus, the present research explores the number value of the first person pronoun we in present-day American spoken English. Taking the subjectivity of we into consideration, this paper used ‘we + PCU (perception-cognition-utterance) verbs’ collocations and ‘we + plural categories’ as the parameters. Results from corpus data and manual annotation show that: 1) the overall frequency of we has been increasing; 2) we has been increasingly used with other plural categories, indicating a weakening of its plural reference; and 3) we has been increasingly used with PCU verbs of strong subjectivity, indicating a strengthening of its singular reference. All these seem to support our hypothesis that we is undergoing a process of further grammaticalization towards a singular reference, though further evidence is needed to confirm this bold prediction.

Keywords: number, PCU verbs, personal pronoun we

Procedia PDF Downloads 206
8784 The Construction of Malaysian Airline Tragedies in Malaysian and British Online News: A Multidisciplinary Study

Authors: Theng Theng Ong

Abstract:

This study adopts a multidisciplinary method, combining corpus-based discourse analysis with a language attitude study, to explore the construction of the Malaysian airline tragedies MH370, MH17 and QZ8501 in selected Malaysian and United Kingdom (UK) online news. The study aims to determine the ways in which the Malaysian airline tragedies MH370, MH17 and QZ8501 are linguistically defined and constructed in terms of keywords and collocations. The study also seeks to identify the types of discourse that are presented in the news articles. The differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media will also be examined. Finally, the language attitude study will be carried out to examine Malaysian and UK university students’ attitudes toward the keywords, topics or issues covered by the selected Malaysian and UK news media pertaining to the Malaysian airline tragedies MH370, MH17 and QZ8501. The analysis is divided into two parts, with the first part focusing on corpus-based discourse analysis of the media text. The second part of the study investigates Malaysian and UK news readers’ attitudes towards the online news reported by the Malaysian and UK news media pertaining to the airline tragedies. The main findings of the corpus-based discourse analysis are essential in designing the questions for the questionnaires and interviews and therefore lead to the identification of the attitudes among Malaysian and UK news readers.

Keywords: corpus linguistics, critical discourse analysis, news media, tragedies study

Procedia PDF Downloads 315
8783 We Are the 99 percent – the Occupy-Movement in Social Media

Authors: Wolfram Karg

Abstract:

The Occupy-Movement came into existence in the US in 2011 as a reaction to one of the worst economic crises since World War II. With cuts in benefits and social services and people being evicted from their homes on the one hand, and high bonuses granted to the managers of the very same companies on the other, a strong feeling of injustice took hold of people in the US and caused them to voice their anger peacefully in social media and on the streets. Thanks to the World Wide Web, users all around the world read about this movement and recognized the same injustice in their own countries, making Occupy a global movement. The vast array of topics covered by Occupy offers a unique chance to carry out a corpus-based discourse analysis based on the DIMEAN-Model. The focus of this paper is limited to two aspects of DIMEAN: intertextual references and the use of connectors in texts. Because the discourse is to a large extent carried out via blog posts, online articles and comments, the paper also analyses to what extent there is a correlation between modern (i.e. computer-based) media and the use of connectors in the different communicative types used by the Occupy-Movement.

Keywords: discourse, new media, occupy, corpus analysis

Procedia PDF Downloads 470
8782 Lexical Collocations in Medical Articles of Non-Native vs Native English-Speaking Researchers

Authors: Waleed Mandour

Abstract:

This study presents a multidimensional scrutiny of Benson et al.’s seven-category taxonomy of lexical collocations as used by Egyptian medical authors and their native English-speaking peers. It investigates 212 medical papers, all published over a span of 6 years (from 2013 to 2018). These are compared with medical research articles written by native speakers of English (25,238 articles totaling over 103 million words), drawn from the Directory of Open Access Journals (a 2.7 billion-word corpus). The corpus compiled from the non-native speakers’ papers was annotated and marked up manually by the researcher according to the standards of Weisser. For the statistical comparisons, conventional frequency-based analysis was used alongside association measures (AMs), with LogDice chosen as recommended by Kilgarriff et al. for comparing large corpora. Despite the terminological convergence in the subject corpora, the comparison results confirm previous findings that non-native speakers’ compositions show a limited range of lexical collocations in terms of their distribution. However, there is a ubiquitous tendency to overuse the multi-word units that are highly frequent among native speakers in all lexical categories investigated. Furthermore, Egyptian authors, unlike their English-speaking peers, tend to use more collocations denoting quantitative rather than qualitative analyses in their papers. This empirical work contributes to research on English for Academic Purposes (EAP) and English as a Lingua Franca in Academic settings (ELFA). In addition, there are pedagogical implications that would promote a better quality of medical research papers published in Egyptian universities.
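The LogDice association measure mentioned above is usually defined as 14 + log2(2·f(x,y) / (f(x) + f(y))), where f(x,y) is the co-occurrence frequency of the two words and f(x), f(y) are their individual frequencies. The sketch below implements that formula; the function name `log_dice` and the frequencies in the example are hypothetical and do not come from the study's corpora.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice score for a word pair.

    f_xy: co-occurrence frequency of the pair
    f_x, f_y: frequencies of the individual words
    The theoretical maximum is 14 (the two words always occur together).
    """
    if f_xy <= 0:
        raise ValueError("logDice is undefined for pairs that never co-occur")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical frequencies for a collocation candidate.
print(log_dice(f_xy=50, f_x=100, f_y=100))  # half of all occurrences co-occur
print(log_dice(f_xy=10, f_x=10, f_y=10))    # words always occur together
```

The score depends only on the three frequencies and not on total corpus size, which makes it convenient when comparing corpora of very different sizes such as the two described here.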

Keywords: corpus linguistics, EAP, ELFA, lexical collocations, medical discourse

Procedia PDF Downloads 107
8781 Sentence Structure for Free Word Order Languages in Context with Anaphora Resolution: A Case Study of Hindi

Authors: Pardeep Singh, Kamlesh Dutta

Abstract:

Some languages have a fixed sentence structure, while others have free word order. The accuracy of syntax-based anaphora resolution algorithms depends on the structure of the sentence, so it is important to analyze the structure of a language before implementing these algorithms. In this study, we analyzed sentence structure in Hindi by exploiting case markers as well as some special tags for subject and object. We also investigated word order in Hindi. Word order typology refers to the study of the order of the syntactic constituents of a language. We analyzed 165 news items from Ranchi Express in the plain-text EMILLE corpus, consisting of 1745 sentences. Eight dialogue-based files from the same corpus, comprising 1521 sentences, were also analyzed. The percentages of subject-object-verb (SOV) and object-subject-verb (OSV) structures are 66.90 and 33.10, respectively.

Keywords: anaphora resolution, free word order languages, SOV, OSV

Procedia PDF Downloads 444
8780 A Corpus Output Error Analysis of Chinese L2 Learners From America, Myanmar, and Singapore

Authors: Qiao-Yu Warren Cai

Abstract:

Due to the rise of big data, building corpora and using them to analyze Chinese L2 learners’ language output has become a trend. Various empirical studies have been conducted using Chinese corpora built by different academic institutes. However, most of this research analyzed the data in the Chinese corpora using corpus-based qualitative content analysis with descriptive statistics. Descriptive statistics can summarize the numerical data for the subjects or samples that a study has actually measured, but the findings cannot be generalized to the population. Comte, a French positivist, argued in the 19th century that human knowledge, whether in the humanities and social sciences or in the natural sciences, should be verified scientifically in order to construct a universal theory that explains truth and human behavior. Inferential statistics, which can judge the probability that a difference observed between groups is dependable rather than caused by chance (Free Geography Notes, 2015) and can infer from the subjects or samples how the population might think or behave, is the right method to support Comte’s argument in the field of TCSOL. Inferential statistics is also at the core of quantitative research, but little research has combined corpora with inferential statistics. Few studies have analyzed differences in Chinese L2 learners’ corpus output errors using one-way ANOVA, so the findings of previous research are limited in what they can infer about the population’s Chinese errors from the given samples. To fill this knowledge gap in the professional development of Taiwanese TCSOL, the present study uses one-way ANOVA to analyze corpus output errors of Chinese L2 learners from America, Myanmar, and Singapore. 
The results show that no significant difference exists in ‘shì (是) sentence’ and word order errors, but compared with American and Singaporean learners, learners from Myanmar are significantly more likely to produce ‘sentence blends.’ Based on these results, the present study provides an instructional approach and contributes to further exploration of how Chinese L2 learners can develop and use learning strategies to reduce errors.
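For readers unfamiliar with the method, the sketch below shows how a one-way ANOVA F statistic is computed from per-group measurements: it is the ratio of between-group variance to within-group variance. The function name `one_way_anova_f` and the three small groups of error counts are invented for illustration and are not the study's data.

```python
def one_way_anova_f(*groups):
    """Return (F statistic, df between, df within) for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented error counts per learner for three hypothetical learner groups.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [5, 6, 7]
f_stat, df_b, df_w = one_way_anova_f(group_a, group_b, group_c)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Turning the F statistic into a p-value requires the F distribution; a library routine such as SciPy's `scipy.stats.f_oneway` returns both values. A significant result only says that at least one group differs, so a post hoc comparison is needed to identify which one.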

Keywords: Chinese corpus, error analysis, one-way analysis of variance, Chinese L2 learners, Americans, Myanmar, Singaporeans

Procedia PDF Downloads 81
8779 Competition between Verb-Based Implicit Causality and Theme Structure's Influence on Anaphora Bias in Mandarin Chinese Sentences: Evidence from Corpus

Authors: Linnan Zhang

Abstract:

Linguists, as well as psychologists, have shown great interest in implicit causality in reference processing. However, the most frequently used approaches to this issue are psychological experiments (such as eye tracking or self-paced reading). This research is corpus-based and is assisted by the statistical software R. The main focus of the present study is the competition between verb-based implicit causality and theme structure’s influence on anaphora bias in Mandarin Chinese sentences. In Accessibility Theory, it is believed that salience, which is also known as accessibility, and relevance are two important factors in reference processing. Theme structure, which is a special syntactic structure in Chinese, determines the salience of an antecedent on the syntactic level, while verb-based implicit causality is a key factor in the relevance between antecedent and anaphora. Therefore, this is a study of anaphora that combines psychology with linguistics. Based on an analysis of sentences from the corpus and a Multinomial Logistic Regression analysis, the major findings of the present study are as follows: 1. When the sentence is stated in a ‘cause-effect’ structure, the theme structure will always be the antecedent regardless of whether forward-biased or backward-biased verbs co-occur; in non-theme structure, the anaphora bias will tend to be the opposite of the verb bias; 2. When the sentence is stated in an ‘effect-cause’ structure, the theme structure will not always be the antecedent and the influence of verb-based implicit causality will outweigh that of theme structure; moreover, the anaphora bias will be the same as the bias of the verbs. All the results indicate that implicit causality functions conditionally and that the noun in theme structure will not be the high-salience antecedent under all circumstances.

Keywords: accessibility theory, anaphora, theme structure, verb-based implicit causality

Procedia PDF Downloads 173