Search results for: Corpus interlanguage analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26960

Search results for: Corpus interlanguage analysis

26870 Corpus-Based Description of Core English Nouns of Pakistani English, an EFL Learner Perspective at Secondary Level

Authors: Abrar Hussain Qureshi

Abstract:

Vocabulary has been highlighted as a key indicator in any foreign language learning program, especially English as a foreign language (EFL). It is often considered a potential tool in foreign language curriculum, and its deficiency impedes successful communication in the target language. The knowledge of the lexicon is very significant in getting communicative competence and performance. Nouns constitute a considerable bulk of English vocabulary. Rather, they are the bones of the English language and are the main semantic carrier in spoken and written discourse. As nouns dominate the bulk of the English lexicon, their role becomes all the more potential. The undertaken research is a systematic effort in this regard to work out a list of highly frequent list of Pakistani English nouns for the EFL learners at the secondary level. It will encourage autonomy for the EFL learners as well as will save their time. The corpus used for the research has been developed locally from leading English newspapers of Pakistan. Wordsmith Tools has been used to process the research data and to retrieve word list of frequent Pakistani English nouns. The retrieved list of core Pakistani English nouns is supposed to be useful for English language learners at the secondary level as it covers a wide range of speech events.

Keywords: corpus, EFL, frequency list, nouns

Procedia PDF Downloads 75
26869 Adjectives in Academic Discourse: A Comparative Study of Research Articles

Authors: Beata Grymska

Abstract:

The research studies on academic discourse focus in general on lexical bundles, epistemic modality markers, or interactions between writers and readers. Following the research into the written forms of the academic community, this study concentrates on adjectives in research articles. The study investigates the distribution of adjectives in research articles in two academic disciplines: linguistics and medicine. It is corpus-based in design and consists of 100 linguistic and 100 medical research articles all written in English. The aim of the study is to compare the distribution of adjectives between the two corpora and four main parts of articles: IMRD (Introduction, Methods, Results, and Discussion). The second aim is to see if the two corpora share common core adjectives, e.g., different, important, specific, and if there are discipline-specific adjectives. The further part of the paper elaborates on adjectives use in the corpora together with examples. The results indicate that the two corpora do not differ in the distribution of adjectives to a great extent. The occurrences of the most frequently used adjectives depend on the academic discipline of the research articles. The concluding part reflects upon the role of adjectives in academic discourse and also presents how corpora can be helpful in composing academic texts.

Keywords: academic discourse, academic texts, adjectives, corpus analysis, research articles

Procedia PDF Downloads 160
26868 Unraveling Language Contact through Syntactic Dynamics of ‘Also’ in Hong Kong and Britain English

Authors: Xu Zhang

Abstract:

This article unveils an indicator of language contact between English and Cantonese in one of the Outer Circle Englishes, Hong Kong (HK) English, through an empirical investigation into 1000 tokens from the Global Web-based English (GloWbE) corpus, employing frequency analysis and logistic regression analysis. It is perceived that Cantonese and general Chinese are contextually marked by an integral underlying thinking pattern. Chinese speakers exhibit a reliance on semantic context over syntactic rules and lexical forms. This linguistic trait carries over to their use of English, affording greater flexibility to formal elements in constructing English sentences. The study focuses on the syntactic positioning of the focusing subjunct ‘also’, a linguistic element used to add new or contrasting prominence to specific sentence constituents. The English language generally allows flexibility in the relative position of 'also’, while there is a preference for close marking relationships. This article shifts attention to Hong Kong, where Cantonese and English converge, and 'also' finds counterparts in Cantonese ‘jaa’ and Mandarin ‘ye’. Employing a corpus-based data-driven method, we investigate the syntactic position of 'also' in both HK and GB English. The study aims to ascertain whether HK English exhibits a greater 'syntactic freedom,' allowing for a more distant marking relationship with 'also' compared to GB English. The analysis involves a random extraction of 500 samples from both HK and GB English from the GloWbE corpus, forming a dataset (N=1000). Exclusions are made for cases where 'also' functions as an additive conjunct or serves as a copulative adverb, as well as sentences lacking sufficient indication that 'also' functions as a focusing particle. The final dataset comprises 820 tokens, with 416 for GB and 404 for HK, annotated according to the focused constituent and the relative position of ‘also’. Frequency analysis reveals significant differences in the relative position of 'also' and marking relationships between HK and GB English. Regression analysis indicates a preference in HK English for a distant marking relationship between 'also' and its focused constituent. Notably, the subject and other constituents emerge as significant predictors of a distant position for 'also.' Together, these findings underscore the nuanced linguistic dynamics in HK English and contribute to our understanding of language contact. It suggests that future pedagogical practice should consider incorporating the syntactic variation within English varieties, facilitating leaners’ effective communication in diverse English-speaking environments and enhancing their intercultural communication competence.

Keywords: also, Cantonese, English, focus marker, frequency analysis, language contact, logistic regression analysis

Procedia PDF Downloads 23
26867 Lexical Collocations in Medical Articles of Non-Native vs Native English-Speaking Researchers

Authors: Waleed Mandour

Abstract:

This study presents multidimensional scrutiny of Benson et al.’s seven-category taxonomy of lexical collocations used by Egyptian medical authors and their peers of native-English speakers. It investigates 212 medical papers, all published during a span of 6 years (from 2013 to 2018). The comparison is held to the medical research articles submitted by native speakers of English (25,238 articles in total with over 103 million words) as derived from the Directory of Open Access Journals (a 2.7 billion-word corpus). The non-native speakers compiled corpus was properly annotated and marked-up manually by the researcher according to the standards of Weisser. In terms of statistical comparisons, though, deployed were the conventional frequency-based analysis besides the relevant criteria, such as association measures (AMs) in which LogDice is deployed as per the recommendation of Kilgariff et al. when comparing large corpora. Despite the terminological convergence in the subject corpora, comparison results confirm the previous literature of which the non-native speakers’ compositions reveal limited ranges of lexical collocations in terms of their distribution. However, there is a ubiquitous tendency of overusing the NS-high-frequency multi-words in all lexical categories investigated. Furthermore, Egyptian authors, conversely to their English-speaking peers, tend to embrace more collocations denoting quantitative rather than qualitative analyses in their produced papers. This empirical work, per se, contributes to the English for Academic Purposes (EAP) and English as a Lingua Franca in Academic settings (ELFA). In addition, there are pedagogical implications that would promote a better quality of medical research papers published in Egyptian universities.

Keywords: corpus linguistics, EAP, ELFA, lexical collocations, medical discourse

Procedia PDF Downloads 105
26866 Analysis of Steles with Libyan Inscriptions of Grande Kabylia, Algeria

Authors: Samia Ait Ali Yahia

Abstract:

Several steles with Libyan inscriptions were discovered in Grande Kabylia (Algeria), but very few researchers were interested in these inscriptions. Our work is to list, if possible all these steles in order to do a descriptive study of the corpus. The steles analysis will be focused on the iconographic and epigraphic level and on the different forms of Libyan characters in order to highlight the alphabet used by the Grande Kabylia.

Keywords: epigraphy, stele, Libyan inscription, Grande Kabylia

Procedia PDF Downloads 183
26865 The Study of Formal and Semantic Errors of Lexis by Persian EFL Learners

Authors: Mohammad J. Rezai, Fereshteh Davarpanah

Abstract:

Producing a text in a language which is not one’s mother tongue can be a demanding task for language learners. Examining lexical errors committed by EFL learners is a challenging area of investigation which can shed light on the process of second language acquisition. Despite the considerable number of investigations into grammatical errors, few studies have tackled formal and semantic errors of lexis committed by EFL learners. The current study aimed at examining Persian learners’ formal and semantic errors of lexis in English. To this end, 60 students at three different proficiency levels were asked to write on 10 different topics in 10 separate sessions. Finally, 600 essays written by Persian EFL learners were collected, acting as the corpus of the study. An error taxonomy comprising formal and semantic errors was selected to analyze the corpus. The formal category covered misselection and misformation errors, while the semantic errors were classified into lexical, collocational and lexicogrammatical categories. Each category was further classified into subcategories depending on the identified errors. The results showed that there were 2583 errors in the corpus of 9600 words, among which, 2030 formal errors and 553 semantic errors were identified. The most frequent errors in the corpus included formal error commitment (78.6%), which were more prevalent at the advanced level (42.4%). The semantic errors (21.4%) were more frequent at the low intermediate level (40.5%). Among formal errors of lexis, the highest number of errors was devoted to misformation errors (98%), while misselection errors constituted 2% of the errors. Additionally, no significant differences were observed among the three semantic error subcategories, namely collocational, lexical choice and lexicogrammatical. The results of the study can shed light on the challenges faced by EFL learners in the second language acquisition process.

Keywords: collocational errors, lexical errors, Persian EFL learners, semantic errors

Procedia PDF Downloads 113
26864 The Advancements of Transformer Models in Part-of-Speech Tagging System for Low-Resource Tigrinya Language

Authors: Shamm Kidane, Ibrahim Abdella, Fitsum Gaim, Simon Mulugeta, Sirak Asmerom, Natnael Ambasager, Yoel Ghebrihiwot

Abstract:

The call for natural language processing (NLP) systems for low-resource languages has become more apparent than ever in the past few years, with the arduous challenges still present in preparing such systems. This paper presents an improved dataset version of the Nagaoka Tigrinya Corpus for Parts-of-Speech (POS) classification system in the Tigrinya language. The size of the initial Nagaoka dataset was incremented, totaling the new tagged corpus to 118K tokens, which comprised the 12 basic POS annotations used previously. The additional content was also annotated manually in a stringent manner, followed similar rules to the former dataset and was formatted in CONLL format. The system made use of the novel approach in NLP tasks and use of the monolingually pre-trained TiELECTRA, TiBERT and TiRoBERTa transformer models. The highest achieved score is an impressive weighted F1-score of 94.2%, which surpassed the previous systems by a significant measure. The system will prove useful in the progress of NLP-related tasks for Tigrinya and similarly related low-resource languages with room for cross-referencing higher-resource languages.

Keywords: Tigrinya POS corpus, TiBERT, TiRoBERTa, conditional random fields

Procedia PDF Downloads 61
26863 An Analysis of Interactional Metadiscourse Devices in Communication Arts Research Articles

Authors: Woravit Kitjaroenpaiboon, Kanyarat Getkham

Abstract:

This corpus analysis is a quantitative study which intended to investigate the uses of four main interactional metadiscourse devices including fourteen sub-devices in the introduction and the discussion sections of the twenty communication arts research articles taken from Online Journal of Communication and Media technologies by applying ‘AntConc’ software and PASW 18.0. The findings reveal that the three most frequently used devices in the introduction parts are attitudinal marker (adjective), booster (verb), and hedge (modal verb) while the three most frequently found devices in the discussion sections are attitudinal marker (adjective), hedge (modal verb) and booster (verb). There are nine sub-interactional metadiscourse devices among each of which significant difference exist in both introduction and discussion sections. They are attitudinal marker (adverb), attitudinal marker (adjective), booster (verb), booster (adverb), booster (adjective), hedge (modal verb), hedge (lexical verb), hedge (adverb), and hedge (adjective), while another five sub-interactional metadiscourse devices; self-mention, attitudinal marker (verb), attitudinal marker (noun), hedge (noun), and Hedge (phraseology) are found to have has no significant difference between the uses of each device in the introduction and discussion sections. The results also revealed that low and positive relationships exist among thirteen devices. One device which has no relationship with others is attitudinal marker (verb).

Keywords: corpus analysis, interactional metadiscourse devices, communication arts research articles, media technologies

Procedia PDF Downloads 347
26862 Crossing the Interdisciplinary Border: A Multidimensional Linguistics Analysis of a Legislative Discourse

Authors: Manvender Kaur Sarjit Singh

Abstract:

There is a crucial mismatch between classroom written language tasks and real world written language requirements. Realizing the importance of reducing the gap between the professional needs of the legal practitioners and the higher learning institutions that offer the legislative education in Malaysia, it is deemed necessary to develop a framework that integrates real-life written communication with the teaching of content-based legislative discourse to future legal practitioners. By highlighting the actual needs of the legal practitioners in the country, the present teaching practices will be enhanced and aligned with the actual needs of the learners thus realizing the vision and aspirations of the Malaysian Education Blueprint 2013-2025 and Legal Profession Qualifying Board. The need to focus future education according to the actual needs of the learners can be realized by developing a teaching framework which is designed within the prospective requirements of its real-life context. This paper presents the steps taken to develop a specific teaching framework that fulfills the fundamental real-life context of the prospective legal practitioners. The teaching framework was developed based on real-life written communication from the legal profession in Malaysia, using the specific genre analysis approach which integrates a corpus-based approach and a structural linguistics analysis. This approach was adopted due to its fundamental nature of intensive exploration of the real-life written communication according to the established strategies used. The findings showed the use of specific moves and parts-of-speech by the legal practitioners, in order to prepare the selected genre. The teaching framework is hoped to enhance the teachings of content-based law courses offered at present in the higher learning institutions in Malaysia.

Keywords: linguistics analysis, corpus analysis, genre analysis, legislative discourse

Procedia PDF Downloads 363
26861 The Noun-Phrase Elements on the Usage of the Zero Article

Authors: Wen Zhen

Abstract:

Compared to content words, function words have been relatively overlooked by English learners especially articles. The article system, to a certain extent, becomes a resistance to know English better, driven by different elements. Three principal factors can be summarized in term of the nature of the articles when referring to the difficulty of the English article system. However, making the article system more complex are difficulties in the second acquisition process, for [-ART] learners have to create another category, causing even most non-native speakers at proficiency level to make errors. According to the sequences of acquisition of the English article, it is showed that the zero article is first acquired and in high inaccuracy. The zero article is often overused in the early stages of L2 acquisition. Although learners at the intermediate level move to underuse the zero article for they realize that the zero article does not cover any case, overproduction of the zero article even occurs among advanced L2 learners. The aim of the study is to investigate noun-phrase factors which give rise to incorrect usage or overuse of the zero article, thus providing suggestions for L2 English acquisition. Moreover, it enables teachers to carry out effective instruction that activate conscious learning of students. The research question will be answered through a corpus-based, data- driven approach to analyze the noun-phrase elements from the semantic context and countability of noun-phrases. Based on the analysis of the International Thurber Thesis corpus, the results show that: (1) Although context of [-definite,-specific] favored the zero article, both[-definite,+specific] and [+definite,-specific] showed less influence. When we reflect on the frequency order of the zero article , prototypicality plays a vital role in it .(2)EFL learners in this study have trouble classifying abstract nouns as countable. We can find that it will bring about overuse of the zero article when learners can not make clear judgements on countability altered from (+definite ) to (-definite).Once a noun is perceived as uncountable by learners, the choice would fall back on the zero article. These findings suggest that learners should be engaged in recognition of the countability of new vocabulary by explaining nouns in lexical phrases and explore more complex aspects such as analysis dependent on discourse.

Keywords: noun phrase, zero article, corpus, second language acquisition

Procedia PDF Downloads 227
26860 Prostitution in Colonial Bengal: Autobiographical Articulations and Fictional Representations

Authors: Aparna Bandyopadhyay

Abstract:

The proposed paper will examine how prostitution produced a vast corpus of literature in colonial Bengal. This corpus included autobiographical accounts by prostitutes themselves. While the authenticity of some of these has, at times, been doubted by contemporary observers, the sheer magnitude of such narrative prose demands critical attention. Many of these autobiographical narratives focused on the prostitute’s early life within respectable society and then proceeded to delineate the transgressions and the inescapable chain of circumstances that eventually rendered her a prostitute. Significantly, these serve to corroborate the findings of official investigations regarding the circumstances that led upper-caste Hindu women in Bengal to embrace prostitution in this period. The literary corpus that dwelt on prostitution also included a vast volume of fiction penned by celebrated writers. These foregrounded a prostitute as the central protagonist, telling the life-stories of prostitutes and the circumstances that made them what they were. Novels and short stories often represented the prostitute as an affective being – an individual capable of deep emotions despite her profession. She was seldom a person who had voluntarily embraced prostitution. She was always a figure of helplessness and suffering, a woman whose desire to love and be loved transcended the carnality of her livelihood. She was an outcast, but she experienced the entire repertoire of emotions experienced by her respectable counterparts. The proposed paper will examine the trends and characteristics of the available repertoire of prostitute-oriented literature in late colonial Bengal. It will begin by focusing on the existing perspectives on the origins of prostitution in late colonial Bengal. It will proceed to discuss the literary corpus supposedly penned by prostitutes themselves and then focus on the manner in which some of the stalwarts of high literature represented the prostitute in their literary creations.

Keywords: emotions, literature, prostitution, transgression

Procedia PDF Downloads 87
26859 A Corpus-Based Analysis on Code-Mixing Features in Mandarin-English Bilingual Children in Singapore

Authors: Xunan Huang, Caicai Zhang

Abstract:

This paper investigated the code-mixing features in Mandarin-English bilingual children in Singapore. First, it examined whether the code-mixing rate was different in Mandarin Chinese and English contexts. Second, it explored the syntactic categories of code-mixing in Singapore bilingual children. Moreover, this study investigated whether morphological information was preserved when inserting syntactic components into the matrix language. Data are derived from the Singapore Bilingual Corpus, in which the recordings and transcriptions of sixty English-Mandarin 5-to-6-year-old children were preserved for analysis. Results indicated that the rate of code-mixing was asymmetrical in the two language contexts, with the rate being significantly higher in the Mandarin context than that in the English context. The asymmetry is related to language dominance in that children are more likely to code-mix when using their nondominant language. Concerning the syntactic categories of code-mixing words in the Singaporean bilingual children, we found that noun-mixing, verb-mixing, and adjective-mixing are the three most frequently used categories in code-mixing in the Mandarin context. This pattern mirrors the syntactic categories of code-mixing in the Cantonese context in Cantonese-English bilingual children, and the general trend observed in lexical borrowing. Third, our results also indicated that English vocabularies that carry morphological information are embedded in bare forms in the Mandarin context. These findings shed light upon how bilingual children take advantage of the two languages in mixed utterances in a bilingual environment.

Keywords: bilingual children, code-mixing, English, Mandarin Chinese

Procedia PDF Downloads 187
26858 Grammatical and Lexical Explorations on ‘Outer Circle’ Englishes and ‘Expanding Circle’ Englishes: A Corpus-Based Comparative Analysis

Authors: Orlyn Joyce D. Esquivel

Abstract:

This study analyzed 50 selected research papers from professional language and linguistic academic journals to portray the differences between Kachru’s (1994) outer circle and expanding circle Englishes. The selected outer circle Englishes include those of Bangladesh, Malaysia, the Philippines, India, and Singapore; and the selected expanding circle Englishes are those of China, Indonesia, Japan, Korea, and Thailand. The researcher built ten corpora (five research papers for each corpus) to represent each variety of Englishes. The corpora were examined under grammatical and lexical features using Modified English TreeTagger in Sketch Engine. Results revealed the distinct grammatical and lexical features through the table and textual analyses, illustrated from the most to least dominant linguistic elements. In addition, comparative analyses were done to distinguish the features of each of the selected Englishes. The Language Change Theory was used as a basis in the discussion. Hence, the findings suggest that the ‘outer circle’ Englishes and ‘expanding circle’ Englishes will continue to drift from International English.

Keywords: applied linguistics, English as a global language, expanding circle Englishes, global Englishes, outer circle Englishes

Procedia PDF Downloads 126
26857 The Women-In-Mining Discourse: A Study Combining Corpus Linguistics and Discourse Analysis

Authors: Ylva Fältholm, Cathrine Norberg

Abstract:

One of the major threats identified to successful future mining is that women do not find the industry attractive. Many attempts have been made, for example in Sweden and Australia, to create organizational structures and mining communities attractive to both genders. Despite such initiatives, many mining areas are developing into gender-segregated fly-in/fly out communities dominated by men with both social and economic consequences. One of the challenges facing many mining companies is thus to break traditional gender patterns and structures. To do this increased knowledge about gender in the context of mining is needed. Since language both constitutes and reproduces knowledge, increased knowledge can be gained through an exploration and description of the mining discourse from a gender perspective. The aim of this study is to explore what conceptual ideas are activated in connection to the physical/geographical mining area and to work within the mining industry. We use a combination of critical discourse analysis implying close reading of selected texts, such as policy documents, interview materials, applications and research and innovation agendas, and analyses of linguistic patterns found in large language corpora covering millions of words of contemporary language production. The quantitative corpus data serves as a point of departure for the qualitative analysis of the texts, that is, suggests what patterns to explore further. The study shows that despite technological and organizational development, one of the most persistent discourses about mining is the conception of dangerous and unfriendly areas infused with traditional notions of masculinity ideals and manual hard work. Although some of the texts analyzed highlight gender issues, and describe gender-equalizing initiatives, such as wage-mapping systems, female networks and recruitment efforts for women executives, and thereby render the discourse less straightforward, it is shown that these texts are not unambiguous examples of a counter-discourse. They rather illustrate that discourses are not stable but include opposing discourses, in dialogue with each other. For example, many texts highlight why and how women are important to mining, at the same time as they suggest that gender and diversity are all about women: why mining is a problem for them, how they should be, and what they should do to fit in. Drawing on a constitutive view of discourse, knowledge about such conflicting perceptions of women is a prerequisite for succeeding in attracting women to the mining industry and thereby contributing to the development of future mining.

Keywords: discourse, corpus linguistics, gender, mining

Procedia PDF Downloads 235
26856 Lexical Bundles in the Alexiad of Anna Comnena: Computational and Discourse Analysis Approach

Authors: Georgios Alexandropoulos

Abstract:

The purpose of this study is to examine the historical text of Alexiad by Anna Comnena using computational tools for the extraction of lexical bundles containing the name of her father, Alexius Comnenus. For this reason, in this research we apply corpus linguistics techniques for the automatic extraction of lexical bundles and through them we will draw conclusions about how these lexical bundles serve her support provided to her father.

Keywords: lexical bundles, computational literature, critical discourse analysis, Alexiad

Procedia PDF Downloads 597
26855 Track and Evaluate Cortical Responses Evoked by Electrical Stimulation

Authors: Kyosuke Kamada, Christoph Kapeller, Michael Jordan, Mostafa Mohammadpour, Christy Li, Christoph Guger

Abstract:

Cortico-cortical evoked potentials (CCEP) refer to responses generated by cortical electrical stimulation at distant brain sites. These responses provide insights into the functional networks associated with language or motor functions, and in the context of epilepsy, they can reveal pathological networks. Locating the origin and spread of seizures within the cortex is crucial for pre-surgical planning. This process can be enhanced by employing cortical stimulation at the seizure onset zone (SOZ), leading to the generation of CCEPs in remote brain regions that may be targeted for disconnection. In the case of a 24-year-old male patient suffering from intractable epilepsy, corpus callosotomy was performed as part of the treatment. DTI-MRI imaging, conducted using a 3T MRI scanner for fiber tracking, along with CCEP, is used as part of an assessment for surgical planning. Stimulation of the SOZ, with alternating monophasic pulses of 300µs duration and 15mA current intensity, resulted in CCEPs on the contralateral frontal cortex, reaching a peak amplitude of 206µV with a latency of 31ms, specifically in the left pars triangularis. The related fiber tracts were identified with a two-tensor unscented Kalman filter (UKF) technique, showing transversal fibers through the corpus callosum. The CCEPs were monitored through the progress of the surgery. Notably, the SOZ-associated CCEPs exhibited a reduction following the resection of the anterior portion of the corpus callosum, reaching the identified connecting fibers. This intervention demonstrated a potential strategy for mitigating the impact of intractable epilepsy through targeted disconnection of identified cortical regions.

Keywords: CCEP, SOZ, Corpus callosotomy, DTI

Procedia PDF Downloads 32
26854 The Usage of Negative Emotive Words in Twitter

Authors: Martina Katalin Szabó, István Üveges

Abstract:

In this paper, the usage of negative emotive words is examined on the basis of a large Hungarian twitter-database via NLP methods. The data is analysed from a gender point of view, as well as changes in language usage over time. The term negative emotive word refers to those words that, on their own, without context, have semantic content that can be associated with negative emotion, but in particular cases, they may function as intensifiers (e.g. rohadt jó ’damn good’) or a sentiment expression with positive polarity despite their negative prior polarity (e.g. brutális, ahogy ez a férfi rajzol ’it’s awesome (lit. brutal) how this guy draws’. Based on the findings of several authors, the same phenomenon can be found in other languages, so it is probably a language-independent feature. For the recent analysis, 67783 tweets were collected: 37818 tweets (19580 tweets written by females and 18238 tweets written by males) in 2016 and 48344 (18379 tweets written by females and 29965 tweets written by males) in 2021. The goal of the research was to make up two datasets comparable from the viewpoint of semantic changes, as well as from gender specificities. An exhaustive lexicon of Hungarian negative emotive intensifiers was also compiled (containing 214 words). After basic preprocessing steps, tweets were processed by ‘magyarlanc’, a toolkit is written in JAVA for the linguistic processing of Hungarian texts. Then, the frequency and collocation features of all these words in our corpus were automatically analyzed (via the analysis of parts-of-speech and sentiment values of the co-occurring words). Finally, the results of all four subcorpora were compared. Here some of the main outcomes of our analyses are provided: There are almost four times fewer cases in the male corpus compared to the female corpus when the negative emotive intensifier modified a negative polarity word in the tweet (e.g., damn bad). At the same time, male authors used these intensifiers more frequently, modifying a positive polarity or a neutral word (e.g., damn good and damn big). Results also pointed out that, in contrast to female authors, male authors used these words much more frequently as a positive polarity word as well (e.g., brutális, ahogy ez a férfi rajzol ’it’s awesome (lit. brutal) how this guy draws’). We also observed that male authors use significantly fewer types of emotive intensifiers than female authors, and the frequency proportion of the words is more balanced in the female corpus. As for changes in language usage over time, some notable differences in the frequency and collocation features of the words examined were identified: some of the words collocate with more positive words in the 2nd subcorpora than in the 1st, which points to the semantic change of these words over time.

Keywords: gender differences, negative emotive words, semantic changes over time, twitter

Procedia PDF Downloads 173
26853 Arabic Lexicon Learning to Analyze Sentiment in Microblogs

Authors: Mahmoud B. Rokaya

Abstract:

The study of opinion mining and sentiment analysis includes analysis of opinions, sentiments, evaluations, attitudes, and emotions. The rapid growth of social media, social networks, reviews, forum discussions, microblogs, and Twitter, leads to a parallel growth in the field of sentiment analysis. The field of sentiment analysis tries to develop effective tools to make it possible to capture the trends of people. There are two approaches in the field, lexicon-based and corpus-based methods. A lexicon-based method uses a sentiment lexicon which includes sentiment words and phrases with assigned numeric scores. These scores reveal if sentiment phrases are positive or negative, their intensity, and/or their emotional orientations. Creation of manual lexicons is hard. This brings the need for adaptive automated methods for generating a lexicon. The proposed method generates dynamic lexicons based on the corpus and then classifies text using these lexicons. In the proposed method, different approaches are combined to generate lexicons from text. The proposed method classifies the tweets into 5 classes instead of +ve or –ve classes. The sentiment classification problem is written as an optimization problem, finding optimum sentiment lexicons are the goal of the optimization process. The solution was produced based on mathematical programming approaches to find the best lexicon to classify texts. A genetic algorithm was written to find the optimal lexicon. Then, extraction of a meta-level feature was done based on the optimal lexicon. The experiments were conducted on several datasets. Results, in terms of accuracy, recall and F measure, outperformed the state-of-the-art methods proposed in the literature in some of the datasets. A better understanding of the Arabic language and culture of Arab Twitter users and sentiment orientation of words in different contexts can be achieved based on the sentiment lexicons proposed by the algorithm.

Keywords: social media, Twitter sentiment, sentiment analysis, lexicon, genetic algorithm, evolutionary computation

Procedia PDF Downloads 153
26852 The Development of Chinese-English Homophonic Word Pairs Databases for English Teaching and Learning

Authors: Yuh-Jen Wu, Chun-Min Lin

Abstract:

Homophonic words are common in Mandarin Chinese which belongs to the tonal language family. Using homophonic cues to study foreign languages is one of the learning techniques of mnemonics that can aid the retention and retrieval of information in the human memory. When learning difficult foreign words, some learners transpose them with words in a language they are familiar with to build an association and strengthen working memory. These phonological clues are beneficial means for novice language learners. In the classroom, if mnemonic skills are used at the appropriate time in the instructional sequence, it may achieve their maximum effectiveness. For Chinese-speaking students, proper use of Chinese-English homophonic word pairs may help them learn difficult vocabulary. In this study, a database program is developed by employing Visual Basic. The database contains two corpora, one with Chinese lexical items and the other with English ones. The Chinese corpus contains 59,053 Chinese words that were collected by a web crawler. The pronunciations of this group of words are compared with words in an English corpus based on WordNet, a lexical database for the English language. Words in both databases with similar pronunciation chunks and batches are detected. A total of approximately 1,000 Chinese lexical items are located in the preliminary comparison. These homophonic word pairs can serve as a valuable tool to assist Chinese-speaking students in learning and memorizing new English vocabulary.

Keywords: Chinese, corpus, English, homophonic words, vocabulary

Procedia PDF Downloads 149
26851 A Critical Discourse Analysis of the Construction of Artists' Reputation by Online Art Magazines

Authors: Thomas Soro, Tim Stott, Brendan O'Rourke

Abstract:

The construction of artistic reputation has been examined within sociology, philosophy, and economics but, baring a few noteworthy exceptions its discursive aspect has been largely ignored. This is particularly surprising given that contemporary artworks primarily rely on discourse to construct their ontological status. This paper contributes a discourse analytical perspective to the broad body of literature on artistic reputation by providing an understanding of how it is discursively constructed within the institutional context of online contemporary art magazines. This paper uses corpora compiled from the websites of e-flux and ARTnews, two leading online contemporary art magazines, to examine how these organisations discursively construct the reputation of artists. By constructing word-sketches of the term 'Artist', the paper identified the most significant modifiers attributed to artists and the most significant verbs which have 'artist' as an object or subject. The most significant results were analysed through concordances and demonstrated a somewhat surprising lack of evaluative representation. To examine this feature more closely, the paper then analysed three announcement texts from e-flux’s site and three review texts from ARTnews' site, comparing the use of modifiers and verbs in the representation of artists, artworks, and institutions. The results of this analysis support the corpus findings, suggesting that artists are rarely represented in evaluative terms. Based on the relatively high frequency of evaluation in the representation of artworks and institutions, these results suggest that there may be discursive norms at work in the field of online contemporary art magazines which regulate the use of verbs and modifiers in the evaluation of artists.

Keywords: contemporary art, corpus linguistics, critical discourse analysis, symbolic capital

Procedia PDF Downloads 136
26850 Laundering vs. Blanqueo: Translating Financial Crime Metaphors From English to Spanish

Authors: Stephen Gerome

Abstract:

This study examines the translation and use of metaphors in the realm of public safety discourse and intends to shed light on a continuing problem in cross-cultural communication. Metaphors can cause problems not only within languages but also in interlingual communication. The use and misuse of metaphors may hinder the ability to adequately communicate prevention efforts and, in some cases, facilitate and allow financial crime to go undetected. The use of lexicalized metaphors in communications by political entities, journalists, and legal agents in communications regarding law, policy making, compliance monitoring and enforcement as well as in adjudication can have negative consequences if misconstrued. This study provides examples of metaphor usage in published documents in a corpus linguistic study that compares the use of lexicalized metaphors in this discourse to shed light on possible unexpected consequences as well as counterproductive ones.

Keywords: translation, legal, corpus linguistics, financial

Procedia PDF Downloads 80
26849 Tracing the Developmental Repertoire of the Progressive: Evidence from L2 Construction Learning

Authors: Tianqi Wu, Min Wang

Abstract:

Research investigating language acquisition from a constructionist perspective has demonstrated that language is learned as constructions at various linguistic levels, which is related to factors of frequency, semantic prototypicality, and form-meaning contingency. However, previous research on construction learning tended to focus on clause-level constructions such as verb argument constructions but few attempts were made to study morpheme-level constructions such as the progressive construction, which is regarded as a source of acquisition problems for English learners from diverse L1 backgrounds, especially for those whose L1 do not have an equivalent construction such as German and Chinese. To trace the developmental trajectory of Chinese EFL learners’ use of the progressive with respect to verb frequency, verb-progressive contingency, and verbal prototypicality and generality, a learner corpus consisting of three sub-corpora representing three different English proficiency levels was extracted from the Chinese Learners of English Corpora (CLEC). As the reference point, a native speakers’ corpus extracted from the Louvain Corpus of Native English Essays was also established. All the texts were annotated with C7 tagset by part-of-speech tagging software. After annotation all valid progressive hits were retrieved with AntConc 3.4.3 followed by a manual check. Frequency-related data showed that from the lowest to the highest proficiency level, (1) the type token ratio increased steadily from 23.5% to 35.6%, getting closer to 36.4% in the native speakers’ corpus, indicating a wider use of verbs in the progressive; (2) the normalized entropy value rose from 0.776 to 0.876, working towards the target score of 0.886 in native speakers’ corpus, revealing that upper-intermediate learners exhibited a more even distribution and more productive use of verbs in the progressive; (3) activity verbs (i.e., verbs with prototypical progressive meanings like running and singing) dropped from 59% to 34% but non-prototypical verbs such as state verbs (e.g., being and living) and achievement verbs (e.g., dying and finishing) were increasingly used in the progressive. Apart from raw frequency analyses, collostructional analyses were conducted to quantify verb-progressive contingency and to determine what verbs were distinctively associated with the progressive construction. Results were in line with raw frequency findings, which showed that contingency between the progressive and non-prototypical verbs represented by light verbs (e.g., going, doing, making, and coming) increased as English proficiency proceeded. These findings altogether suggested that beginning Chinese EFL learners were less productive in using the progressive construction: they were constrained by a small set of verbs which had concrete and typical progressive meanings (e.g., the activity verbs). But with English proficiency increasing, their use of the progressive began to spread to marginal members such as the light verbs.

Keywords: Construction learning, Corpus-based, Progressives, Prototype

Procedia PDF Downloads 106
26848 On Grammatical Metaphors: A Corpus-Based Reflection on the Academic Texts Written in the Field of Environmental Management

Authors: Masoomeh Estaji, Ahdie Tahamtani

Abstract:

Considering the necessity of conducting research and publishing academic papers during Master’s and Ph.D. programs, graduate students are in dire need of improving their writing skills through either writing courses or self-study planning. One key feature that could aid academic papers to look more sophisticated is the application of grammatical metaphors (GMs). These types of metaphors represent the ‘non-congruent’ and ‘implicit’ ways of decoding meaning through which one grammatical category is replaced by another, more implied counterpart, which can alter the readers’ understanding of the text as well. Although a number of studies have been conducted on the application of GMs across various disciplines, almost none has been devoted to the field of environmental management, and the scope of the previous studies has been relatively limited compared to the present work. In the current study, attempts were made to analyze different types of GMs used in academic papers published in top-tiered journals in the field of environmental management, and make a list of the most frequently used GMs based on their functions in this particular discipline to make the teaching of academic writing courses more explicit and the composition of academic texts more well-structured. To fulfill these purposes, a corpus-based analysis based on the two theoretical models of Martin et al. (1997) and Liardet (2014) was run. Through two stages of manual analysis and concordancers, ten recent academic articles entailing 132490 words published in two prestigious journals were precisely scrutinized. The results yielded that through the whole IMRaD sections of the articles, among all types of ideational GMs, material processes were the most frequent types. The second and the third ranks would apply to the relational and mental categories, respectively. Regarding the use of interpersonal GMs, objective expanding metaphors were the highest in number. In contrast, subjective interpersonal metaphors, either expanding or contracting, were the least significant. This would suggest that scholars in the field of Environmental Management tended to shift the focus on the main procedures and explain technical phenomenon in detail, rather than to compare and contrast other statements and subjective beliefs. Moreover, since no instances of verbal ideational metaphors were detected, it could be deduced that the act of ‘saying or articulating’ something might be against the standards of the academic genre. One other assumption would be that the application of ideational GMs is context-embedded and that the more technical they are, the least frequent they become. For further studies, it is suggested that the employment of GMs to be studied in a wider scope and other disciplines, and the third type of GMs known as ‘textual’ metaphors to be included as well.

Keywords: English for specific purposes, grammatical metaphor, academic texts, corpus-based analysis

Procedia PDF Downloads 142
26847 Dialogism in Research Article Introductions Written by Iranian Non-Native and English Native Speaking Writers

Authors: Moharram Sharifi

Abstract:

Despite a growing interest in the study of the introduction section of Research Articles (RA), there have been few studies to investigate how academic writers engage with other voices and alternative positions in this academic genre. Therefore, the purpose of this study was to show how Native Speaker (NS) and (Non-Native Speaker (NNS) writers take positions and stances in research article introductions. For this purpose, Engagement resources based on the appraisal framework were investigated in sixty articles written by English NS and Iranian NNS published in applied linguistics journals. It was found that the mean occurrences of heteroglossic items in both corpora were larger than those of monoglossic items, but comparing the means of monoglossic engagements between the two corpora, it was revealed that NS writers’ corpus had larger mean occurrences of monoglossic engagements than NNS writers’ corpus implying the native’s stronger authorial stance in the texts. The results also revealed that there was no significant difference in the use of contractive and expansive engagements by NS writers (t (29) = -0.995, p>0.05), indicating a balanced use between the two options. However, the higher mean occurrences of expansive options compared with contractive options in the NNS corpus may suggest that NN writers open up more dialogic room for alternative positions in the RA introductions. The findings of this study may help writers to better perceive the creation of a strong authorial position using appropriate engagement resources in RA introductions.

Keywords: engagement, heteroglossic, monoglossic, introduction

Procedia PDF Downloads 18
26846 Lennox-gastaut Syndrome Associated with Dysgenesis of Corpus Callosum

Authors: A. Bruce Janati, Muhammad Umair Khan, Naif Alghassab, Ibrahim Alzeir, Assem Mahmoud, M. Sammour

Abstract:

Rationale: Lennox-Gastaut syndrome(LGS) is an electro-clinical syndrome composed of the triad of mental retardation, multiple seizure types, and the characteristic generalized slow spike-wave complexes in the EEG. In this article, we report on two patients with LGS whose brain MRI showed dysgenesis of corpus callosum(CC). We review the literature and stress the role of CC in the genesis of secondary bilateral synchrony(SBS). Method: This was a clinical study conducted at King Khalid Hospital. Results: The EEG was consistent with LGS in patient 1 and unilateral slow spike-wave complexes in patient 2. The MRI showed hypoplasia of the splenium of CC in patient 1, and global hypoplasia of CC combined with Joubert syndrome in patient 2. Conclusion: Based on the data, we proffer the following hypotheses: 1-Hypoplasia of CC interferes with functional integrity of this structure. 2-The genu of CC plays a pivotal role in the genesis of secondary bilateral synchrony. 3-Electrodecremental seizures in LGS emanate from pacemakers generated in the brain stem, in particular the mesencephalon projecting abnormal signals to the cortex via thalamic nuclei. 4-Unilateral slow spike-wave complexes in the context of mental retardation and multiple seizure types may represent a variant of LGS, justifying neuroimaging studies.

Keywords: EEG, Lennox-Gastaut syndrome, corpus callosum , MRI

Procedia PDF Downloads 405
26845 A Corpus-Based Study of Evaluative Language in Leading Articles in British Broadsheet and Tabloid Newspapers

Authors: Fatimah AlSaiari

Abstract:

In recent years, newspapers in the United Kingdom have been no longer just a means of sharing news about what happens in the world; they are also used to influence target readers by having them become more up-to-date, well-informed, entertained, exasperated, delighted, and infuriated. To achieve these objectives and maintain influence on public opinion, journalists use a particular language in which they can convey emotions and opinions, organize their discourse, and establish solidarity with their audience. This type of language has been widely analyzed under different labels, such as evaluation, appraisal, and stance. There is a considerable amount of linguistic and non-linguistic research devoted to analyzing this type of interpersonal language in journalistic discourse, and most of these studies were carried out to challenge the traditional assumptions of the objectivity and impartiality of news reporting. However, very little research has been undertaken on evaluative language in newspaper institutional editorials, and there is hardly any systematic or exhaustive analysis of this type of language in British tabloid and broadsheet newspapers. This study will attempt to provide new insights into the nature of authorial and non-authorial evaluation in leading articles in popular and quality British newspapers, along with their targets, sources, and discourse functions. The study will also attempt to develop a framework of evaluation that can be applied to evaluative lexical items in newspaper opinion texts. The framework is both theory-driven (i.e., it builds on and modifies previous frameworks of evaluation such as appraisal theory and parameter-based approach) and data-driven (i.e., it elicits the evaluative categories from the analysis of the corpus, which helps in the development of the current framework). To achieve this aim, a corpus of 140 leading articles were selected. The findings revealed that the tabloids tended to express their stance through explicitness, dramatization, frequent reference to social actors’ emotions and beliefs, and exaggeration in negativity, while the broadsheets preferred to express their stance through mitigation ambiguity and implicitness. conceptual themes and propositions were more preferable targets for expressing stance in the broadsheets while human behavior and characters were preferable targets for the tabloids.

Keywords: appraisal theory, evaluative language, British newspapers, broadsheets & tabloids, evaluative adjectives

Procedia PDF Downloads 268
26844 Computerized Analysis of Phonological Structure of 10,400 Brazilian Sign Language Signs

Authors: Wanessa G. Oliveira, Fernando C. Capovilla

Abstract:

Capovilla and Raphael’s Libras Dictionary documents a corpus of 4,200 Brazilian Sign Language (Libras) signs. Duduchi and Capovilla’s software SignTracking permits users to retrieve signs even when ignoring the gloss corresponding to it and to discover the meaning of all 4,200 signs sign simply by clicking on graphic menus of the sign characteristics (phonemes). Duduchi and Capovilla have discovered that the ease with which any given sign can be retrieved is an inverse function of the average popularity of its component phonemes. Thus, signs composed of rare (distinct) phonemes are easier to retrieve than are those composed of common phonemes. SignTracking offers a means of computing the average popularity of the phonemes that make up each one of 4,200 signs. It provides a precise measure of the degree of ease with which signs can be retrieved, and sign meanings can be discovered. Duduchi and Capovilla’s logarithmic model proved valid: The degree with which any given sign can be retrieved is an inverse function of the arithmetic mean of the logarithm of the popularity of each component phoneme. Capovilla, Raphael and Mauricio’s New Libras Dictionary documents a corpus of 10,400 Libras signs. The present analysis revealed Libras DNA structure by mapping the incidence of 501 sign phonemes resulting from the layered distribution of five parameters: 163 handshape phonemes (CherEmes-ManusIculi); 34 finger shape phonemes (DactilEmes-DigitumIculi); 55 hand placement phonemes (ArtrotoToposEmes-ArticulatiLocusIculi); 173 movement dimension phonemes (CinesEmes-MotusIculi) pertaining to direction, frequency, and type; and 76 Facial Expression phonemes (MascarEmes-PersonalIculi).

Keywords: Brazilian sign language, lexical retrieval, libras sign, sign phonology

Procedia PDF Downloads 309
26843 Translation Quality Assessment in Fansubbed English-Chinese Swearwords: A Corpus-Based Study of the Big Bang Theory

Authors: Qihang Jiang

Abstract:

Fansubbing, the combination of fan and subtitling, is one of the main branches of Audiovisual Translation (AVT) having kindled more and more interest of researchers into the AVT field in recent decades. In particular, the quality of so-called non-professional translation seems questionable due to the non-transparent qualification of subtitlers in a huge community network. This paper attempts to figure out how YYeTs aka 'ZiMuZu', the largest fansubbing group in China, translates swearwords from English to Chinese for its fans of the prevalent American sitcom The Big Bang Theory, taking cultural, social and political elements into account in the context of China. By building a bilingual corpus containing both the source and target texts, this paper found that most of the original swearwords were translated in a toned-down manner, probably due to Chinese audiences’ cultural and social network features as well as the strict censorship under the Chinese government. Additionally, House (2015)’s newly revised model of Translation Quality Assessment (TQA) was applied and examined. Results revealed that most of the subtitled swearwords achieved their pragmatic functions and exerted a communicative effect for audiences. In conclusion, this paper enriches the empirical research concerning House’s new TQA model, gives a full picture of the subtitling of swearwords in AVT field and provides a practical guide for the practitioners in their career of subtitling.

Keywords: corpus-based approach, fansubbing, pragmatic functions, swearwords, translation quality assessment

Procedia PDF Downloads 125
26842 The Diminished Online Persona: A Semantic Change of Chinese Classifier Mei on Weibo

Authors: Hui Shi

Abstract:

This study investigates a newly emerged usage of Chinese numeral classifier mei (枚) in the cyberspace. In modern Chinese grammar, mei as a classifier should occupy the pre-nominal position, and its valid accompanying nouns are restricted to small, flat, fragile inanimate objects rather than humans. To examine the semantic change of mei, two types of data from Weibo.com were collected. First, 500 mei-included Weibo posts constructed a corpus for analyzing this classifier's word order distribution (post-nominal or pre-nominal) as well as its accompanying nouns' semantics (inanimate or human). Second, considering that mei accompanies a remarkable number of human nouns in the first corpus, the second corpus is composed of mei-involved Weibo IDs from users located in first and third-tier cities (n=8 respectively). The findings show that in the cyber community, mei frequently classifies human-related neologisms at the archaic post-normal position. Besides, the 23 to 29-year-old females as well as Weibo users from third-tier cities are the major populations who adopt mei in their user IDs for self-description and identity expression. This paper argues that the creative usage of mei gains popularity in the Chinese internet due to a humor effect. The marked word order switch and semantic misapplication combined to trigger incongruity and jocularity. This study has significance for research on Chinese cyber neologism. It may also lay a foundation for further studies on Chinese classifier change and Chinese internet communication.

Keywords: Chinese classifier, humor, neologism, semantic change

Procedia PDF Downloads 228
26841 Corpus-Based Analysis on the Translatability of Conceptual Vagueness in Traditional Chinese Medicine Classics Huang Di Nei Jing

Authors: Yan Yue

Abstract:

Huang Di Nei Jing (HDNJ) is one of the significant traditional Chinese medicine (TCM) classics which lays the foundation of TCM theory and practice. It is an important work for the world to study the ancient civilizations and medical history of China. Language in HDNJ is highly concise and vague, and notably challenging to translate. This paper investigates the translatability of one particular vagueness in HDNJ: the conceptual vagueness which carries the Chinese philosophical and cultural connotations. The corpora tool Sketch Engine is used to provide potential online contexts and word behaviors. Selected two English translations of HDNJ by TCM practitioner and non-practitioner are used to examine frequency and distribution of linguistic features of the translation. It was found the hypothesis about the universals of translated language (explicitation, normalisation) is true in one translation, but it is on the sacrifice of some original contextual connotations. Transliteration is purposefully used in the second translation to retain the original flavor, which is argued as a violation of the principle of relevance in communication because it yields little contextual effects and demands more processing effort of the reader. The translatability of conceptual vagueness in HDNJ is constrained by source language context and the reader’s cognitive environment.

Keywords: corpus-based translation, translatability, TCM classics, vague language

Procedia PDF Downloads 344