Search results for: corpus based approach
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 36009

Search results for: corpus based approach

35979 Combining Instance-Based and Reasoning-Based Approaches for Ontology Matching

Authors: Abderrahmane Khiat, Moussa Benaissa

Abstract:

Due to the increasing number of sources of information available on the web and their distribution and heterogeneity, ontology alignment became a very important and inevitable problem to ensure semantic interoperability. Instance-based ontology alignment is based on the comparison of the extensions of concepts; and represents a very promising technique to find semantic correspondences between entities of different ontologies. In practice, two situations may arise: ontologies that share many common instances and ontologies that share few or do not share common instances. In this paper, we describe an approach to manage the latter case. This approach exploits the reasoning on ontologies in order to create a corpus of common instances. We show that it is theoretically powerful because it is based on description logics and very useful in practice. We present the experimental results obtained by running our approach on ontologies of OAEI 2012 benchmark test. The results show the performance of our approach.

Keywords: description logic inference, instance-based ontology alignment, semantic interoperability, semantic web

Procedia PDF Downloads 447
35978 Studying Language of Immediacy and Language of Distance from a Corpus Linguistic Perspective: A Pilot Study of Evaluation Markers in French Television Weather Reports

Authors: Vince Liégeois

Abstract:

Language of immediacy and distance: Within their discourse theory, Koch & Oesterreicher establish a distinction between a language of immediacy and a language of distance. The former refers to those discourses which are oriented more towards a spoken norm, whereas the latter entails discourses oriented towards a written norm, regardless of whether they are realised phonically or graphically. This means that an utterance can be realised phonically but oriented more towards the written language norm (e.g., a scientific presentation or eulogy) or realised graphically but oriented towards a spoken norm (e.g., a scribble or chat messages). Research desiderata: The methodological approach from Koch & Oesterreicher has often been criticised for not providing a corpus-linguistic methodology, which makes it difficult to work with quantitative data or address large text collections within this research paradigm. Consequently, the Koch & Oesterreicher approach has difficulties gaining ground in those research areas which rely more on corpus linguistic research models, like text linguistics and LSP-research. A combinatory approach: Accordingly, we want to establish a combinatory approach with corpus-based linguistic methodology. To this end, we propose to (i) include data about the context of an utterance (e.g., monologicity/dialogicity, familiarity with the speaker) – which were called “conditions of communication” in the original work of Koch & Oesterreicher – and (ii) correlate the linguistic phenomenon at the centre of the inquiry (e.g., evaluation markers) to a group of linguistic phenomena deemed typical for either distance- or immediacy-language. Based on these two parameters, linguistic phenomena and texts could then be mapped on an immediacy-distance continuum. Pilot study: To illustrate the benefits of this approach, we will conduct a pilot study on evaluation phenomena in French television weather reports, a form of domain-sensitive discourse which has often been cited as an example of a “text genre”. Within this text genre, we will look at so-called “evaluation markers,” e.g., fixed strings like bad weather, stifling hot, and “no luck today!”. These evaluation markers help to communicate the coming weather situation towards the lay audience but have not yet been studied within the Koch & Oesterreicher research paradigm. Accordingly, we want to figure out whether said evaluation markers are more typical for those weather reports which tend more towards immediacy or those which tend more towards distance. To this aim, we collected a corpus with different kinds of television weather reports,e.g., as part of the news broadcast, including dialogue. The evaluation markers themselves will be studied according to the explained methodology, by correlating them to (i) metadata about the context and (ii) linguistic phenomena characterising immediacy-language: repetition, deixis (personal, spatial, and temporal), a freer choice of tense and right- /left-dislocation. Results: Our results indicate that evaluation markers are more dominantly present in those weather reports inclining towards immediacy-language. Based on the methodology established above, we have gained more insight into the working of evaluation markers in the domain-sensitive text genre of (television) weather reports. For future research, it will be interesting to determine whether said evaluation markers are also typical for immediacy-language-oriented in other domain-sensitive discourses.

Keywords: corpus-based linguistics, evaluation markers, language of immediacy and distance, weather reports

Procedia PDF Downloads 219
35977 OPEN-EmoRec-II-A Multimodal Corpus of Human-Computer Interaction

Authors: Stefanie Rukavina, Sascha Gruss, Steffen Walter, Holger Hoffmann, Harald C. Traue

Abstract:

OPEN-EmoRecII is an open multimodal corpus with experimentally induced emotions. In the first half of the experiment, emotions were induced with standardized picture material and in the second half during a human-computer interaction (HCI), realized with a wizard-of-oz design. The induced emotions are based on the dimensional theory of emotions (valence, arousal and dominance). These emotional sequences - recorded with multimodal data (mimic reactions, speech, audio and physiological reactions) during a naturalistic-like HCI-environment one can improve classification methods on a multimodal level. This database is the result of an HCI-experiment, for which 30 subjects in total agreed to a publication of their data including the video material for research purposes. The now available open corpus contains sensory signal of: video, audio, physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major) and mimic annotations.

Keywords: open multimodal emotion corpus, annotated labels, intelligent interaction

Procedia PDF Downloads 416
35976 The Omani Learner of English Corpus: Source and Tools

Authors: Anood Al-Shibli

Abstract:

Designing a learner corpus is not an easy task to accomplish because dealing with learners’ language has many variables which might affect the results of any study based on learners’ language production (spoken and written). Also, it is very essential to systematically design a learner corpus especially when it is aimed to be a reference to language research. Therefore, designing the Omani Learner Corpus (OLEC) has undergone many explicit and systematic considerations. These criteria can be regarded as the foundation to design any learner corpus to be exploited effectively in language use and language learning studies. Added to that, OLEC is manually error-annotated corpus. Error-annotation in learner corpora is very essential; however, it is time-consuming and prone to errors. Consequently, a navigating tool is designed to help the annotators to insert errors’ codes in order to make the error-annotation process more efficient and consistent. To assure accuracy, error annotation procedure is followed to annotate OLEC and some preliminary findings are noted. One of the main results of this procedure is creating an error-annotation system based on the Omani learners of English language production. Because OLEC is still in the first stages, the primary findings are related to only one level of proficiency and one error type which is verb related errors. It is found that Omani learners in OLEC has the tendency to have more errors in forming the verb and followed by problems in agreement of verb. Comparing the results to other error-based studies indicate that the Omani learners tend to have basic verb errors which can found in lower-level of proficiency. To this end, it is essential to admit that examining learners’ errors can give insights to language acquisition and language learning and most errors do not happen randomly but they occur systematically among language learners.

Keywords: error-annotation system, error-annotation manual, learner corpora, verbs related errors

Procedia PDF Downloads 141
35975 The Acquisition of Case in Biological Domain Based on Text Mining

Authors: Shen Jian, Hu Jie, Qi Jin, Liu Wei Jie, Chen Ji Yi, Peng Ying Hong

Abstract:

In order to settle the problem of acquiring case in biological related to design problems, a biometrics instance acquisition method based on text mining is presented. Through the construction of corpus text vector space and knowledge mining, the feature selection, similarity measure and case retrieval method of text in the field of biology are studied. First, we establish a vector space model of the corpus in the biological field and complete the preprocessing steps. Then, the corpus is retrieved by using the vector space model combined with the functional keywords to obtain the biological domain examples related to the design problems. Finally, we verify the validity of this method by taking the example of text.

Keywords: text mining, vector space model, feature selection, biologically inspired design

Procedia PDF Downloads 260
35974 Saudi Twitter Corpus for Sentiment Analysis

Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari

Abstract:

Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have yet to directly apply SA to Arabic due to lack of a publicly available dataset for this language. This paper partially bridges this gap due to its focus on one of the Arabic dialects which is the Saudi dialect. This paper presents annotated data set of 4700 for Saudi dialect sentiment analysis with (K= 0.807). Our next work is to extend this corpus and creation a large-scale lexicon for Saudi dialect from the corpus.

Keywords: Arabic, sentiment analysis, Twitter, annotation

Procedia PDF Downloads 629
35973 Corpus-Assisted Study of Gender Related Tiger Metaphors in the Chinese Context

Authors: Na Xiao

Abstract:

Animal metaphors have many different connotations, ranging from loving emotions to derogatory epithets, but gender expressions using animal metaphors are often imbalanced. Generally, animal metaphors related to females tend to be negative. Little known about the reasons for the negative expressions of animal female metaphors in Chinese contexts still have not been quantified. The Modern Chinese Corpus at the Center for Chinese Linguistics at Peking University (CCL Corpus) provided the data for this research, which aims to identify the influencing variables of gender differences in the description of animal metaphors mapping humans in Chinese by observing the percentage of "tiger" metaphor, which is based on the conceptual metaphor theory. A quantitative research method was used in this study to statistically examine the gender attitude percentage of the "tiger" metaphor using corpus data. This study has proved that the tiger metaphors associated with humans in the Chinese context tend to be negative. Importantly, this study has also shown that the high proportion of tiger metaphorical idioms is what causes the high proportion of negative tiger metaphors that are related to women. This finding can be used as crucial information for future studies on other gender-related animal metaphorical idioms and can offer additional insights for understanding trends in other animal metaphors.

Keywords: Chinese, CCL corpus, gender differences, metaphorical idioms, tigers

Procedia PDF Downloads 108
35972 Corpus Linguistic Methods in a Theoretical Study of Quran Verb Tense and Aspect in Translations from Arabic to English

Authors: Jawharah Alasmari

Abstract:

In inflectional morphology of verb, tense and aspect indicate action’s time either past/present or future and their period whether completed or not. The usage and meaning of tense and aspect differ in Arabic and English, therefore is no simple one -to- one mapping from an Arabic verb inflected form an appropriate English translation depends on a range of features, including immediate and wider context of use. The Quranic Arabic Corpus includes seven alternative expertly crafted English translations of each Arabic verses, which provides a test dataset for the study of appropriate Arabic to English translations of verb tense and aspect. We applied Corpus Linguistics Methods in a theoretical study of exemplary verbs, to elicit candidate verbal contexts which influence the choice of English inflection for each verse.

Keywords: Corpus linguistics methods, Arabic verb, tense and aspect, English translations

Procedia PDF Downloads 391
35971 Redundancy in Malay Morphology: School Grammar versus Corpus Grammar

Authors: Zaharani Ahmad, Nor Hashimah Jalaluddin

Abstract:

The aim of this paper is to examine and identify the issue of linguistic redundancy in two competing grammars of Malay, namely the school grammar and the corpus grammar. The former is a normative grammar which is formally and prescriptively taught in the classroom, whereas the latter is a descriptive grammar that is informally acquired and mastered by the students as native speakers of the language outside the classroom. Corpus grammar is depicted based on its actual used in natural occurring texts, as attested in the corpus. It is observed that the grammar taught in schools is incompatible with the grammar used in the corpus. For instance, a noun phrase containing nominal reduplicated form which denotes plurality (i.e. murid-murid ‘students’ which is derived from murid ‘student’) and a modifier categorized as quantifiers (i.e. semua ‘all’, seluruh ‘entire’, and kebanyakan ‘most’) is not acceptable in the school grammar because the formation (i.e. semua murid-murid ‘all the students’ kebanyakan pelajar-pelajar ‘most of the students’) is claimed to be redundant, and redundancy is prohibited in the grammar. Redundancy is generally construed as the property of speech and language by which more information is provided than is precisely required for the message to be understood, so that, if some information is omitted, the remaining information will still be sufficient for the message to be comprehended. Thus, the correct construction to be used is strictly the reduplicated form (i.e. murid-murid ‘students’) or the quantifier plus the root (i.e. semua murid ‘all the students’) with the intention that the grammatical meaning of plural is not repeated. Nevertheless, the so-called redundant form (i.e. kebanyakan pelajar-pelajar ‘most of the students’) is frequently used in the corpus grammar. This study shows that there are a number of redundant forms occur in the morphology of the language, particularly in affixation, reduplication and combination of both. Apparently, the so-called redundancy has grammatical and socio-cultural functions in communication that is to give emphasis and to stress the importance of the information delivered by the speakers or writers.

Keywords: corpus grammar, morphology, redundancy, school grammar

Procedia PDF Downloads 342
35970 Conceptual Metaphors of Responsibility in Arabic to English Translation of Political Speeches: A Corpus-Based Study

Authors: Amr Anany

Abstract:

This study offers a corpus-based analysis of the conceptual metaphors of RESPONSIBILITY inherent in the Arabic political speeches of King Abdulla II and their English translations rendered by the translators of the Royal Hashemite Court ("RHC translators"). In view of the Conceptual Metaphor Theory (CMT), the current study aims to uncover the extent to which the dominant ideology in the source Arabic speeches of King Abdulla II is conveyed into the target English translation. The study explores a bilingual corpus, including eleven authentic Arabic speeches delivered by King Abdulla II and their English translations. The study finds that both Arabic and English share several metaphorical expressions of RESPONSIBILITY that are based on bodily experience such as RESPONSIBILITY IS UP, RESPONSIBILITY IS AN OBJECT, and RESPONSIBILITY IS AN HONOR. Apparently, the study concludes that RHC translators succeed to convey the dominant ideology from the source Arabic speeches to the English ones using specific translation strategies.

Keywords: cognitive linguistics, CDA, conceptual metaphor theory, ideology, responsibility

Procedia PDF Downloads 71
35969 Translation Quality Assessment in Fansubbed English-Chinese Swearwords: A Corpus-Based Study of the Big Bang Theory

Authors: Qihang Jiang

Abstract:

Fansubbing, the combination of fan and subtitling, is one of the main branches of Audiovisual Translation (AVT) having kindled more and more interest of researchers into the AVT field in recent decades. In particular, the quality of so-called non-professional translation seems questionable due to the non-transparent qualification of subtitlers in a huge community network. This paper attempts to figure out how YYeTs aka 'ZiMuZu', the largest fansubbing group in China, translates swearwords from English to Chinese for its fans of the prevalent American sitcom The Big Bang Theory, taking cultural, social and political elements into account in the context of China. By building a bilingual corpus containing both the source and target texts, this paper found that most of the original swearwords were translated in a toned-down manner, probably due to Chinese audiences’ cultural and social network features as well as the strict censorship under the Chinese government. Additionally, House (2015)’s newly revised model of Translation Quality Assessment (TQA) was applied and examined. Results revealed that most of the subtitled swearwords achieved their pragmatic functions and exerted a communicative effect for audiences. In conclusion, this paper enriches the empirical research concerning House’s new TQA model, gives a full picture of the subtitling of swearwords in AVT field and provides a practical guide for the practitioners in their career of subtitling.

Keywords: corpus-based approach, fansubbing, pragmatic functions, swearwords, translation quality assessment

Procedia PDF Downloads 142
35968 A Self-Built Corpus-Based Study of Four-Word Lexical Bundles in Native English Teachers’ EFL Classroom Discourse in Northeast China: The Significance of Stance

Authors: Fang Tan

Abstract:

This research focuses on the appropriate use of lexical bundles in spoken discourse, particularly in English as a Foreign Language (EFL) classrooms in Northeast China. While previous studies have mainly examined lexical bundles in written discourse, there is a need to investigate their usage in spoken discourse due to the limited availability of spoken discourse corpora. English teachers’ use of lexical bundles is crucial for effective teaching and communication in the EFL classroom. The aim of this study is to investigate the functions of four-word lexical bundles in native English teachers’ EFL oral English classes in Northeast China. Specifically, the research focuses on the usage of stance bundles, which were found to be the most significant type of bundle in the analyzed corpus. By comparing the self-built university spoken English classroom discourse corpus with the other self-built university English for General Purposes (EGP) corpus, the study aims to highlight the difference in bundle usage between native and non-native teachers in EFL classrooms. The research employs a corpus-based study. The observed corpus consists of more than 300,000 tokens, in which the data has been collected in the past five years. The reference corpus is composed of over 800,000 tokens, in which the data has been collected over 12 years. All the primary data collection involved transcribing and annotating spoken English classes taught by native English teachers. The analysis procedures included identifying and categorizing four-word lexical bundles, with specific emphasis on stance bundles. Frequency counts, and comparisons with the Chinese English teachers’ corpus were conducted to identify patterns and differences in bundle usage. The research addresses the following questions: 1) What are the functions of four-word lexical bundles in native English teachers’ EFL oral English classes? 2) How do stance bundles differ in usage between native and non-native English teachers’ classes? 3) What implications can be drawn for English teachers’ professional development based on the findings? In conclusion, this study provides valuable insights into the usage of four-word lexical bundles, particularly stance bundles, in native English teachers’ EFL oral English classes in Northeast China. The research highlights the difference in bundle usage between native and non-native English teachers’ classes and provides implications for English teachers’ professional development. The findings contribute to the understanding of lexical bundle usage in EFL classroom discourse and have theoretical importance for language teaching methodologies. The self-built university English classroom discourse corpus used in this research is a valuable resource for future studies in this field.

Keywords: EFL classroom discourse, four-word lexical bundles, stance, implication

Procedia PDF Downloads 65
35967 Metaphors Investigation between President Xi Jinping of China and Trump of Us on the Corpus-Based Approach

Authors: Jie Zheng, Ruifeng Luo

Abstract:

The United States is the world’s most developed economy with the strongest military power. China is the fastest growing country with growing comprehensive strength and its economic strength is second only to the US. However, the conflict between them is getting serious in recent years. President’s address is the representative of a nation’s ideology. The paper has built up a small sized corpus of President Xi Jinping and Trump’s speech in Davos to investigate their respective use and types of metaphors and calculate the respective percentage of each type of metaphor. The result shows President Xi Jinping employs more metaphors than Trump. The metaphors of Xi includes “building” metaphor, “plant” metaphor, “journey” metaphor, “ship” metaphor, “traffic” metaphor, “nation is a person” metaphor, “show” metaphor, etc while Trump’s comprises “war” metaphor, “building” metaphor, “journey” metaphor, “traffic” metaphor, “tax” metaphor, “book” metaphor, etc. After investigating metaphor use differences, the paper makes an analysis of the underlying ideology between the two nations. China is willing to strengthen ties with all the countries all over the world and has built a platform of development for them and itself to go to the destination of social well being while the US pays much concern to itself, emphasizing its first leading position and is also willing to help its alliances to development. The paper’s comparison of the ideology difference between the two countries will help them get a better understanding and reduce the conflict to some extent.

Keywords: metaphor; corpus; ideology; conflict

Procedia PDF Downloads 147
35966 A Corpus Study of English Verbs in Chinese EFL Learners’ Academic Writing Abstracts

Authors: Shuaili Ji

Abstract:

The correct use of verbs is an important element of high-quality research articles, and thus for Chinese EFL learners, it is significant to master characteristics of verbs and to precisely use verbs. However, some researches have shown that there are differences in using verbs between learners and native speakers and learners have difficulty in using English verbs. This corpus-based quantitative research can enhance learners’ knowledge of English verbs and promote the quality of research article abstracts even of the whole academic writing. The aim of this study is to find the differences between learners’ and native speakers’ use of verbs and to study the factors that contribute to those differences. To this end, the research question is as follows: What are the differences between most frequently used verbs by learners and those by native speakers? The research question is answered through a study that uses corpus-based data-driven approach to analyze the verbs used by learners in their abstract writings in terms of collocation, colligation and semantic prosody. The results show that: (1) EFL learners obviously overused ‘be, can, find, make’ and underused ‘investigate, examine, may’. As to modal verbs, learners obviously overused ‘can’ while underused ‘may’. (2) Learners obviously overused ‘we find + object clauses’ while underused ‘nouns (results, findings, data) + suggest/indicate/reveal + object clauses’ when expressing research results. (3) Learners tended to transfer the collocation, colligation and semantic prosody of shǐ and zuò to make. (4) Learners obviously overused ‘BE+V-ed’ and used BE as the main verb. They also obviously overused the basic forms of BE such as be, is, are, while obviously underused its inflections (was, were). These results manifested learners’ lack of accuracy and idiomatic property in verb usage. Due to the influence of the concept transfer of Chinese, the verbs in learners’ abstracts showed obvious transfer of mother language. In addition, learners have not fully mastered the use of verbs, avoiding using complex colligations to prevent errors. Based on these findings, the present study has implications for English teaching, seeking to have implications for English academic abstract writing in China. Further research could be undertaken to study the use of verbs in the whole dissertation to find out whether the characteristic of the verbs in abstracts can apply in the whole dissertation or not.

Keywords: academic writing abstracts, Chinese EFL learners, corpus-based, data-driven, verbs

Procedia PDF Downloads 335
35965 The Value of Computerized Corpora in EFL Textbook Design: The Case of Modal Verbs

Authors: Lexi Li

Abstract:

This study aims to contribute to the field of how computer technology can be exploited to enhance EFL textbook design. Specifically, the study demonstrates how computerized native and learner corpora can be used to enhance modal verb treatment in EFL textbooks. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because the pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of each of the nine modals was sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the “secondary school” section were selected. A series of five secondary coursebooks comprise the textbook corpus. All the data in both the learner and the textbook corpora are retrieved through the concordance functions of WordSmith Tools (version, 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNC2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. Secondly, the learner corpus was compared with the textbook corpus in terms of the use (distributional features, semantic functions, and co-occurring constructions) in order to examine the degree of influence of the textbook on learners’ use of modal verbs. Moreover, the learner corpus was analyzed for the misuse (syntactic errors, e.g., she can sings*.) of the nine modal verbs to uncover potential difficulties that confront learners. The results indicate discrepancies between the textbook presentation of modal verbs and authentic modal use in natural discourse in terms of distributions of frequencies, semantic functions, and co-occurring structures. Furthermore, there are consistent patterns of use between the learner corpus and the textbook corpus with respect to the three above-mentioned aspects, except could, will and must, partially confirming the correlation between the frequency effects and L2 grammar acquisition. Further analysis reveals that the exceptions are caused by both positive and negative L1 transfer, indicating that the frequency effects can be intercepted by L1 interference. Besides, error analysis revealed that could, would, should and must are the most difficult for Chinese learners due to both inter-linguistic and intra-linguistic interference. The discrepancies between the textbook corpus and the native corpus point to a need to adjust the presentation of modal verbs in the textbooks in terms of frequencies, different meanings, and verb-phrase structures. Along with the adjustment of modal verb treatment based on authentic use, it is important for textbook writers to take into consideration the L1 interference as well as learners’ difficulties in their use of modal verbs. The present study is a methodological showcase of the combination both native and learner corpora in the enhancement of EFL textbook language authenticity and appropriateness for learners.

Keywords: EFL textbooks, learner corpus, modal verbs, native corpus

Procedia PDF Downloads 124
35964 Crossing the Interdisciplinary Border: A Multidimensional Linguistics Analysis of a Legislative Discourse

Authors: Manvender Kaur Sarjit Singh

Abstract:

There is a crucial mismatch between classroom written language tasks and real world written language requirements. Realizing the importance of reducing the gap between the professional needs of the legal practitioners and the higher learning institutions that offer the legislative education in Malaysia, it is deemed necessary to develop a framework that integrates real-life written communication with the teaching of content-based legislative discourse to future legal practitioners. By highlighting the actual needs of the legal practitioners in the country, the present teaching practices will be enhanced and aligned with the actual needs of the learners thus realizing the vision and aspirations of the Malaysian Education Blueprint 2013-2025 and Legal Profession Qualifying Board. The need to focus future education according to the actual needs of the learners can be realized by developing a teaching framework which is designed within the prospective requirements of its real-life context. This paper presents the steps taken to develop a specific teaching framework that fulfills the fundamental real-life context of the prospective legal practitioners. The teaching framework was developed based on real-life written communication from the legal profession in Malaysia, using the specific genre analysis approach which integrates a corpus-based approach and a structural linguistics analysis. This approach was adopted due to its fundamental nature of intensive exploration of the real-life written communication according to the established strategies used. The findings showed the use of specific moves and parts-of-speech by the legal practitioners, in order to prepare the selected genre. The teaching framework is hoped to enhance the teachings of content-based law courses offered at present in the higher learning institutions in Malaysia.

Keywords: linguistics analysis, corpus analysis, genre analysis, legislative discourse

Procedia PDF Downloads 383
35963 A Corpus-Based Analysis of "MeToo" Discourse in South Korea: Coverage Representation in Korean Newspapers

Authors: Sun-Hee Lee, Amanda Kraley

Abstract:

The “MeToo” movement is a social movement against sexual abuse and harassment. Though the hashtag went viral in 2017 following different cultural flashpoints in different countries, the initial response was quiet in South Korea. This radically changed in January 2018, when a high-ranking senior prosecutor, Seo Ji-hyun, gave a televised interview discussing being sexually assaulted by a colleague. Acknowledging public anger, particularly among women, on the long-existing problems of sexual harassment and abuse, the South Korean media have focused on several high-profile cases. Analyzing the media representation of these cases is a window into the evolving South Korean discourse around “MeToo.” This study presents a linguistic analysis of “MeToo” discourse in South Korea by utilizing a corpus-based approach. The term corpus (pl. corpora) is used to refer to electronic language data, that is, any collection of recorded instances of spoken or written language. A “MeToo” corpus has been collected by extracting newspaper articles containing the keyword “MeToo” from BIGKinds, big data analysis, and service and Nexis Uni, an online academic database search engine, to conduct this language analysis. The corpus analysis explores how Korean media represent accusers and the accused, victims and perpetrators. The extracted data includes 5,885 articles from four broadsheet newspapers (Chosun, JoongAng, Hangyore, and Kyunghyang) and 88 articles from two Korea-based English newspapers (Korea Times and Korea Herald) between January 2017 and November 2020. The information includes basic data analysis with respect to keyword frequency and network analysis and adds refined examinations of select corpus samples through naming strategies, semantic relations, and pragmatic properties. Along with the exponential increase of the number of articles containing the keyword “MeToo” from 104 articles in 2017 to 3,546 articles in 2018, the network and keyword analysis highlights ‘US,’ ‘Harvey Weinstein’, and ‘Hollywood,’ as keywords for 2017, with articles in 2018 highlighting ‘Seo Ji-Hyun, ‘politics,’ ‘President Moon,’ ‘An Ui-Jeong, ‘Lee Yoon-taek’ (the names of perpetrators), and ‘(Korean) society.’ This outcome demonstrates the shift of media focus from international affairs to domestic cases. Another crucial finding is that word ‘defamation’ is widely distributed in the “MeToo” corpus. This relates to the South Korean legal system, in which a person who defames another by publicly alleging information detrimental to their reputation—factual or fabricated—is punishable by law (Article 307 of the Criminal Act of Korea). If the defamation occurs on the internet, it is subject to aggravated punishment under the Act on Promotion of Information and Communications Network Utilization and Information Protection. These laws, in particular, have been used against accusers who have publicly come forward in the wake of “MeToo” in South Korea, adding an extra dimension of risk. This corpus analysis of “MeToo” newspaper articles contributes to the analysis of the media representation of the “MeToo” movement and sheds light on the shifting landscape of gender relations in the public sphere in South Korea.

Keywords: corpus linguistics, MeToo, newspapers, South Korea

Procedia PDF Downloads 223
35962 Understanding the Top Questions Asked about Hong Kong by Travellers Worldwide through a Corpus-Based Discourse Analytic Approach

Authors: Phoenix W. Y. Lam

Abstract:

As one of the most important service-oriented industries in contemporary society, tourism has increasingly seen the influence of the Internet on all aspects of travelling. Travellers nowadays habitually research online before making travel-related decisions. One platform on which such research is conducted is destination forums. The emergence of such online destination forums in the last decade has allowed tourists to share their travel experiences quickly and easily with a large number of online users around the world. As such, these destination forums also provide invaluable data for tourism bodies to better understand travellers’ views on their destinations. Collecting posts from the Hong Kong travel forum on the world’s largest travel website TripAdvisor®, the present study identifies the top questions asked by TripAdvisor users about Hong Kong through a corpus-based discourse analytic approach. Based on questions posted on the forum and their associated meta-data gathered in a one-year period, the study examines the top questions asked by travellers around the world to identify the key geographical locations in which users have shown the greatest interest in the city. Questions raised by travellers from different geographical locations are also compared to see if traveller communities by location vary in terms of their areas of interest. This analysis involves the study of key words and concordance of frequently-occurring items and a close reading of representative examples in context. Findings from the present study show that travellers who asked the most questions about Hong Kong are from North America and Asia, and that travellers from different locations have different concerns and interests, which are clearly reflected in the language of the questions asked on the travel forum. These findings can therefore provide tourism organisations with useful information about the key markets that should be targeted for promotional purposes, and can also allow such organisations to design advertising campaigns which better address the specific needs of such markets. The present study thus demonstrates the value of applying linguistic knowledge and methodologies to the domain of tourism to address practical issues.

Keywords: corpus, hong kong, online travel forum, tourism, TripAdvisor

Procedia PDF Downloads 177
35961 Towards a Large Scale Deep Semantically Analyzed Corpus for Arabic: Annotation and Evaluation

Authors: S. Alansary, M. Nagi

Abstract:

This paper presents an approach of conducting semantic annotation of Arabic corpus using the Universal Networking Language (UNL) framework. UNL is intended to be a promising strategy for providing a large collection of semantically annotated texts with formal, deep semantics rather than shallow. The result would constitute a semantic resource (semantic graphs) that is editable and that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles and rhetorical relations, into a single semantic formalism for knowledge representation. The paper will also present the Interactive Analysis​ tool for automatic semantic annotation (IAN). In addition, the cornerstone of the proposed methodology which are the disambiguation and transformation rules, will be presented. Semantic annotation using UNL has been applied to a corpus of 20,000 Arabic sentences representing the most frequent structures in the Arabic Wikipedia. The representation, at different linguistic levels was illustrated starting from the morphological level passing through the syntactic level till the semantic representation is reached. The output has been evaluated using the F-measure. It is 90% accurate. This demonstrates how powerful the formal environment is, as it enables intelligent text processing and search.

Keywords: semantic analysis, semantic annotation, Arabic, universal networking language

Procedia PDF Downloads 582
35960 Tagging a corpus of Media Interviews with Diplomats: Challenges and Solutions

Authors: Roberta Facchinetti, Sara Corrizzato, Silvia Cavalieri

Abstract:

Increasing interconnection between data digitalization and linguistic investigation has given rise to unprecedented potentialities and challenges for corpus linguists, who need to master IT tools for data analysis and text processing, as well as to develop techniques for efficient and reliable annotation in specific mark-up languages that encode documents in a format that is both human and machine-readable. In the present paper, the challenges emerging from the compilation of a linguistic corpus will be taken into consideration, focusing on the English language in particular. To do so, the case study of the InterDiplo corpus will be illustrated. The corpus, currently under development at the University of Verona (Italy), represents a novelty in terms both of the data included and of the tag set used for its annotation. The corpus covers media interviews and debates with diplomats and international operators conversing in English with journalists who do not share the same lingua-cultural background as their interviewees. To date, this appears to be the first tagged corpus of international institutional spoken discourse and will be an important database not only for linguists interested in corpus analysis but also for experts operating in international relations. In the present paper, special attention will be dedicated to the structural mark-up, parts of speech annotation, and tagging of discursive traits, that are the innovational parts of the project being the result of a thorough study to find the best solution to suit the analytical needs of the data. Several aspects will be addressed, with special attention to the tagging of the speakers’ identity, the communicative events, and anthropophagic. Prominence will be given to the annotation of question/answer exchanges to investigate the interlocutors’ choices and how such choices impact communication. Indeed, the automated identification of questions, in relation to the expected answers, is functional to understand how interviewers elicit information as well as how interviewees provide their answers to fulfill their respective communicative aims. A detailed description of the aforementioned elements will be given using the InterDiplo-Covid19 pilot corpus. The data yielded by our preliminary analysis of the data will highlight the viable solutions found in the construction of the corpus in terms of XML conversion, metadata definition, tagging system, and discursive-pragmatic annotation to be included via Oxygen.

Keywords: spoken corpus, diplomats’ interviews, tagging system, discursive-pragmatic annotation, english linguistics

Procedia PDF Downloads 185
35959 Verb Bias in Mandarin: The Corpus Based Study of Children

Authors: Jou-An Chung

Abstract:

The purpose of this study is to investigate the verb bias of the Mandarin verbs in children’s reading materials and provide the criteria for categorization. Verb bias varies cross-linguistically. As Mandarin and English are typological different, this study hopes to shed light on Mandarin verb bias with the use of corpus and provide thorough and detailed criteria for analysis. Moreover, this study focuses on children’s reading materials since it is a significant issue in understanding children’s sentence processing. Therefore, investigating verb bias of Mandarin verbs in children’s reading materials is also an important issue and can provide further insights into children’s sentence processing. The small corpus is built up for this study. The corpus consists of the collection of school textbooks and Mandarin Daily News for children. The files are then segmented and POS tagged by JiebaR (Chinese segmentation with R). For the ease of analysis, the one-word character verbs and intransitive verbs are excluded beforehand. The total of 20 high frequency verbs are hand-coded and are further categorized into one of the three types, namely DO type, SC type and other category. If the frequency of taking Other Type exceeds the threshold of 25%, the verb is excluded from the study. The results show that 10 verbs are direct object bias verbs, and six verbs are sentential complement bias verbs. The paired T-test was done to assure the statistical significance (p = 0.0001062 for DO bias verb, p=0.001149 for SC bias verb). The result has shown that in children’s reading materials, the DO biased verbs are used more than the SC bias verbs since the simplest structure of sentences is easier for children’s sentence comprehension or processing. In sum, this study not only discussed verb bias in child's reading materials but also provided basic coding criteria for verb bias analysis in Mandarin and underscored the role of context. Sentences are easier for children’s sentence comprehension or processing. In sum, this study not only discussed verb bias in child corpus, but also provided basic coding criteria for verb bias analysis in Mandarin and underscored the role of context.

Keywords: corpus linguistics, verb bias, child language, psycholinguistics

Procedia PDF Downloads 291
35958 Corpus Stylistics and Multidimensional Analysis for English for Specific Purposes Teaching and Assessment

Authors: Svetlana Strinyuk, Viacheslav Lanin

Abstract:

Academic English has become lingua franca for international scientific community which stimulates universities to introduce English for Specific Purposes (EAP) courses into curriculum. Teaching L2 EAP students might be fulfilled with corpus technologies and digital stylistics. A special software developed to reach the manifold task of teaching, assessing and researching academic writing of L2 students on basis of digital stylistics and multidimensional analysis was created. A set of annotations (style markers) – grammar, lexical and syntactic features most significant of academic writing was built. Contrastive comparison of two corpora “model corpus”, subject domain limited papers published by competent writers in leading academic journals, and “students’ corpus”, subject domain limited papers written by last year students allows to receive data about the features of academic writing underused or overused by L2 EAP student. Both corpora are tagged with a special software created in GATE Developer. Style markers within the framework of research might be replaced depending on the relevance and validity of the result which is achieved from research corpora. Thus, selecting relevant (high frequency) style markers and excluding less relevant, i.e. less frequent annotations, high validity of the model is achieved. Software allows to compare the data received from processing model corpus to students’ corpus and get reports which can be used in teaching and assessment. The less deviation from the model corpus students demonstrates in their writing the higher is academic writing skill acquisition. The research showed that several style markers (hedging devices) were underused by L2 EAP students whereas lexical linking devices were used excessively. A special software implemented into teaching of EAP courses serves as a successful visual aid, makes assessment more valid; it is indicative of the degree of writing skill acquisition, and provides data for further research.

Keywords: corpus technologies in EAP teaching, multidimensional analysis, GATE Developer, corpus stylistics

Procedia PDF Downloads 198
35957 A Corpus-Linguistic Analysis of Online Iranian News Coverage on Syrian Revolution

Authors: Amaal Ali Al-Gamde

Abstract:

The Syrian revolution is a major issue in the Middle East, which draws in world powers and receives a great focus in international mass media since 2011. The heavy global reliance on cyber news and digital sources plays a key role in conveying a sense of bias to a wide range of online readers. Thus, based on the assumption that media discourse possesses ideological implications, this study investigates the representation of Syrian revolution in online media. The paper explores the discursive constructions of anti and pro-government powers in Syrian revolution in 1000,000-word corpus of Fars online reports (an Iranian news agency), issued between 2013 and 2015. Taking a corpus assisted discourse analysis approach, the analysis investigates three types of lexicosemantic relations, the semantic macrostructures within which the two social actors are framed, the lexical collocations characterizing the news discourse and the discourse prosodies they tell about the two sides of the conflict. The study utilizes computer-based approaches, sketch engine and AntConc software to minimize the bias of the subjective analysis. The analysis moves from the insights of lexical frequencies and keyness scores to examine themes and the collocational patterns. The findings reveal the Fars agency’s ideological mode of representations in reporting events of Syrian revolution in two ways. The first is by stereotyping the opposition groups under the umbrella of terrorism, using words such as (law breakers, foreign-backed groups, militant groups, terrorists) to legitimize the atrocities of security forces against protesters and enhance horror among civilians. The second is through emphasizing the power of the government and depicting it as the defender of the Arab land by foregrounding the discourse of international conspiracy against Syria. The paper concludes discussing the potential importance of triangulating corpus linguistic tools with critical discourse analysis to elucidate more about discourses and reality.

Keywords: discourse prosody, ideology, keyness, semantic macrostructure

Procedia PDF Downloads 131
35956 A Corpus-Based Study of Subtitling Religious Words into Arabic

Authors: Yousef Sahari, Eisa Asiri

Abstract:

Hollywood films are produced in an open and liberal context, and when subtitling for a more conservative and closed society such as an Arabic society, religious words can pose a thorny challenge for subtitlers. Using a corpus of 90 Hollywood films released between 2000 and 2018 and applying insights from Descriptive Translation Studies (Toury, 1995, 2012) and the dichotomy of domestication and foreignization, this paper investigates three main research questions: (1) What are the dominant religious terms and functions in the English subtitles? (2) What are the dominant translation strategies used in the translation of religious words? (3) Do these strategies tend to be SL-oriented or TL-oriented (domesticating or foreignising)? To answer the research questions above, a quantitative and qualitative analysis of the corpus is conducted, in which the researcher adopts a self-designed, parallel, aligned corpus of ninety films and their Arabic subtitles. A quantitative analysis is performed to compare the frequencies and distribution of religious words, their functions, and the translation strategies employed by the subtitlers of ninety films, with the aim of identifying similarities or differences in addition to identifying the impact of functions of religious terms on the use of subtitling strategies. Based on the quantitative analysis, a qualitative analysis is performed to identify any translational patterns in Arabic translations of religious words and the possible reasons for subtitlers’ choices. The results show that the function of religious words has a strong influence on the choice of subtitling strategies. Also, it is found that foreignization strategies are applied in about two-thirds of the total occurrences of religious words.

Keywords: religious terms, subtitling, audiovisual translation, modern standard arabic, subtitling strategies, english-arabic subtitling

Procedia PDF Downloads 166
35955 A Bayesian Approach for Analyzing Academic Article Structure

Authors: Jia-Lien Hsu, Chiung-Wen Chang

Abstract:

Research articles may follow a simple and succinct structure of organizational patterns, called move. For example, considering extended abstracts, we observe that an extended abstract usually consists of five moves, including Background, Aim, Method, Results, and Conclusion. As another example, when publishing articles in PubMed, authors are encouraged to provide a structured abstract, which is an abstract with distinct and labeled sections (e.g., Introduction, Methods, Results, Discussions) for rapid comprehension. This paper introduces a method for computational analysis of move structures (i.e., Background-Purpose-Method-Result-Conclusion) in abstracts and introductions of research documents, instead of manually time-consuming and labor-intensive analysis process. In our approach, sentences in a given abstract and introduction are automatically analyzed and labeled with a specific move (i.e., B-P-M-R-C in this paper) to reveal various rhetorical status. As a result, it is expected that the automatic analytical tool for move structures will facilitate non-native speakers or novice writers to be aware of appropriate move structures and internalize relevant knowledge to improve their writing. In this paper, we propose a Bayesian approach to determine move tags for research articles. The approach consists of two phases, training phase and testing phase. In the training phase, we build a Bayesian model based on a couple of given initial patterns and the corpus, a subset of CiteSeerX. In the beginning, the priori probability of Bayesian model solely relies on initial patterns. Subsequently, with respect to the corpus, we process each document one by one: extract features, determine tags, and update the Bayesian model iteratively. In the testing phase, we compare our results with tags which are manually assigned by the experts. In our experiments, the promising accuracy of the proposed approach reaches 56%.

Keywords: academic English writing, assisted writing, move tag analysis, Bayesian approach

Procedia PDF Downloads 330
35954 The Advancements of Transformer Models in Part-of-Speech Tagging System for Low-Resource Tigrinya Language

Authors: Shamm Kidane, Ibrahim Abdella, Fitsum Gaim, Simon Mulugeta, Sirak Asmerom, Natnael Ambasager, Yoel Ghebrihiwot

Abstract:

The call for natural language processing (NLP) systems for low-resource languages has become more apparent than ever in the past few years, with the arduous challenges still present in preparing such systems. This paper presents an improved dataset version of the Nagaoka Tigrinya Corpus for Parts-of-Speech (POS) classification system in the Tigrinya language. The size of the initial Nagaoka dataset was incremented, totaling the new tagged corpus to 118K tokens, which comprised the 12 basic POS annotations used previously. The additional content was also annotated manually in a stringent manner, followed similar rules to the former dataset and was formatted in CONLL format. The system made use of the novel approach in NLP tasks and use of the monolingually pre-trained TiELECTRA, TiBERT and TiRoBERTa transformer models. The highest achieved score is an impressive weighted F1-score of 94.2%, which surpassed the previous systems by a significant measure. The system will prove useful in the progress of NLP-related tasks for Tigrinya and similarly related low-resource languages with room for cross-referencing higher-resource languages.

Keywords: Tigrinya POS corpus, TiBERT, TiRoBERTa, conditional random fields

Procedia PDF Downloads 103
35953 Cataphora in English and Chinese Conversation: A Corpus-based Contrastive Study

Authors: Jun Gao

Abstract:

This paper combines the corpus-based and contrastive approaches, seeking to provide a systematic account of cataphora in English and Chinese natural conversations. Based on spoken corpus data, the first part of the paper examines a range of characteristics of cataphora in the two languages, including frequency of occurrence, patterns, and syntactic features. On the basis of this exploration, cataphora in the two languages are contrasted in a structured way. The analysis shows that English and Chinese share a similar distribution of cataphora in natural conversations in terms of frequency of occurrence, with repeat identification cataphora higher than first mention cataphora and intra-sentential cataphora much higher than inter-sentential cataphora. In terms of patterns, three types are identified in English, i.e. P+N, Ø+N, and it+Clause, while in Chinese, two types are identified, i.e., P+N and Ø+N. English and Chinese are similar in terms of syntactic features, i.e., cataphor and postcedent in the intra-sentential cataphora mainly occur in the initial subject position of the same clause, with postcedent immediately followed or delayed, and cataphor and postcedent are mostly in adjacent sentences in inter-sentential cataphora. In the second part of the paper, the motivations of cataphora are investigated. It is found that cataphora is primarily motivated by the speaker and hearer’s different knowledge states with regard to the referent. Other factors are also involved, such as interference, word search, and the tension between the principles of Economy and Clarity.

Keywords: cataphora, contrastive study, motivation, pattern, syntactic features

Procedia PDF Downloads 81
35952 Spoken Subcorpus of the Kazakh Language: History, Content, Methodology

Authors: Kuralay Bimoldaevna Kuderinova, Beisenkhan Samal

Abstract:

The history of creating a linguistic corpus in Kazakh linguistics begins only in 2016. Though within this short period of time, the linguistic corpus has become a national corpus and its several subcorpora, namely historical, cultural, spoken, dialectological, writers’ subcorpus, proverbs subcorpus and poetic texts subcorpus, have appeared and are working effectively. Among them, the spoken corpus has its own characteristics. The Kazakh language is one of the languages belonging to the Kypchak-Nogai group of Turkic peoples. The Kazakh language is a language that, as a part of the former Soviet Union, was directly influenced by the Russian language and underwent major changes in its spoken and written forms. After the Republic of Kazakhstan gained independence, the Kazakh language received the status of the state language in 1991. However, today, the prestige of the Russian language is still higher than that of the Kazakh language. Therefore, the direct influence of the Russian language on the structure, style, and vocabulary of the Kazakh language continues. In particular, it can be said that the national practice of the spoken language is disappearing, as the spoken form of Kazakh is not used in official gatherings and events of state importance. In this regard, it is very important to collect and preserve examples of spoken language. Recording exemplary spoken texts, converting them into written form, and providing their audio along with orphoepic explanations will serve as a valuable tool for teaching and learning the Kazakh language. Therefore, the report will cover interesting aspects and scientific foundations related to the creation, content, and methodology of the oral subcorpus of the Kazakh language.

Keywords: spoken corpus, Kazakh language, orthoepic norm, LLM

Procedia PDF Downloads 8
35951 A Corpus Output Error Analysis of Chinese L2 Learners From America, Myanmar, and Singapore

Authors: Qiao-Yu Warren Cai

Abstract:

Due to the rise of big data, building corpora and using them to analyze ChineseL2 learners’ language output has become a trend. Various empirical research has been conducted using Chinese corpora built by different academic institutes. However, most of the research analyzed the data in the Chinese corpora usingcorpus-based qualitative content analysis with descriptive statistics. Descriptive statistics can be used to make summations about the subjects or samples that research has actually measured to describe the numerical data, but the collected data cannot be generalized to the population. Comte, a Frenchpositivist, has argued since the 19th century that human beings’ knowledge, whether the discipline is humanistic and social science or natural science, should be verified in a scientific way to construct a universal theory to explain the truth and human beings behaviors. Inferential statistics, able to make judgments of the probability of a difference observed between groups being dependable or caused by chance (Free Geography Notes, 2015)and to infer from the subjects or examples what the population might think or behave, is just the right method to support Comte’s argument in the field of TCSOL. Also, inferential statistics is a core of quantitative research, but little research has been conducted by combing corpora with inferential statistics. Little research analyzes the differences in Chinese L2 learners’ language corpus output errors by using theOne-way ANOVA so that the findings of previous research are limited to inferring the population's Chinese errors according to the given samples’ Chinese corpora. To fill this knowledge gap in the professional development of Taiwanese TCSOL, the present study aims to utilize the One-way ANOVA to analyze corpus output errors of Chinese L2 learners from America, Myanmar, and Singapore. The results show that no significant difference exists in ‘shì (是) sentence’ and word order errors, but compared with Americans and Singaporeans, it is significantly easier for Myanmar to have ‘sentence blends.’ Based on the above results, the present study provides an instructional approach and contributes to further exploration of how Chinese L2 learners can have (and use) learning strategies to lower errors.

Keywords: Chinese corpus, error analysis, one-way analysis of variance, Chinese L2 learners, Americans, myanmar, Singaporeans

Procedia PDF Downloads 106
35950 Synaesthetic Metaphors in Persian: a Cognitive Corpus Based and Comparative Perspective

Authors: A. Afrashi

Abstract:

Introduction: Synaesthesia is a term denoting the perception or description of the perception of one sense modality in terms of another. In literature, synaesthesia refers to a technique adopted by writers to present ideas, characters or places in such a manner that they appeal to more than one sense like hearing, seeing, smell etc. at a given time. In everyday language too we find many examples of synaesthesia. We commonly hear phrases like ‘loud colors’, ‘frozen silence’ and ‘warm colors’, ‘bitter cold’ etc. Empirical cognitive studies have proved that synaesthetic representations both in literature and everyday languages are constrained ie. they do not map randomly among sensory domains. From the beginning of the 20th century Synaesthesia has been a research domain both in literature and structural linguistics. However the exploration of cognitive mechanisms motivating synaesthesia, have made it an important topic in 21st century cognitive linguistics and literary studies. Synaesthetic metaphors are linguistic representations of those mental mechanisms, the study of which reveals invaluable facts about perception, cognition and conceptualization. According to the main tenets of cognitive approach to language and literature, unified and similar cognitive mechanisms are active both in everyday language and literature, and synaesthesia is one of those cognitive mechanisms. Main objective of the present research is to answer the following questions: What types of sense transfers are accessible in Persian synaesthetic metaphors. How are these types of sense transfers cognitively explained. What are the results of cross-linguistic comparative study of synaestetic metaphors based on the existing observations? Methodology: The present research employs a cognitive - corpus based method, and the theoretical framework adopted to analyze linguistic synaesthesia is the contemporary theory of metaphor, where conceptual metaphor is the result of systemic mappings across cognitive domains. Persian Language Data- base (PLDB) in the Institute for Humanities and Cultural Studies which consists mainly of Persian modern prose, is searched for synaesthetic metaphors. Then for each metaphorical structure, the source and target domains are determined. Then sense transfers are identified and the types of synaesthetic metaphors recognized. Findings: Persian synaesthetic metaphors conform to the hierarchical distribution principle, according to which transfers tend to go from touch to taste to smell to sound and to sight, not vice versa. In other words mapping from more accessible or basic concepts onto less accessible or less basic ones seems more natural. Furthermore the most frequent target domain in Persian synaesthetic metaphors is sound. Certain characteristics of Persian synaesthetic metaphors are comparable with existing related researches carried on English, French, Hungarian and Chinese synaesthetic metaphors. Conclusion: Cognitive corpus based approaches to linguistic synaesthesia, are applicable to stylistics and literary criticism and this recent research domain is an efficient approach to study cross linguistic variations to find out which of the five senses is dominant cross linguistically and cross culturally as the target domain in metaphorical mappings , and so forth receiving dominance in conceptualizations.

Keywords: cognitive semantics, conceptual metaphor, synaesthesia, corpus based approach

Procedia PDF Downloads 562