Search results for: corpus studies
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 11348

Search results for: corpus studies

11258 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition

Authors: Qin Long, Li Xiaoge

Abstract:

The Knowledge Graphs have now become a new form of knowledge representation. However, there is no consensus in regard to a plausible and definition of entities and relationships in the domain-specific knowledge graph. Further, in conjunction with several limitations and deficiencies, various domain-specific entities and relationships recognition approaches are far from perfect. Specifically, named entity recognition in Chinese domain is a critical task for the natural language process applications. However, a bottleneck problem with Chinese named entity recognition in new domains is the lack of annotated data. To address this challenge, a domain distant supervised named entity recognition framework is proposed. The framework is divided into two stages: first, the distant supervised corpus is generated based on the entity linking model of graph attention neural network; secondly, the generated corpus is trained as the input of the distant supervised named entity recognition model to train to obtain named entities. The link model is verified in the ccks2019 entity link corpus, and the F1 value is 2% higher than that of the benchmark method. The re-pre-trained BERT language model is added to the benchmark method, and the results show that it is more suitable for distant supervised named entity recognition tasks. Finally, it is applied in the computer field, and the results show that this framework can obtain domain named entities.

Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network

Procedia PDF Downloads 79
11257 Variables, Annotation, and Metadata Schemas for Early Modern Greek

Authors: Eleni Karantzola, Athanasios Karasimos, Vasiliki Makri, Ioanna Skouvara

Abstract:

Historical linguistics unveils the historical depth of languages and traces variation and change by analyzing linguistic variables over time. This field of linguistics usually deals with a closed data set that can only be expanded by the (re)discovery of previously unknown manuscripts or editions. In some cases, it is possible to use (almost) the entire closed corpus of a language for research, as is the case with the Thesaurus Linguae Graecae digital library for Ancient Greek, which contains most of the extant ancient Greek literature. However, concerning ‘dynamic’ periods when the production and circulation of texts in printed as well as manuscript form have not been fully mapped, representative samples and corpora of texts are needed. Such material and tools are utterly lacking for Early Modern Greek (16th-18th c.). In this study, the principles of the creation of EMoGReC, a pilot representative corpus of Early Modern Greek (16th-18th c.) are presented. Its design follows the fundamental principles of historical corpora. The selection of texts aims to create a representative and balanced corpus that gives insight into diachronic, diatopic and diaphasic variation. The pilot sample includes data derived from fully machine-readable vernacular texts, which belong to 4-5 different textual genres and come from different geographical areas. We develop a hierarchical linguistic annotation scheme, further customized to fit the characteristics of our text corpus. Regarding variables and their variants, we use as a point of departure the bundle of twenty-four features (or categories of features) for prose demotic texts of the 16th c. Tags are introduced bearing the variants [+old/archaic] or [+novel/vernacular]. On the other hand, further phenomena that are underway (cf. The Cambridge Grammar of Medieval and Early Modern Greek) are selected for tagging. The annotated texts are enriched with metalinguistic and sociolinguistic metadata to provide a testbed for the development of the first comprehensive set of tools for the Greek language of that period. Based on a relational management system with interconnection of data, annotations, and their metadata, the EMoGReC database aspires to join a state-of-the-art technological ecosystem for the research of observed language variation and change using advanced computational approaches.

Keywords: early modern Greek, variation and change, representative corpus, diachronic variables.

Procedia PDF Downloads 47
11256 An Automatic Speech Recognition Tool for the Filipino Language Using the HTK System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

This paper presents the development of a Filipino speech recognition tool using the HTK System. The system was trained from a subset of the Filipino Speech Corpus developed by the DSP Laboratory of the University of the Philippines-Diliman. The speech corpus was both used in training and testing the system by estimating the parameters for phonetic HMM-based (Hidden-Markov Model) acoustic models. Experiments on different mixture-weights were incorporated in the study. The phoneme-level word-based recognition of a 5-state HMM resulted in an average accuracy rate of 80.13 for a single-Gaussian mixture model, 81.13 after implementing a phoneme-alignment, and 87.19 for the increased Gaussian-mixture weight model. The highest accuracy rate of 88.70% was obtained from a 5-state model with 6 Gaussian mixtures.

Keywords: Filipino language, Hidden Markov Model, HTK system, speech recognition

Procedia PDF Downloads 462
11255 A Corpus-Based Study of Evaluative Language in Leading Articles in British Broadsheet and Tabloid Newspapers

Authors: Fatimah AlSaiari

Abstract:

In recent years, newspapers in the United Kingdom have been no longer just a means of sharing news about what happens in the world; they are also used to influence target readers by having them become more up-to-date, well-informed, entertained, exasperated, delighted, and infuriated. To achieve these objectives and maintain influence on public opinion, journalists use a particular language in which they can convey emotions and opinions, organize their discourse, and establish solidarity with their audience. This type of language has been widely analyzed under different labels, such as evaluation, appraisal, and stance. There is a considerable amount of linguistic and non-linguistic research devoted to analyzing this type of interpersonal language in journalistic discourse, and most of these studies were carried out to challenge the traditional assumptions of the objectivity and impartiality of news reporting. However, very little research has been undertaken on evaluative language in newspaper institutional editorials, and there is hardly any systematic or exhaustive analysis of this type of language in British tabloid and broadsheet newspapers. This study will attempt to provide new insights into the nature of authorial and non-authorial evaluation in leading articles in popular and quality British newspapers, along with their targets, sources, and discourse functions. The study will also attempt to develop a framework of evaluation that can be applied to evaluative lexical items in newspaper opinion texts. The framework is both theory-driven (i.e., it builds on and modifies previous frameworks of evaluation such as appraisal theory and parameter-based approach) and data-driven (i.e., it elicits the evaluative categories from the analysis of the corpus, which helps in the development of the current framework). To achieve this aim, a corpus of 140 leading articles were selected. The findings revealed that the tabloids tended to express their stance through explicitness, dramatization, frequent reference to social actors’ emotions and beliefs, and exaggeration in negativity, while the broadsheets preferred to express their stance through mitigation ambiguity and implicitness. conceptual themes and propositions were more preferable targets for expressing stance in the broadsheets while human behavior and characters were preferable targets for the tabloids.

Keywords: appraisal theory, evaluative language, British newspapers, broadsheets & tabloids, evaluative adjectives

Procedia PDF Downloads 273
11254 The Power of Words: A Corpus Analysis of Campaign Speeches of President Donald J. Trump

Authors: Aiza Dalman

Abstract:

Words are powerful when these are used wisely and strategically. In this study, twelve (12) campaign speeches of President Donald J. Trump were analyzed as to frequently used words and ethos, pathos and logos being employed. The speeches were read thoroughly, analyzed and interpreted. With the use of Word Counter Tool and Text Analyzer software accessible online, it was found out that the word ‘will’ has the highest frequency of 121, followed by Hillary (58), American (38), going (35), plan and Clinton (32), illegal (30), government (28), corruption (26) and criminal (24). When the speeches were analyzed as to ethos, pathos and logos, on the other hand, it revealed that these were all employed in his speeches. The statements under these pointed out against Hillary or in his favor. The unique strategy of President Donald J. Trump as to frequently used words and ethos, pathos and logos in persuading people perhaps lead the way to his victory.

Keywords: campaign speeches, corpus analysis, ethos, logos and pathos, power of words

Procedia PDF Downloads 264
11253 A Corpus Study of English Verbs in Chinese EFL Learners’ Academic Writing Abstracts

Authors: Shuaili Ji

Abstract:

The correct use of verbs is an important element of high-quality research articles, and thus for Chinese EFL learners, it is significant to master characteristics of verbs and to precisely use verbs. However, some researches have shown that there are differences in using verbs between learners and native speakers and learners have difficulty in using English verbs. This corpus-based quantitative research can enhance learners’ knowledge of English verbs and promote the quality of research article abstracts even of the whole academic writing. The aim of this study is to find the differences between learners’ and native speakers’ use of verbs and to study the factors that contribute to those differences. To this end, the research question is as follows: What are the differences between most frequently used verbs by learners and those by native speakers? The research question is answered through a study that uses corpus-based data-driven approach to analyze the verbs used by learners in their abstract writings in terms of collocation, colligation and semantic prosody. The results show that: (1) EFL learners obviously overused ‘be, can, find, make’ and underused ‘investigate, examine, may’. As to modal verbs, learners obviously overused ‘can’ while underused ‘may’. (2) Learners obviously overused ‘we find + object clauses’ while underused ‘nouns (results, findings, data) + suggest/indicate/reveal + object clauses’ when expressing research results. (3) Learners tended to transfer the collocation, colligation and semantic prosody of shǐ and zuò to make. (4) Learners obviously overused ‘BE+V-ed’ and used BE as the main verb. They also obviously overused the basic forms of BE such as be, is, are, while obviously underused its inflections (was, were). These results manifested learners’ lack of accuracy and idiomatic property in verb usage. Due to the influence of the concept transfer of Chinese, the verbs in learners’ abstracts showed obvious transfer of mother language. In addition, learners have not fully mastered the use of verbs, avoiding using complex colligations to prevent errors. Based on these findings, the present study has implications for English teaching, seeking to have implications for English academic abstract writing in China. Further research could be undertaken to study the use of verbs in the whole dissertation to find out whether the characteristic of the verbs in abstracts can apply in the whole dissertation or not.

Keywords: academic writing abstracts, Chinese EFL learners, corpus-based, data-driven, verbs

Procedia PDF Downloads 313
11252 MicroRNA in Bovine Corpus Luteum during Early Pregnancy

Authors: Rreze Gecaj, Corina Schanzenbach, Benedikt Kirchner, Michael Pfaffl, Bajram Berisha

Abstract:

The maintenance of corpus lutem (CL) during early pregnancy in cattle is a critical and multifarious process. A luteotrophic mechanism originating from the embryo is widely accepted as the triggering signal for the CL maintenance. In the cattle, it is the interferon-tau (IFNT) secretion form conceptus that prevents CL regression and ensures progesterone production for the establishment of pregnancy. In addition to endocrine and paracrine signals, microRNA (miRNA) can also support CL sustainability during early pregnancy. MiRNA are small non-coding nucleic acids that regulate gene expression post-transcriptionally and are shown to be involved in the modulation of CL function. However, the examination of miRNAs in corpus luteum function at the early pregnancy still remains largely uncovered. This study aims at profiling the expression of miRNA in CL during the early pregnancy in cattle by comparing it with the CL form late cycle and with the regressed CL. Corpora lutea were assigned in two different groups during the cycle (C13 group, late CL: days 13-18 and C18, regressed CL group: day >18) and during the early pregnancy (group P: 1-2 month). The estrous cycle was determined by macroscopic examination and to age the fetus crown-rump length measurement was applied. A total of 9 corpora lutea from individual animals were included in the study, three corpora lutea for each group. MiRNAs population was profiled using small RNA next-generation sequencing and biologically significant miRNAs were evaluated for their differential expression using the DESeq2-methodology. We show that 6 differentially expressed miRNAs (bta-mir-2890, -2332, -2441-3p, -148b, -1248 and -29c) are common to both comparisons, P vs C13 and P vs C18. While for each stage individually we have identified unique miRNAs differentially expressed only for the given comparison. bta-miR-23a and -769 were unique miRNAs differentially expressed in P vs C13, whereas forty-four unique miRNAs were identified as differentially expressed in P vs C18. These data confirm that miRNAs are highly abundant in luteal tissue during early pregnancy and potentially regulate the CL maintenance at this stage of fetus development.

Keywords: bovine, corpus luteum, microRNA, pregnancy, RNA-Seq

Procedia PDF Downloads 244
11251 The Analysis of Deceptive and Truthful Speech: A Computational Linguistic Based Method

Authors: Seham El Kareh, Miramar Etman

Abstract:

Recently, detecting liars and extracting features which distinguish them from truth-tellers have been the focus of a wide range of disciplines. To the author’s best knowledge, most of the work has been done on facial expressions and body gestures but only few works have been done on the language used by both liars and truth-tellers. This paper sheds light on four axes. The first axis copes with building an audio corpus for deceptive and truthful speech for Egyptian Arabic speakers. The second axis focuses on examining the human perception of lies and proving our need for computational linguistic-based methods to extract features which characterize truthful and deceptive speech. The third axis is concerned with building a linguistic analysis program that could extract from the corpus the inter- and intra-linguistic cues for deceptive and truthful speech. The program built here is based on selected categories from the Linguistic Inquiry and Word Count program. Our results demonstrated that Egyptian Arabic speakers on one hand preferred to use first-person pronouns and present tense compared to the past tense when lying and their lies lacked of second-person pronouns, and on the other hand, when telling the truth, they preferred to use the verbs related to motion and the nouns related to time. The results also showed that there is a need for bigger data to prove the significance of words related to emotions and numbers.

Keywords: Egyptian Arabic corpus, computational analysis, deceptive features, forensic linguistics, human perception, truthful features

Procedia PDF Downloads 191
11250 Corpus-Based Neural Machine Translation: Empirical Study Multilingual Corpus for Machine Translation of Opaque Idioms - Cloud AutoML Platform

Authors: Khadija Refouh

Abstract:

Culture bound-expressions have been a bottleneck for Natural Language Processing (NLP) and comprehension, especially in the case of machine translation (MT). In the last decade, the field of machine translation has greatly advanced. Neural machine translation NMT has recently achieved considerable development in the quality of translation that outperformed previous traditional translation systems in many language pairs. Neural machine translation NMT is an Artificial Intelligence AI and deep neural networks applied to language processing. Despite this development, there remain some serious challenges that face neural machine translation NMT when translating culture bounded-expressions, especially for low resources language pairs such as Arabic-English and Arabic-French, which is not the case with well-established language pairs such as English-French. Machine translation of opaque idioms from English into French are likely to be more accurate than translating them from English into Arabic. For example, Google Translate Application translated the sentence “What a bad weather! It runs cats and dogs.” to “يا له من طقس سيء! تمطر القطط والكلاب” into the target language Arabic which is an inaccurate literal translation. The translation of the same sentence into the target language French was “Quel mauvais temps! Il pleut des cordes.” where Google Translate Application used the accurate French corresponding idioms. This paper aims to perform NMT experiments towards better translation of opaque idioms using high quality clean multilingual corpus. This Corpus will be collected analytically from human generated idiom translation. AutoML translation, a Google Neural Machine Translation Platform, is used as a custom translation model to improve the translation of opaque idioms. The automatic evaluation of the custom model will be compared to the Google NMT using Bilingual Evaluation Understudy Score BLEU. BLEU is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Human evaluation is integrated to test the reliability of the Blue Score. The researcher will examine syntactical, lexical, and semantic features using Halliday's functional theory.

Keywords: multilingual corpora, natural language processing (NLP), neural machine translation (NMT), opaque idioms

Procedia PDF Downloads 125
11249 A Corpus-Linguistic Analysis of Online Iranian News Coverage on Syrian Revolution

Authors: Amaal Ali Al-Gamde

Abstract:

The Syrian revolution is a major issue in the Middle East, which draws in world powers and receives a great focus in international mass media since 2011. The heavy global reliance on cyber news and digital sources plays a key role in conveying a sense of bias to a wide range of online readers. Thus, based on the assumption that media discourse possesses ideological implications, this study investigates the representation of Syrian revolution in online media. The paper explores the discursive constructions of anti and pro-government powers in Syrian revolution in 1000,000-word corpus of Fars online reports (an Iranian news agency), issued between 2013 and 2015. Taking a corpus assisted discourse analysis approach, the analysis investigates three types of lexicosemantic relations, the semantic macrostructures within which the two social actors are framed, the lexical collocations characterizing the news discourse and the discourse prosodies they tell about the two sides of the conflict. The study utilizes computer-based approaches, sketch engine and AntConc software to minimize the bias of the subjective analysis. The analysis moves from the insights of lexical frequencies and keyness scores to examine themes and the collocational patterns. The findings reveal the Fars agency’s ideological mode of representations in reporting events of Syrian revolution in two ways. The first is by stereotyping the opposition groups under the umbrella of terrorism, using words such as (law breakers, foreign-backed groups, militant groups, terrorists) to legitimize the atrocities of security forces against protesters and enhance horror among civilians. The second is through emphasizing the power of the government and depicting it as the defender of the Arab land by foregrounding the discourse of international conspiracy against Syria. The paper concludes discussing the potential importance of triangulating corpus linguistic tools with critical discourse analysis to elucidate more about discourses and reality.

Keywords: discourse prosody, ideology, keyness, semantic macrostructure

Procedia PDF Downloads 115
11248 Query in Grammatical Forms and Corpus Error Analysis

Authors: Katerina Florou

Abstract:

Two decades after coined the term "learner corpora" as collections of texts created by foreign or second language learners across various language contexts, and some years following suggestion to incorporate "focusing on form" within a Task-Based Learning framework, this study aims to explore how learner corpora, whether annotated with errors or not, can facilitate a focus on form in an educational setting. Argues that analyzing linguistic form serves the purpose of enabling students to delve into language and gain an understanding of different facets of the foreign language. This same objective is applicable when analyzing learner corpora marked with errors or in their raw state, but in this scenario, the emphasis lies on identifying incorrect forms. Teachers should aim to address errors or gaps in the students' second language knowledge while they engage in a task. Building on this recommendation, we compared the written output of two student groups: the first group (G1) employed the focusing on form phase by studying a specific aspect of the Italian language, namely the past participle, through examples from native speakers and grammar rules; the second group (G2) focused on form by scrutinizing their own errors and comparing them with analogous examples from a native speaker corpus. In order to test our hypothesis, we created four learner corpora. The initial two were generated during the task phase, with one representing each group of students, while the remaining two were produced as a follow-up activity at the end of the lesson. The results of the first comparison indicated that students' exposure to their own errors can enhance their grasp of a grammatical element. The study is in its second stage and more results are to be announced.

Keywords: Corpus interlanguage analysis, task based learning, Italian language as F1, learner corpora

Procedia PDF Downloads 33
11247 Number Variation of the Personal Pronoun We in American Spoken English

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage of language community, which might become the developmental trend of that language. The personal pronoun we is prescribed as a plural pronoun in grammar, but its number value is more flexible in actual use. Based on the homemade Friends corpus, the present research explores the number value of the first person pronoun we in nowadays American spoken English. With consideration of the subjectivity of we, this paper used ‘we+ PCU (Perception-cognation-utterance) verbs’ collocations and ‘we+ plural categories’ as the parameters. Results from corpus data and manual annotation show that: 1) the overall frequency of we has been increasing; 2) we has been increasingly used with other plural categories, indicating a weakening of its plural reference; and 3) we has been increasingly used with PCU (perception-cognition-utterance) verbs of strong subjectivity, indicating a strengthening of its singular reference. All these seem to support our hypothesis that we is undergoing the process of further grammaticalization towards a singular reference, though future evidence is needed to attest the bold prediction.

Keywords: number, PCU verbs, personal pronoun we,

Procedia PDF Downloads 219
11246 The Construction of Malaysian Airline Tragedies in Malaysian and British Online News: A Multidisciplinary Study

Authors: Theng Theng Ong

Abstract:

This study adopts a multidisciplinary method by combining the corpus-based discourse analysis study and language attitude study to explore the construction of Malaysia airline tragedies: MH370, MH17 and QZ8501 in the selected Malaysian and United Kingdom (UK) online news. The study aims to determine the ways in which Malaysian Airline tragedies MH370, MH17 and QZ8501 are linguistically defined and constructed in terms of keyword and collocation. The study also seeks to identify the types of discourse that are presented in the new articles. The differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media will also be examined. Finally, the language attitude study will be carried out to examine the Malaysia and UK university students’ attitudes toward the keywords, topics or issues covered by the selected Malaysian and UK news media pertaining to Malaysian Airline tragedies MH370, MH17 and QZ8501. The analysis is divided into two parts with the first part focusing on corpus-based discourse analysis on the media text. The second part of the study is to investigate Malaysians and UK news readers’ attitudes towards the online news being reported by the Malaysian and UK news media pertaining to the Airline tragedies. The main findings of corpus-based discourse analysis are essential in designing the questions in the questionnaires and interview and therefore led to the identification of the attitudes among Malaysian and UK news. This study adopts a multidisciplinary method by combining the corpus-based discourse analysis study and language attitude study to explore the construction of Malaysia airline tragedies: MH370, MH17 and QZ8501 in the selected Malaysian and United Kingdom (UK) online news. The study aims to determine the ways in which Malaysian Airline tragedies MH370, MH17 and QZ8501 are linguistically defined and constructed in terms of keyword and collocation. The study also seeks to identify the types of discourse that are presented in the new articles. The differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media will also be examined. Finally, the language attitude study will be carried out to examine the Malaysia and UK university students’ attitudes toward the keywords, topics or issues covered by the selected Malaysian and UK news media pertaining to Malaysian Airline tragedies MH370, MH17 and QZ8501. The analysis is divided into two parts with the first part focusing on corpus-based discourse analysis on the media text. The second part of the study is to investigate Malaysians and UK news readers’ attitudes towards the online news being reported by the Malaysian and UK news media pertaining to the Airline tragedies. The main findings of corpus-based discourse analysis are essential in designing the questions in the questionnaires and interview and therefore led to the identification of the attitudes among Malaysian and UK news.

Keywords: corpus linguistics, critical discourse analysis, news media, tragedies study

Procedia PDF Downloads 323
11245 We Are the 99 percent – the Occupy-Movement in Social Media

Authors: Wolfram Karg

Abstract:

The Occupy-Movement came into in 2011 existence in the US as a reaction to one of the worst economic crisis since World War II. With cuts in benefits and social services, with people being evicted from their homes on the one hand and high bonuses granted to their managers of the very same companies, a strong feeling of injustice besieged people in the US and caused them to voice their anger peacefully in social media and on the streets. Due to the world-wide-web, users all around the world read about this movement and recognized the same injustice in their own countries, making Occupy a global movement. The vast array of topics covered by Occupy offers a unique chance to carry out a corpus-based discourse analysis based on the DIMEAN-Model. The focus on this paper is limited to two aspects of DIMEAN: intertextual references and the use of connectors in texts. Because the discourse is to a large extent carried out via posts in blogs, online-articles and comments, the paper also analyses, in how far modern (i.e. computer-based media) there is a correlation between the use of connectors in different communicative types used by the Occupy-Movement.

Keywords: discourse, new media, occupy, corpus analysis

Procedia PDF Downloads 473
11244 Lexical Collocations in Medical Articles of Non-Native vs Native English-Speaking Researchers

Authors: Waleed Mandour

Abstract:

This study presents multidimensional scrutiny of Benson et al.’s seven-category taxonomy of lexical collocations used by Egyptian medical authors and their peers of native-English speakers. It investigates 212 medical papers, all published during a span of 6 years (from 2013 to 2018). The comparison is held to the medical research articles submitted by native speakers of English (25,238 articles in total with over 103 million words) as derived from the Directory of Open Access Journals (a 2.7 billion-word corpus). The non-native speakers compiled corpus was properly annotated and marked-up manually by the researcher according to the standards of Weisser. In terms of statistical comparisons, though, deployed were the conventional frequency-based analysis besides the relevant criteria, such as association measures (AMs) in which LogDice is deployed as per the recommendation of Kilgariff et al. when comparing large corpora. Despite the terminological convergence in the subject corpora, comparison results confirm the previous literature of which the non-native speakers’ compositions reveal limited ranges of lexical collocations in terms of their distribution. However, there is a ubiquitous tendency of overusing the NS-high-frequency multi-words in all lexical categories investigated. Furthermore, Egyptian authors, conversely to their English-speaking peers, tend to embrace more collocations denoting quantitative rather than qualitative analyses in their produced papers. This empirical work, per se, contributes to the English for Academic Purposes (EAP) and English as a Lingua Franca in Academic settings (ELFA). In addition, there are pedagogical implications that would promote a better quality of medical research papers published in Egyptian universities.

Keywords: corpus linguistics, EAP, ELFA, lexical collocations, medical discourse

Procedia PDF Downloads 116
11243 Sentence Structure for Free Word Order Languages in Context with Anaphora Resolution: A Case Study of Hindi

Authors: Pardeep Singh, Kamlesh Dutta

Abstract:

Many languages have fixed sentence structure and others are free word order. The accuracy of anaphora resolution of syntax based algorithm depends on structure of the sentence. So, it is important to analyze the structure of any language before implementing these algorithms. In this study, we analyzed the sentence structure exploiting the case marker in Hindi as well as some special tag for subject and object. We also investigated the word order for Hindi. Word order typology refers to the study of the order of the syntactic constituents of a language. We analyzed 165 news items of Ranchi Express from EMILEE corpus of plain text. It consisted of 1745 sentences. Eight file of dialogue based from the same corpus has been analyzed which will have 1521 sentences. The percentages of subject object verb structure (SOV) and object subject verb (OSV) are 66.90 and 33.10, respectively.

Keywords: anaphora resolution, free word order languages, SOV, OSV

Procedia PDF Downloads 457
11242 Studying Language of Immediacy and Language of Distance from a Corpus Linguistic Perspective: A Pilot Study of Evaluation Markers in French Television Weather Reports

Authors: Vince Liégeois

Abstract:

Language of immediacy and distance: Within their discourse theory, Koch & Oesterreicher establish a distinction between a language of immediacy and a language of distance. The former refers to those discourses which are oriented more towards a spoken norm, whereas the latter entails discourses oriented towards a written norm, regardless of whether they are realised phonically or graphically. This means that an utterance can be realised phonically but oriented more towards the written language norm (e.g., a scientific presentation or eulogy) or realised graphically but oriented towards a spoken norm (e.g., a scribble or chat messages). Research desiderata: The methodological approach from Koch & Oesterreicher has often been criticised for not providing a corpus-linguistic methodology, which makes it difficult to work with quantitative data or address large text collections within this research paradigm. Consequently, the Koch & Oesterreicher approach has difficulties gaining ground in those research areas which rely more on corpus linguistic research models, like text linguistics and LSP-research. A combinatory approach: Accordingly, we want to establish a combinatory approach with corpus-based linguistic methodology. To this end, we propose to (i) include data about the context of an utterance (e.g., monologicity/dialogicity, familiarity with the speaker) – which were called “conditions of communication” in the original work of Koch & Oesterreicher – and (ii) correlate the linguistic phenomenon at the centre of the inquiry (e.g., evaluation markers) to a group of linguistic phenomena deemed typical for either distance- or immediacy-language. Based on these two parameters, linguistic phenomena and texts could then be mapped on an immediacy-distance continuum. Pilot study: To illustrate the benefits of this approach, we will conduct a pilot study on evaluation phenomena in French television weather reports, a form of domain-sensitive discourse which has often been cited as an example of a “text genre”. Within this text genre, we will look at so-called “evaluation markers,” e.g., fixed strings like bad weather, stifling hot, and “no luck today!”. These evaluation markers help to communicate the coming weather situation towards the lay audience but have not yet been studied within the Koch & Oesterreicher research paradigm. Accordingly, we want to figure out whether said evaluation markers are more typical for those weather reports which tend more towards immediacy or those which tend more towards distance. To this aim, we collected a corpus with different kinds of television weather reports,e.g., as part of the news broadcast, including dialogue. The evaluation markers themselves will be studied according to the explained methodology, by correlating them to (i) metadata about the context and (ii) linguistic phenomena characterising immediacy-language: repetition, deixis (personal, spatial, and temporal), a freer choice of tense and right- /left-dislocation. Results: Our results indicate that evaluation markers are more dominantly present in those weather reports inclining towards immediacy-language. Based on the methodology established above, we have gained more insight into the working of evaluation markers in the domain-sensitive text genre of (television) weather reports. For future research, it will be interesting to determine whether said evaluation markers are also typical for immediacy-language-oriented in other domain-sensitive discourses.

Keywords: corpus-based linguistics, evaluation markers, language of immediacy and distance, weather reports

Procedia PDF Downloads 196
11241 A Corpus Output Error Analysis of Chinese L2 Learners From America, Myanmar, and Singapore

Authors: Qiao-Yu Warren Cai

Abstract:

Due to the rise of big data, building corpora and using them to analyze ChineseL2 learners’ language output has become a trend. Various empirical research has been conducted using Chinese corpora built by different academic institutes. However, most of the research analyzed the data in the Chinese corpora usingcorpus-based qualitative content analysis with descriptive statistics. Descriptive statistics can be used to make summations about the subjects or samples that research has actually measured to describe the numerical data, but the collected data cannot be generalized to the population. Comte, a Frenchpositivist, has argued since the 19th century that human beings’ knowledge, whether the discipline is humanistic and social science or natural science, should be verified in a scientific way to construct a universal theory to explain the truth and human beings behaviors. Inferential statistics, able to make judgments of the probability of a difference observed between groups being dependable or caused by chance (Free Geography Notes, 2015)and to infer from the subjects or examples what the population might think or behave, is just the right method to support Comte’s argument in the field of TCSOL. Also, inferential statistics is a core of quantitative research, but little research has been conducted by combing corpora with inferential statistics. Little research analyzes the differences in Chinese L2 learners’ language corpus output errors by using theOne-way ANOVA so that the findings of previous research are limited to inferring the population's Chinese errors according to the given samples’ Chinese corpora. To fill this knowledge gap in the professional development of Taiwanese TCSOL, the present study aims to utilize the One-way ANOVA to analyze corpus output errors of Chinese L2 learners from America, Myanmar, and Singapore. The results show that no significant difference exists in ‘shì (是) sentence’ and word order errors, but compared with Americans and Singaporeans, it is significantly easier for Myanmar to have ‘sentence blends.’ Based on the above results, the present study provides an instructional approach and contributes to further exploration of how Chinese L2 learners can have (and use) learning strategies to lower errors.

Keywords: Chinese corpus, error analysis, one-way analysis of variance, Chinese L2 learners, Americans, myanmar, Singaporeans

Procedia PDF Downloads 86
11240 Competition between Verb-Based Implicit Causality and Theme Structure's Influence on Anaphora Bias in Mandarin Chinese Sentences: Evidence from Corpus

Authors: Linnan Zhang

Abstract:

Linguists, as well as psychologists, have shown great interests in implicit causality in reference processing. However, most frequently-used approaches to this issue are psychological experiments (such as eye tracking or self-paced reading, etc.). This research is a corpus-based one and is assisted with statistical tool – software R. The main focus of the present study is about the competition between verb-based implicit causality and theme structure’s influence on anaphora bias in Mandarin Chinese sentences. In Accessibility Theory, it is believed that salience, which is also known as accessibility, and relevance are two important factors in reference processing. Theme structure, which is a special syntactic structure in Chinese, determines the salience of an antecedent on the syntactic level while verb-based implicit causality is a key factor to the relevance between antecedent and anaphora. Therefore, it is a study about anaphora, combining psychology with linguistics. With analysis of the sentences from corpus as well as the statistical analysis of Multinomial Logistic Regression, major findings of the present study are as follows: 1. When the sentence is stated in a ‘cause-effect’ structure, the theme structure will always be the antecedent no matter forward biased verbs or backward biased verbs co-occur; in non-theme structure, the anaphora bias will tend to be the opposite of the verb bias; 2. When the sentence is stated in a ‘effect-cause’ structure, theme structure will not always be the antecedent and the influence of verb-based implicit causality will outweigh that of theme structure; moreover, the anaphora bias will be the same with the bias of verbs. All the results indicate that implicit causality functions conditionally and the noun in theme structure will not be the high-salience antecedent under any circumstances.

Keywords: accessibility theory, anaphora, theme strcture, verb-based implicit causality

Procedia PDF Downloads 182
11239 An Interdisciplinary Approach to Investigating Style: A Case Study of a Chinese Translation of Gilbert’s (2006) Eat Pray Love

Authors: Elaine Y. L. Ng

Abstract:

Elizabeth Gilbert’s (2006) biography Eat, Pray, Love describes her travels to Italy, India, and Indonesia after a painful divorce. The author’s experiences with love, loss, search for happiness, and meaning have resonated with a huge readership. As regards the translation of Gilbert’s (2006) Eat, Pray, Love into Chinese, it was first translated by a Taiwanese translator He Pei-Hua and published in Taiwan in 2007 by Make Boluo Wenhua Chubanshe with the fairly catching title “Enjoy! Traveling Alone.” The same translation was translocated to China, republished in simplified Chinese characters by Shanxi Shifan Daxue Chubanshe in 2008 and renamed in China, entitled “To Be a Girl for the Whole Life.” Later on, the same translation in simplified Chinese characters was reprinted by Hunan Wenyi Chubanshe in 2013. This study employs Munday’s (2002) systemic model for descriptive translation studies to investigate the translation of Gilbert’s (2006) Eat, Pray, Love into Chinese by the Taiwanese translator Hu Pei-Hua. It employs an interdisciplinary approach, combining systemic functional linguistics and corpus stylistics with sociohistorical research within a descriptive framework to study the translator’s discursive presence in the text. The research consists of three phases. The first phase is to locate the target text within its socio-cultural context. The target-text context concerning the para-texts, readers’ responses, and the publishers’ orientation will be explored. The second phase is to compare the source text and the target text for the categorization of translation shifts by using the methodological tools of systemic functional linguistics and corpus stylistics. The investigation concerns the rendering of mental clauses and speech and thought presentation. The final phase is an explanation of the causes of translation shifts. The linguistic findings are related to the extra-textual information collected in an effort to ascertain the motivations behind the translator’s choices. There exist sets of possible factors that may have contributed to shaping the textual features of the given translation within a specific socio-cultural context. The study finds that the translator generally reproduces the mental clauses and speech and thought presentation closely according to the original. Nevertheless, the language of the translation has been widely criticized to be unidiomatic and stiff, losing the elegance of the original. In addition, the several Chinese translations of the given text produced by one Taiwanese and two Chinese publishers are basically the same. They are repackaged slightly differently, mainly with the change of the book cover and its captions for each version. By relating the textual findings to the extra-textual data of the study, it is argued that the popularity of the Chinese translation of Gilbert’s (2006) Eat, Pray, Love may not be attributed to the quality of the translation. Instead, it may have to do with the way the work is promoted strategically by the social media manipulated by the four e-bookstores promoting and selling the book online in China.

Keywords: chinese translation of eat pray love, corpus stylistics, motivations for translation shifts, systemic approach to translation studies

Procedia PDF Downloads 158
11238 Metaphor Institutionalization as Phase Transition: Case Studies of Chinese Metaphors

Authors: Xuri Tang, Ting Pan

Abstract:

Metaphor institutionalization refers to the propagation of a metaphor that leads to its acceptance in speech community as a norm of the language. Such knowledge is important to both theoretical studies of metaphor and practical disciplines such as lexicography and language generation. This paper reports an empirical study of metaphor institutionalization of 14 Chinese metaphors. It first explores the pattern of metaphor institutionalization by fitting the logistic function (or S-shaped curve) to time series data of conventionality of the metaphors that are automatically obtained from a large-scale diachronic Chinese corpus. Then it reports a questionnaire-based survey on the propagation scale of each metaphor, which is measured by the average number of subjects that can easily understand the metaphorical expressions. The study provides two pieces of evidence supporting the hypothesis that metaphor institutionalization is a phrase transition: (1) the pattern of metaphor institutionalization is an S-shaped curve and (2) institutionalized metaphors generally do not propagate to the whole community but remain in equilibrium state. This conclusion helps distinguish metaphor institutionalization from topicalization and other types of semantic change.

Keywords: metaphor institutionalization, phase transition, propagation scale, s-shaped curve

Procedia PDF Downloads 155
11237 Named Entity Recognition System for Tigrinya Language

Authors: Sham Kidane, Fitsum Gaim, Ibrahim Abdella, Sirak Asmerom, Yoel Ghebrihiwot, Simon Mulugeta, Natnael Ambassager

Abstract:

The lack of annotated datasets is a bottleneck to the progress of NLP in low-resourced languages. The work presented here consists of large-scale annotated datasets and models for the named entity recognition (NER) system for the Tigrinya language. Our manually constructed corpus comprises over 340K words tagged for NER, with over 118K of the tokens also having parts-of-speech (POS) tags, annotated with 12 distinct classes of entities, represented using several types of tagging schemes. We conducted extensive experiments covering convolutional neural networks and transformer models; the highest performance achieved is 88.8% weighted F1-score. These results are especially noteworthy given the unique challenges posed by Tigrinya’s distinct grammatical structure and complex word morphologies. The system can be an essential building block for the advancement of NLP systems in Tigrinya and other related low-resourced languages and serve as a bridge for cross-referencing against higher-resourced languages.

Keywords: Tigrinya NER corpus, TiBERT, TiRoBERTa, BiLSTM-CRF

Procedia PDF Downloads 91
11236 Converse to the Sherman Inequality with Applications in Information Theory

Authors: Ana Barbir, S. Ivelic Bradanovic, D. Pecaric, J. Pecaric

Abstract:

We proved a converse to Sherman's inequality. Using the concept of f-divergence we obtained some inequalities for the well-known entropies, such as Shannon entropies that have many applications in many applied sciences, for example, in information theory, biology and economics Zipf-Mandelbrot law gave improvement in account for the low-rankwords in corpus. Applications of Zipf-Mandelbrot law can be found in linguistics, information sciences and also mostly applicable in ecological eld studies. We also introduced an entropy by applying the Zipf-Mandelbrot law and derived some related inequalities.

Keywords: f-divergence, majorization inequality, Sherman inequality, Zipf-Mandelbrot entropy

Procedia PDF Downloads 153
11235 'Wandering Uterus': An Analogy of Perception of Women in Hippocratic Corpus and Post-Modern Times

Authors: Ankita Sharma

Abstract:

The study proposes to review the perception of women in the Classical Age (500-336 BC) when Greek Philosophy was in bloom. It was observed that women had very few rights and were still under the control of men. One of the possible reasons for this exclusion was woman’s biology that had a huge influence on her being seen as inferior to men. The text ‘Hippocratic Corpus’ focuses on the biological construct of the female body in classical Greek science that perpetuated the idea of women as second-class citizens and were considered inherently weaker than men. The research highlights the significance of the text that was used to encourage women of that time to get married and produce children and how till today the perception remains the same. The Greek belief of need for confinement and control of 'wandering uterus' has led to superior understanding of men. The pivotal emphasis of this research is to women and their bodies that are depicted in a misogynistic way which paved the way for Hippocratic writers to influence the society’s attitude towards women in their writings. It is intended to draw attention to the prevailing cultural assumptions and preconceived notions about female anatomy that had a pervasive influence in the following centuries with its roots being in ancient science.

Keywords: classical Greek theory, women, wandering womb, modern ideology

Procedia PDF Downloads 175
11234 How Is a Machine-Translated Literary Text Organized in Coherence? An Analysis Based upon Theme-Rheme Structure

Authors: Jiang Niu, Yue Jiang

Abstract:

With the ultimate goal to automatically generate translated texts with high quality, machine translation has made tremendous improvements. However, its translations of literary works are still plagued with problems in coherence, esp. the translation between distant language pairs. One of the causes of the problems is probably the lack of linguistic knowledge to be incorporated into the training of machine translation systems. In order to enable readers to better understand the problems of machine translation in coherence, to seek out the potential knowledge to be incorporated, and thus to improve the quality of machine translation products, this study applies Theme-Rheme structure to examine how a machine-translated literary text is organized and developed in terms of coherence. Theme-Rheme structure in Systemic Functional Linguistics is a useful tool for analysis of textual coherence. Theme is the departure point of a clause and Rheme is the rest of the clause. In a text, as Themes and Rhemes may be connected with each other in meaning, they form thematic and rhematic progressions throughout the text. Based on this structure, we can look into how a text is organized and developed in terms of coherence. Methodologically, we chose Chinese and English as the language pair to be studied. Specifically, we built a comparable corpus with two modes of English translations, viz. machine translation (MT) and human translation (HT) of one Chinese literary source text. The translated texts were annotated with Themes, Rhemes and their progressions throughout the texts. The annotated texts were analyzed from two respects, the different types of Themes functioning differently in achieving coherence, and the different types of thematic and rhematic progressions functioning differently in constructing texts. By analyzing and contrasting the two modes of translations, it is found that compared with the HT, 1) the MT features “pseudo-coherence”, with lots of ill-connected fragments of information using “and”; 2) the MT system produces a static and less interconnected text that reads like a list; these two points, in turn, lead to the less coherent organization and development of the MT than that of the HT; 3) novel to traditional and previous studies, Rhemes do contribute to textual connection and coherence though less than Themes do and thus are worthy of notice in further studies. Hence, the findings suggest that Theme-Rheme structure be applied to measuring and assessing the coherence of machine translation, to being incorporated into the training of the machine translation system, and Rheme be taken into account when studying the textual coherence of both MT and HT.

Keywords: coherence, corpus-based, literary translation, machine translation, Theme-Rheme structure

Procedia PDF Downloads 186
11233 A Corpus-Based Contrastive Analysis of Directive Speech Act Verbs in English and Chinese Legal Texts

Authors: Wujian Han

Abstract:

In the process of human interaction and communication, speech act verbs are considered to be the most active component and the main means for information transmission, and are also taken as an indication of the structure of linguistic behavior. The theoretical value and practical significance of such everyday built-in metalanguage have long been recognized. This paper, which is part of a bigger study, is aimed to provide useful insights for a more precise and systematic application to speech act verbs translation between English and Chinese, especially with regard to the degree to which generic integrity is maintained in the practice of translation of legal documents. In this study, the corpus, i.e. Chinese legal texts and their English translations, English legal texts, ordinary Chinese texts, and ordinary English texts, serve as a testing ground for examining contrastively the usage of English and Chinese directive speech act verbs in legal genre. The scope of this paper is relatively wide and essentially covers all directive speech act verbs which are used in ordinary English and Chinese, such as order, command, request, prohibit, threat, advice, warn and permit. The researcher, by combining the corpus methodology with a contrastive perspective, explored a range of characteristics of English and Chinese directive speech act verbs including their semantic, syntactic and pragmatic features, and then contrasted them in a structured way. It has been found that there are similarities between English and Chinese directive speech act verbs in legal genre, such as similar semantic components between English speech act verbs and their translation equivalents in Chinese, formal and accurate usage of English and Chinese directive speech act verbs in legal contexts. But notable differences have been identified in areas of difference between their usage in the original Chinese and English legal texts such as valency patterns and frequency of occurrences. For example, the subjects of some directive speech act verbs are very frequently omitted in Chinese legal texts, but this is not the case in English legal texts. One of the practicable methods to achieve adequacy and conciseness in speech act verb translation from Chinese into English in legal genre is to repeat the subjects or the message with discrepancy, and vice versa. In addition, translation effects such as overuse and underuse of certain directive speech act verbs are also found in the translated English texts compared to the original English texts. Legal texts constitute a particularly valuable material for speech act verb study. Building up such a contrastive picture of the Chinese and English speech act verbs in legal language would yield results of value and interest to legal translators and students of language for legal purposes and have practical application to legal translation between English and Chinese.

Keywords: contrastive analysis, corpus-based, directive speech act verbs, legal texts, translation between English and Chinese

Procedia PDF Downloads 472
11232 Comparison of Verb Complementation Patterns in Selected Pakistani and British English Newspaper Social Columns: A Corpus-Based Study

Authors: Zafar Iqbal Bhatti

Abstract:

The present research aims to examine and evaluate the frequencies and practices of verb complementation patterns in English newspaper social columns published in Pakistan and Britain. The research will demonstrate that Pakistani English is a non-native variety of English having its own unique usual and logical characteristics, affected by way of the native languages and the culture, upon syntactic levels, making the variety users aware that any differences from British or American English that are systematic and regular, or another English language, are not even if they are unique, erroneous forms and typical characteristics of several kinds. The objectives are to examine the verb complementation patterns that British and Pakistani social columnists use in relation to their syntactic categories. Secondly, to compare the verb complementation patterns used in Pakistani and British English newspapers social columns. This study will figure out various verb complementation patterns in Pakistani and British English newspaper social columns and their occurrence and distribution. The word classes express different functions of words, such as action, event, or state of being. This research aims to evaluate whether there are any appreciable differences in the verb complementation patterns used in Pakistani and British English newspaper social columns. The results will show the number of varieties of verb complementation patterns in selected English newspapers social columns. This study will fill the gap of previous studies conducted in this field as they only explore a little about the differences between Pakistani and British English newspapers. It will also figure out a variety of languages used in Pakistani and British English journals, as well as regional and cultural values and variations. The researcher will use AntConc software in this study to extract the data for analysis. The researcher will use a concordance tool to identify verb complementation patterns in selected data. Then the researcher will manually categorize them because the same type of adverb can sometimes be used for various purposes. From 1st June 2022 to 30th Sep. 2022, a four-month written corpus of the social columns of PE and BE newspapers will be collected and analyzed. For the analysis of the research questions, 50 social columns will be selected from Pakistani newspapers and 50 from British newspapers. The researcher will collect a representative sample of data from Pakistani and British English newspaper social columns. The researcher will manually analyze the complementation patterns of each verb in each sentence, and then the researcher will determine how frequently each pattern occurs. The researcher will use syntactic characteristics of the verb complementation elements according to the description by Downing and Locke (2006). The researcher will examine all of the verb complementation patterns in the data, and the frequency and distribution of each verb complementation pattern will be evaluated using the software. The researcher will explore every possible verb complementation pattern in Pakistani and British English before calculating the occurrence and abundance of each verb pattern. The researcher will explore every possible verb complementation pattern in Pakistani English before calculating the frequency and distribution of each pattern.

Keywords: verb complementation, syntactic categories, newspaper social columns, corpus

Procedia PDF Downloads 29
11231 Agenesis of the Corpus Callosum: The Role of Neuropsychological Assessment with Implications to Psychosocial Rehabilitation

Authors: Ron Dick, P. S. D. V. Prasadarao, Glenn Coltman

Abstract:

Agenesis of the corpus callosum (ACC) is a failure to develop corpus callosum - the large bundle of fibers of the brain that connects the two cerebral hemispheres. It can occur as a partial or complete absence of the corpus callosum. In the general population, its estimated prevalence rate is 1 in 4000 and a wide range of genetic, infectious, vascular, and toxic causes have been attributed to this heterogeneous condition. The diagnosis of ACC is often achieved by neuroimaging procedures. Though persons with ACC can perform normally on intelligence tests they generally present with a range of neuropsychological and social deficits. The deficit profile is characterized by poor coordination of motor movements, slow reaction time, processing speed and, poor memory. Socially, they present with deficits in communication, language processing, the theory of mind, and interpersonal relationships. The present paper illustrates the role of neuropsychological assessment with implications to psychosocial management in a case of agenesis of the corpus callosum. Method: A 27-year old left handed Caucasian male with a history of ACC was self-referred for a neuropsychological assessment to assist him in his employment options. Parents noted significant difficulties with coordination and balance at an early age of 2-3 years and he was diagnosed with dyspraxia at the age of 14 years. History also indicated visual impairment, hypotonia, poor muscle coordination, and delayed development of motor milestones. MRI scan indicated agenesis of the corpus callosum with ventricular morphology, widely spaced parallel lateral ventricles and mild dilatation of the posterior horns; it also showed colpocephaly—a disproportionate enlargement of the occipital horns of the lateral ventricles which might be affecting his motor abilities and visual defects. The MRI scan ruled out other structural abnormalities or neonatal brain injury. At the time of assessment, the subject presented with such problems as poor coordination, slowed processing speed, poor organizational skills and time management, and difficulty with social cues and facial expressions. A comprehensive neuropsychological assessment was planned and conducted to assist in identifying the current neuropsychological profile to facilitate the formulation of a psychosocial and occupational rehabilitation programme. Results: General intellectual functioning was within the average range and his performance on memory-related tasks was adequate. Significant visuospatial and visuoconstructional deficits were evident across tests; constructional difficulties were seen in tasks such as copying a complex figure, building a tower and manipulating blocks. Poor visual scanning ability and visual motor speed were evident. Socially, the subject reported heightened social anxiety, difficulty in responding to cues in the social environment, and difficulty in developing intimate relationships. Conclusion: Persons with ACC are known to present with specific cognitive deficits and problems in social situations. Findings from the current neuropsychological assessment indicated significant visuospatial difficulties, poor visual scanning and problems in social interactions. His general intellectual functioning was within the average range. Based on the findings from the comprehensive neuropsychological assessment, a structured psychosocial rehabilitation programme was developed and recommended.

Keywords: agenesis, callosum, corpus, neuropsychology, psychosocial, rehabilitation

Procedia PDF Downloads 265
11230 Overview of Research Contexts about XR Technologies in Architectural Practice

Authors: Adeline Stals

Abstract:

The transformation of architectural design practices has been underway for almost forty years due to the development and democratization of computer technology. New and more efficient tools are constantly being proposed to architects, amplifying a technological wave that sometimes stimulates them, sometimes overwhelms them, depending essentially on their digital culture and the context (socio-economic, structural, organizational) in which they work on a daily basis. Our focus is on VR, AR, and MR technologies dedicated to architecture. The commercialization of affordable headsets like the Oculus Rift, the HTC Vive or more low-tech like the Google CardBoard, makes it more accessible to benefit from these technologies. In that regard, researchers report the growing interest of these tools for architects, given the new perspectives they open up in terms of workflow, representation, collaboration, and client’s involvement. However, studies rarely mention the consequences of the sample studied on results. Our research provides an overview of VR, AR, and MR researches among a corpus of papers selected from conferences and journals. A closer look at the sample of these research projects highlights the necessity to take into consideration the context of studies in order to develop tools truly dedicated to the real practices of specific architect profiles. This literature review formalizes milestones for future challenges to address. The methodology applied is based on a systematic review of two sources of publications. The first one is the Cumincad database, which regroups publications from conferences exclusively about digital in architecture. Additionally, the second part of the corpus is based on journal publications. Journals have been selected considering their ranking on Scimago. Among the journals in the predefined category ‘architecture’ and in Quartile 1 for 2018 (last update when consulted), we have retained the ones related to the architectural design process: Design Studies, CoDesign, Architectural Science Review, Frontiers of Architectural Research and Archnet-IJAR. Beside those journals, IJAC, not classified in the ‘architecture’ category, is selected by the author for its adequacy with architecture and computing. For all requests, the search terms were ‘virtual reality’, ‘augmented reality’, and ‘mixed reality’ in title and/or keywords for papers published between 2015 and 2019 (included). This frame time is defined considering the fast evolution of these technologies in the past few years. Accordingly, the systematic review covers 202 publications. The literature review on studies about XR technologies establishes the state of the art of the current situation. It highlights that studies are mostly based on experimental contexts with controlled conditions (pedagogical, e.g.) or on practices established in large architectural offices of international renown. However, few studies focus on the strategies and practices developed by offices of smaller size, which represent the largest part of the market. Indeed, a European survey studying the architectural profession in Europe in 2018 reveals that 99% of offices are composed of less than ten people, and 71% of only one person. The study also showed that the number of medium-sized offices is continuously decreasing in favour of smaller structures. In doing so, a frontier seems to remain between the worlds of research and practice, especially for the majority of small architectural practices having a modest use of technology. This paper constitutes a reference for the next step of the research and for further worldwide researches by facilitating their contextualization.

Keywords: architectural design, literature review, SME, XR technologies

Procedia PDF Downloads 95
11229 The Effects of Culture and Language on Social Impression Formation from Voice Pleasantness: A Study with French and Iranian People

Authors: L. Bruckert, A. Mansourzadeh

Abstract:

The voice has a major influence on interpersonal communication in everyday life via the perception of pleasantness. The evolutionary perspective postulates that the mechanisms underlying the pleasantness judgments are universal adaptations that have evolved in the service of choosing a mate (through the process of sexual selection). From this point of view, the favorite voices would be those with more marked sexually dimorphic characteristics; for example, in men with lower voice pitch, pitch is the main criterion. On the other hand, one can postulate that the mechanisms involved are gradually established since childhood through exposure to the environment, and thus the prosodic elements could take precedence in everyday life communication as it conveys information about the speaker's attitude (willingness to communicate, interest toward the interlocutors). Our study focuses on voice pleasantness and its relationship with social impression formation, exploring both the spectral aspects (pitch, timbre) and the prosodic ones. In our study, we recorded the voices through two vocal corpus (five vowels and a reading text) of 25 French males speaking French and 25 Iranian males speaking Farsi. French listeners (40 male/40 female) listened to the French voices and made a judgment either on the voice's pleasantness or on the speaker (judgment about his intelligence, honesty, sociability). The regression analyses from our acoustic measures showed that the prosodic elements (for example, the intonation and the speech rate) are the most important criteria concerning pleasantness, whatever the corpus or the listener's gender. Moreover, the correlation analyses showed that the speakers with the voices judged as the most pleasant are considered the most intelligent, sociable, and honest. The voices in Farsi have been judged by 80 other French listeners (40 male/40 female), and we found the same effect of intonation concerning the judgment of pleasantness with the corpus «vowel» whereas with the corpus «text» the pitch is more important than the prosody. It may suggest that voice perception contains some elements invariant across culture/language, whereas others are influenced by the cultural/linguistic background of the listener. Shortly in the future, Iranian people will be asked to listen either to the French voices for half of them or to the Farsi voices for the other half and produce the same judgments as the French listeners. This experimental design could potentially make it possible to distinguish what is linked to culture and what is linked to language in the case of differences in voice perception.

Keywords: cross-cultural psychology, impression formation, pleasantness, voice perception

Procedia PDF Downloads 53