Search results for: corpus analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27551

Search results for: corpus analysis

27401 Investigating Translations of Websites of Pakistani Public Offices

Authors: Sufia Maroof

Abstract:

This empirical study investigated the web-translations of five Pakistani public offices (FPSC, FIA, HEC, USB, and Ministry of Finance) offering Urdu tab as an option to access information on their official websites. Triangulation of quantitative and qualitative research design informed the researcher of the semantic, lexical and syntactic caveats in these translations. The study hypothesized that majority of the Pakistani population is oblivious of the Supreme Court’s amendments in language policy concerning national and official language; hence, Urdu web-translations of the public departments have not been accessed effectively. Firstly, the researcher conducted an online survey, comprising of two sections, close ended and short answer based questions. Secondly, the researcher compiled corpus of the five selected websites in a tabular form to compare the data. Thirdly, the administrators of the departments had been contacted regarding the methods of translation and the expertise of the personnel involved. The corpus was assessed for TQA after examining the lexical, semantic, syntactical and technical alignment inaccuracies and imperfections. The study suggests the public offices to invest in their Urdu webs by either hiring expert translators or engaging expertise of a translation agency for this project to offer quality translation to public.

Keywords: machine translations, public offices, Urdu translations, websites

Procedia PDF Downloads 121
27400 The Philippines’ War on Drugs: a Pragmatic Analysis on Duterte's Commemorative Speeches

Authors: Ericson O. Alieto, Aprillete C. Devanadera

Abstract:

The main objective of the study is to determine the dominant speech acts in five commemorative speeches of President Duterte. This study employed Speech Act Theory and Discourse analysis to determine how the speech acts features connote the pragmatic meaning of Duterte’s speeches. Identifying the speech acts is significant in elucidating the underlying message or the pragmatic meaning of the speeches. From the 713 sentences or utterances from the speeches, assertive with 208 occurrences from the corpus or 29% is the dominant speech acts. It was followed by expressive with 177 or 25% occurrences, directive accounts for 152 or 15% occurrences. While commisive accounts for 104 or 15% occurrences and declarative got the lowest percentage of occurrences with 72 or 10% only. These sentences when uttered by Duterte carry a certain power of language to move or influence people. Thus, the present study shows the fundamental message perceived by the listeners. Moreover, the frequent use of assertive and expressive not only explains the pragmatic message of the speeches but also reflects the personality of President Duterte.

Keywords: commemorative speech, discourse analysis, duterte, pragmatics

Procedia PDF Downloads 280
27399 VIAN-DH: Computational Multimodal Conversation Analysis Software and Infrastructure

Authors: Teodora Vukovic, Christoph Hottiger, Noah Bubenhofer

Abstract:

The development of VIAN-DH aims at bridging two linguistic approaches: conversation analysis/interactional linguistics (IL), so far a dominantly qualitative field, and computational/corpus linguistics and its quantitative and automated methods. Contemporary IL investigates the systematic organization of conversations and interactions composed of speech, gaze, gestures, and body positioning, among others. These highly integrated multimodal behaviour is analysed based on video data aimed at uncovering so called “multimodal gestalts”, patterns of linguistic and embodied conduct that reoccur in specific sequential positions employed for specific purposes. Multimodal analyses (and other disciplines using videos) are so far dependent on time and resource intensive processes of manual transcription of each component from video materials. Automating these tasks requires advanced programming skills, which is often not in the scope of IL. Moreover, the use of different tools makes the integration and analysis of different formats challenging. Consequently, IL research often deals with relatively small samples of annotated data which are suitable for qualitative analysis but not enough for making generalized empirical claims derived quantitatively. VIAN-DH aims to create a workspace where many annotation layers required for the multimodal analysis of videos can be created, processed, and correlated in one platform. VIAN-DH will provide a graphical interface that operates state-of-the-art tools for automating parts of the data processing. The integration of tools that already exist in computational linguistics and computer vision, facilitates data processing for researchers lacking programming skills, speeds up the overall research process, and enables the processing of large amounts of data. The main features to be introduced are automatic speech recognition for the transcription of language, automatic image recognition for extraction of gestures and other visual cues, as well as grammatical annotation for adding morphological and syntactic information to the verbal content. In the ongoing instance of VIAN-DH, we focus on gesture extraction (pointing gestures, in particular), making use of existing models created for sign language and adapting them for this specific purpose. In order to view and search the data, VIAN-DH will provide a unified format and enable the import of the main existing formats of annotated video data and the export to other formats used in the field, while integrating different data source formats in a way that they can be combined in research. VIAN-DH will adapt querying methods from corpus linguistics to enable parallel search of many annotation levels, combining token-level and chronological search for various types of data. VIAN-DH strives to bring crucial and potentially revolutionary innovation to the field of IL, (that can also extend to other fields using video materials). It will allow the processing of large amounts of data automatically and, the implementation of quantitative analyses, combining it with the qualitative approach. It will facilitate the investigation of correlations between linguistic patterns (lexical or grammatical) with conversational aspects (turn-taking or gestures). Users will be able to automatically transcribe and annotate visual, spoken and grammatical information from videos, and to correlate those different levels and perform queries and analyses.

Keywords: multimodal analysis, corpus linguistics, computational linguistics, image recognition, speech recognition

Procedia PDF Downloads 102
27398 A Corpus-based Study of Adjuncts in Colombian English as a Second Language (ESL) Argumentative Essays

Authors: E. Velasco

Abstract:

Meeting high standards of writing in a Second Language (L2) is extremely important for many students who wish to undertake studies at universities in both English and non-English speaking countries. University lecturers in English speaking countries continue to express dissatisfaction with the apparent poor quality of essay writing skills displayed by English as a Second Language (ESL) students, whose essays are often criticised for their lack of cohesion and coherence. These critiques have extended to contexts such as Colombia, where many ESL students are criticised for their inability to write high-quality academic texts in L2-English, particularly at the tertiary level. If Colombian ESL students are expected to meet high standards of writing when studying locally and abroad, it makes sense to carry out specific research that can perhaps lead to recommendations to support their quest for improving argumentative strategies. Employing Corpus Linguistics methods within a Learner Corpus Research framework, and a combination of Log-Likelihood and Bayes Factor measures, this paper investigated argumentative essays written by Colombian ESL students. The study specifically aimed to analyse conjunctive adjuncts in argumentative essays to find out how Colombian ESL students connect their ideas in discourse. Results suggest that a) Colombian ESL learners need explicit instruction on specific areas of conjunctive adjuncts to counteract overuse, underuse and misuse; b) underuse of endophoric and evidential adjuncts highlights gaps between IELTS-like essays and good quality tertiary-level essays and published papers, and these gaps are linked to prior knowledge brought into writing task, rhetorical functions in writing, and research processes before writing takes place; c) both Colombian ESL learners and L1-English writers (in a reference corpus) overuse some adjuncts and underuse endophoric and evidential adjuncts, when compared to skilled L1-English and L2-English writers, so differences in frequencies of adjuncts has little to do with the writers’ L1, and differences are rather linked to types of essays writers produce (e.g. ESL vs. university essays). Ender Velasco: The pedagogical recommendations deriving from the study are that: a) Colombian ESL learners need to be shown that overuse is not the only way of giving cohesion to argumentative essays and there are other alternatives to cohesion (e.g., implicit adjuncts, lexical chains and collocations); b) syllabi and classroom input need to raise awareness of gaps in writing skills between IELTS-like and tertiary-level argumentative essays, and of how endophoric and evidential adjuncts are used to refer to anaphoric and cataphoric sections of essays, and to other people’s work or ideas; c) syllabi and classroom input need to include essay-writing tasks based on previous research/reading which learners need to incorporate into their arguments, and tasks that raise awareness of referencing systems (e.g., APA); d) classroom input needs to include explicit instruction on use of punctuation, functions and/or syntax with specific conjunctive adjuncts such as for example, for that reason, although, despite and nevertheless.

Keywords: argumentative essays, colombian english as a second language (esl) learners, conjunctive adjuncts, corpus linguistics

Procedia PDF Downloads 75
27397 On the Semantics and Pragmatics of 'Be Able To': Modality and Actualisation

Authors: Benoît Leclercq, Ilse Depraetere

Abstract:

The goal of this presentation is to shed new light on the semantics and pragmatics of be able to. It presents the results of a corpus analysis based on data from the BNC (British National Corpus), and discusses these results in light of a specific stance on the semantics-pragmatics interface taking into account recent developments. Be able to is often discussed in relation to can and could, all of which can be used to express ability. Such an onomasiological approach often results in the identification of usage constraints for each expression. In the case of be able to, it is the formal properties of the modal expression (unlike can and could, be able to has non-finite forms) that are in the foreground, and the modal expression is described as the verb that conveys future ability. Be able to is also argued to expressed actualised ability in the past (I was able/could to open the door). This presentation aims to provide a more accurate pragmatic-semantic profile of be able to, based on extensive data analysis and one that is embedded in a very explicit view on the semantics-pragmatics interface. A random sample of 3000 examples (1000 for each modal verb) extracted from the BNC was analysed to account for the following issues. First, the challenge is to identify the exact semantic range of be able to. The results show that, contrary to general assumption, be able to does not only express ability but it shares most of the root meanings usually associated with the possibility modals can and could. The data reveal that what is called opportunity is, in fact, the most frequent meaning of be able to. Second, attention will be given to the notion of actualisation. It is commonly argued that be able to is the preferred form when the residue actualises: (1) The only reason he was able to do that was because of the restriction (BNC, spoken) (2) It is only through my imaginative shuffling of the aces that we are able to stay ahead of the pack. (BNC, written) Although this notion has been studied in detail within formal semantic approaches, empirical data is crucially lacking and it is unclear whether actualisation constitutes a conventional (and distinguishing) property of be able to. The empirical analysis provides solid evidence that actualisation is indeed a conventional feature of the modal. Furthermore, the dataset reveals that be able to expresses actualised 'opportunities' and not actualised 'abilities'. In the final part of this paper, attention will be given to the theoretical implications of the empirical findings, and in particular to the following paradox: how can the same expression encode both modal meaning (non-factual) and actualisation (factual)? It will be argued that this largely depends on one's conception of the semantics-pragmatics interface, and that this need not be an issue when actualisation (unlike modality) is analysed as a generalised conversational implicature and thus is considered part of the conventional pragmatic layer of be able to.

Keywords: Actualisation, Modality, Pragmatics, Semantics

Procedia PDF Downloads 123
27396 Neural Machine Translation for Low-Resource African Languages: Benchmarking State-of-the-Art Transformer for Wolof

Authors: Cheikh Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, Siley O. Ba

Abstract:

In this paper, we propose two neural machine translation (NMT) systems (French-to-Wolof and Wolof-to-French) based on sequence-to-sequence with attention and transformer architectures. We trained our models on a parallel French-Wolof corpus of about 83k sentence pairs. Because of the low-resource setting, we experimented with advanced methods for handling data sparsity, including subword segmentation, back translation, and the copied corpus method. We evaluate the models using the BLEU score and find that transformer outperforms the classic seq2seq model in all settings, in addition to being less sensitive to noise. In general, the best scores are achieved when training the models on word-level-based units. For subword-level models, using back translation proves to be slightly beneficial in low-resource (WO) to high-resource (FR) language translation for the transformer (but not for the seq2seq) models. A slight improvement can also be observed when injecting copied monolingual text in the target language. Moreover, combining the copied method data with back translation leads to a substantial improvement of the translation quality.

Keywords: backtranslation, low-resource language, neural machine translation, sequence-to-sequence, transformer, Wolof

Procedia PDF Downloads 141
27395 Finding Related Scientific Documents Using Formal Concept Analysis

Authors: Nadeem Akhtar, Hira Javed

Abstract:

An important aspect of research is literature survey. Availability of a large amount of literature across different domains triggers the need for optimized systems which provide relevant literature to researchers. We propose a search system based on keywords for text documents. This experimental approach provides a hierarchical structure to the document corpus. The documents are labelled with keywords using KEA (Keyword Extraction Algorithm) and are automatically organized in a lattice structure using Formal Concept Analysis (FCA). This groups the semantically related documents together. The hierarchical structure, based on keywords gives out only those documents which precisely contain them. This approach open doors for multi-domain research. The documents across multiple domains which are indexed by similar keywords are grouped together. A hierarchical relationship between keywords is obtained. To signify the effectiveness of the approach, we have carried out the experiment and evaluation on Semeval-2010 Dataset. Results depict that the presented method is considerably successful in indexing of scientific papers.

Keywords: formal concept analysis, keyword extraction algorithm, scientific documents, lattice

Procedia PDF Downloads 322
27394 A Corpus-Based Diachronic Study on Indefinite Pronominal Anaphora in English

Authors: Qiong Hu

Abstract:

From old English to modern English, the gender category has changed from grammatical gender system to natural gender system. The word classes that reflected gender has changed from pronouns, adjectives, and numerals in old English to only pronouns in modern English. In present-day English, the third person singular pronouns are the only paradigm that keeps an intact gender. 'He' and 'they' used as epicene pronouns are one of the two commonest phenomena of gender disagreement (the other being those against the natural gender). Considering the convenience of corpus concordance, epicene pronoun usage is selected in this study in which the anaphors are restricted to possessives (eg. his, their), and the antecedents are restricted to compound indefinite pronouns (eg. someone, somebody). Factors like writing form (eg. someone vs. some one), the semantics of the prefixes (eg. some- vs. any-), and suffixes (eg. -one vs. -body), as well as frequency, are taken into consideration. Statistics indicate that 'their' is increasingly used as the epicene pronoun compared with the decline of 'his' (when both writing forms are considered). This is influenced by social factors such as feminist movement, as well as the semantics and frequency of antecedents. Their (plural) used in anaphoric reference to various indefinite pronouns (singular in form) can also be treated as number variation in third person pronouns, and the trend that 'their' in place of his can also be treated as a change in number category. Among different candidates for the gender-neutral function, 'their' is proven to be the most promising one based on the diachronic data. This does not reject any new competitors in the future which still remains to be seen.

Keywords: language variation and change, epicene pronouns, gender, number

Procedia PDF Downloads 181
27393 Learning Vocabulary with SkELL: Developing a Methodology with University Students in Japan Using Action Research

Authors: Henry R. Troy

Abstract:

Corpora are becoming more prevalent in the language classroom, especially in the development of dictionaries and course materials. Nevertheless, corpora are still perceived by many educators as difficult to use directly in the classroom, a process which is also known as “data-driven learning” (DDL). Action research has been identified as a method by which DDL’s efficiency can be increased, but it is also an approach few studies on DDL have employed. Studies into the effectiveness of DDL in language education in Japan are also rare, and investigations focused more on student and teacher reactions rather than pre and post-test scores are rarer still. This study investigates the student and teacher reactions to the use of SkELL, a free online corpus designed to be user-friendly, for vocabulary learning at a university in Japan. Action research is utilized to refine the teaching methodology, with changes to the method based on student and teacher feedback received via surveys submitted after each of the four implementations of DDL. After some training, the students used tablets to study the target vocabulary autonomously in pairs and groups, with the teacher acting as facilitator. The results show that the students enjoyed using SkELL and felt it was effective for vocabulary learning, while the teaching methodology grew in efficiency throughout the course. These findings suggest that action research can be a successful method for increasing the efficacy of DDL in the language classroom, especially with teachers and students who are new to the practice.

Keywords: action research, corpus linguistics, data-driven learning, vocabulary learning

Procedia PDF Downloads 231
27392 Variation in Italian Specialized Economic Texts

Authors: Abdelmagid Basyouny Sakr

Abstract:

Terminological variation is a reality and it is now recognized by terminologists. This paper investigates the terminological variation in the context of specialized economic texts in Italian. It aims to find whether certain patterns or tendencies can be derived from the analysis of these texts. Term variants pose two different kinds of difficulties. The first one is being able to recognize linguistic expressions that denote the same concept in running text. Another one lies in knowing which variant should be considered and for what purpose. This would help to differentiate between variants that could be candidates for inclusion in terminological resources and the ones which are synonyms or contextual variants. New insights about terminological variation in specialized texts could contribute to improve specialized dictionaries which will better account for the different ways in which a given thought is expressed.

Keywords: corpus linguistics, specialized communication, terms and concepts, terminological variation

Procedia PDF Downloads 150
27391 A Pragmatic Approach of Memes Created in Relation to the COVID-19 Pandemic

Authors: Alexandra-Monica Toma

Abstract:

Internet memes are an element of computer mediated communication and an important part of online culture that combines text and image in order to generate meaning. This term coined by Richard Dawkings refers to more than a mere way to briefly communicate ideas or emotions, thus naming a complex and an intensely perpetuated phenomenon in the virtual environment. This paper approaches memes as a cultural artefact and a virtual trope that mirrors societal concerns and issues, and analyses the pragmatics of their use. Memes have to be analysed in series, usually relating to some image macros, which is proof of the interplay between imitation and creativity in the memes’ writing process. We believe that their potential to become viral relates to three key elements: adaptation to context, reference to a successful meme series, and humour (jokes, irony, sarcasm), with various pragmatic functions. The study also uses the concept of multimodality and stresses how the memes’ text interacts with the image, discussing three types of relations: symmetry, amplification, and contradiction. Moreover, the paper proves that memes could be employed as speech acts with illocutionary force, when the interaction between text and image is enriched through the connection to a specific situation. The features mentioned above are analysed in a corpus that consists of memes related to the COVID-19 pandemic. This corpus shows them to be highly adaptable to context, which helps build the feeling of connection and belonging in an otherwise tremendously fragmented world. Some of them are created based on well-known image macros, and their humour results from an intricate dialogue between texts and contexts. Memes created in relation to the COVID-19 pandemic can be considered speech acts and are often used as such, as proven in the paper. Consequently, this paper tackles the key features of memes, makes a thorough analysis of the memes sociocultural, linguistic, and situational context, and emphasizes their intertextuality, with special accent on their illocutionary potential.

Keywords: context, memes, multimodality, speech acts

Procedia PDF Downloads 191
27390 Beyond Chol Soo Lee’s Death Row Release: Transinstitutionalization, Mortification, and the Limits of Legal Activism in 20th Century America

Authors: Minhae Shim Roth

Abstract:

The “Deinstitutionalization movement” refers to the spatial transition in the United States during the mid-20th century when the treatment of mental illness purportedly moved from long-term psychiatric institutions to community integrated care. Contrary to the accepted narrative of mental health care in the U.S., asylums did not close or empty. Some remained psychiatric hospitals, which came to be called forensic hospitals or state hospitals; others were converted into prisons or carceral institutions. During Deinstitutionalization, the asylum system became an appendage of the carceral system, with state hospitals becoming little more than holding centers for prisoners who were civilly committed, those incompetent to stand trial, offenders with mental health issues, and those found not guilty by reason of insanity. Psychiatric patients who became prisoners and prisoners who became patients became entangled in the phenomenon called transinstitutionalization. This paper investigates the relationship between psychiatric and criminal incarceration in 20th century California and focuses particularly on the case of Korean-American Chol Soo Lee, who fought detention in the psychiatric-prison system through the writ of habeas corpus. This study uses methodologies like critical theory, close reading, and archival research. This paper argues that during his psychiatric hospitalization at Napa State Hospital and incarceration in the California Department of Corrections, Lee underwent what sociologist Erving Goffman coined in his 1960 text Asylums as the process of “mortification.” After a burst of Asian American solidarity and legal aid that resulted in Lee’s triumphant release from Death Row in 1983 through a writ of habeas corpus, Lee struggled in the free world due to the long-lasting consequences of institutionalization, which led to alienation, recidivism, and an early death at the age of 62. This paper examines the trajectory of Lee’s trial and the legal activism behind it within the context of Goffman’s theory of total institutions and offer a nuanced reading of Lee’s case both during and after his incarceration.

Keywords: criminal justice, criminal law, law and mental capacity, habeas corpus, deinstitutionalization, mental health

Procedia PDF Downloads 20
27389 Investigating Iraqi EFL University Students' Productive Knowledge of Grammatical Collocations in English

Authors: Adnan Z. Mkhelif

Abstract:

Grammatical collocations (GCs) are word combinations containing a preposition or a grammatical structure, such as an infinitive (e.g. smile at, interested in, easy to learn, etc.). Such collocations tend to be difficult for Iraqi EFL university students (IUS) to master. To help address this problem, it is important to identify the factors causing it. This study aims at investigating the effects of L2 proficiency, frequency of GCs and their transparency on IUSs’ productive knowledge of GCs. The study involves 112 undergraduate participants with different proficiency levels, learning English in formal contexts in Iraq. The data collection instruments include (but not limited to) a productive knowledge test (designed by the researcher using the British National Corpus (BNC)), as well as the grammar part of the Oxford Placement Test (OPT). The study findings have shown that all the above-mentioned factors have significant effects on IUSs’ productive knowledge of GCs. In addition to establishing evidence of which factors of L2 learning might be relevant to learning GCs, it is hoped that the findings of the present study will contribute to more effective methods of teaching that can better address and help overcome the problems IUSs encounter in learning GCs. The study is thus hoped to have significant theoretical and pedagogical implications for researchers, syllabus designers as well as teachers of English as a foreign/second language.

Keywords: corpus linguistics, frequency, grammatical collocations, L2 vocabulary learning, productive knowledge, proficiency, transparency

Procedia PDF Downloads 243
27388 Modeling False Statements in Texts

Authors: Francielle A. Vargas, Thiago A. S. Pardo

Abstract:

According to the standard philosophical definition, lying is saying something that you believe to be false with the intent to deceive. For deception detection, the FBI trains its agents in a technique named statement analysis, which attempts to detect deception based on parts of speech (i.e., linguistics style). This method is employed in interrogations, where the suspects are first asked to make a written statement. In this poster, we model false statements using linguistics style. In order to achieve this, we methodically analyze linguistic features in a corpus of fake news in the Portuguese language. The results show that they present substantial lexical, syntactic and semantic variations, as well as punctuation and emotion distinctions.

Keywords: deception detection, linguistics style, computational linguistics, natural language processing

Procedia PDF Downloads 210
27387 Verbal Prefix Selection in Old Japanese: A Corpus-Based Study

Authors: Zixi You

Abstract:

There are a number of verbal prefixes in Old Japanese. However, the selection or the compatibility of verbs and verbal prefixes is among the least investigated topics on Old Japanese language. Unlike other types of prefixes, verbal prefixes in dictionaries are more often than not listed with very brief information such as ‘unknown meaning’ or ‘rhythmic function only’. To fill in a part of this knowledge gap, this paper presents an exhaustive investigation based on the newly developed ‘Oxford Corpus of Old Japanese’ (OCOJ), which included nearly all existing resource of Old Japanese language, with detailed linguistics information in TEI-XML tags. In this paper, we propose the possibility that the following three prefixes, i-, sa-, ta- (with ta- being considered as a variation of sa-), are relevant to split intransitivity in Old Japanese, with evidence that unergative verbs favor i- and that unergative verbs favor sa-(ta-). This might be undermined by the fact that transitives are also found to follow i-. However, with several manifestations of split intransitivity in Old Japanese discussed, the behavior of transitives in verbal prefix selection is no longer as surprising as it may seem to be when one look at the selection of verbal prefix in isolation. It is possible that there are one or more features that played essential roles in determining the selection of i-, and the attested transitive verbs happen to have these features. The data suggest that this feature is a sense of ‘change’ of location or state involved in the event donated by the verb, which is a feature of typical unaccusatives. This is further discussed in the ‘affectedness’ hierarchy. The presentation of this paper, which includes a brief demonstration of the OCOJ, is expected to be of the interest of both specialists and general audiences.

Keywords: old Japanese, split intransitivity, unaccusatives, unergatives, verbal prefix selection

Procedia PDF Downloads 408
27386 A Study of the Use of Arguments in Nominalizations as Instanciations of Grammatical Metaphors Finished in -TION in Academic Texts of Native Speakers

Authors: Giovana Perini-Loureiro

Abstract:

The purpose of this research was to identify whether the nominalizations terminating in -TION in the academic discourse of native English speakers contain the arguments required by their input verbs. In the perspective of functional linguistics, ideational metaphors, with nominalization as their most pervasive realization, are lexically dense, and therefore frequent in formal texts. Ideational metaphors allow the academic genre to instantiate objectification, de-personalization, and the ability to construct a chain of arguments. The valence of those nouns present in nominalizations tends to maintain the same elements of the valence from its original verbs, but these arguments are not always expressed. The initial hypothesis was that these arguments would also be present alongside the nominalizations, through anaphora or cataphora. In this study, a qualitative analysis of the occurrences of the five more frequent nominalized terminations in -TION in academic texts was accomplished, and thus a verification of the occurrences of the arguments required by the original verbs. The assembling of the concordance lines was done through COCA (Corpus of Contemporary American English). After identifying the five most frequent nominalizations (attention, action, participation, instruction, intervention), the concordance lines were selected at random to be analyzed, assuring the representativeness and reliability of the sample. It was possible to verify, in all the analyzed instances, the presence of arguments. In most instances, the arguments were not expressed, but recoverable, either in the context or in the shared knowledge among the interactants. It was concluded that the realizations of the arguments which were not expressed alongside the nominalizations are part of a continuum, starting from the immediate context with anaphora and cataphora; up to a knowledge shared outside the text, such as specific area knowledge. The study also has implications for the teaching of academic writing, especially with regards to the impact of nominalizations on the thematic and informational flow of the text. Grammatical metaphors are essential to academic writing, hence acknowledging the occurrence of its arguments is paramount to achieve linguistic awareness and the writing prestige required by the academy.

Keywords: corpus, functional linguistics, grammatical metaphors, nominalizations, academic English

Procedia PDF Downloads 138
27385 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 322
27384 How Is a Machine-Translated Literary Text Organized in Coherence? An Analysis Based upon Theme-Rheme Structure

Authors: Jiang Niu, Yue Jiang

Abstract:

With the ultimate goal to automatically generate translated texts with high quality, machine translation has made tremendous improvements. However, its translations of literary works are still plagued with problems in coherence, esp. the translation between distant language pairs. One of the causes of the problems is probably the lack of linguistic knowledge to be incorporated into the training of machine translation systems. In order to enable readers to better understand the problems of machine translation in coherence, to seek out the potential knowledge to be incorporated, and thus to improve the quality of machine translation products, this study applies Theme-Rheme structure to examine how a machine-translated literary text is organized and developed in terms of coherence. Theme-Rheme structure in Systemic Functional Linguistics is a useful tool for analysis of textual coherence. Theme is the departure point of a clause and Rheme is the rest of the clause. In a text, as Themes and Rhemes may be connected with each other in meaning, they form thematic and rhematic progressions throughout the text. Based on this structure, we can look into how a text is organized and developed in terms of coherence. Methodologically, we chose Chinese and English as the language pair to be studied. Specifically, we built a comparable corpus with two modes of English translations, viz. machine translation (MT) and human translation (HT) of one Chinese literary source text. The translated texts were annotated with Themes, Rhemes and their progressions throughout the texts. The annotated texts were analyzed from two respects, the different types of Themes functioning differently in achieving coherence, and the different types of thematic and rhematic progressions functioning differently in constructing texts. By analyzing and contrasting the two modes of translations, it is found that compared with the HT, 1) the MT features “pseudo-coherence”, with lots of ill-connected fragments of information using “and”; 2) the MT system produces a static and less interconnected text that reads like a list; these two points, in turn, lead to the less coherent organization and development of the MT than that of the HT; 3) novel to traditional and previous studies, Rhemes do contribute to textual connection and coherence though less than Themes do and thus are worthy of notice in further studies. Hence, the findings suggest that Theme-Rheme structure be applied to measuring and assessing the coherence of machine translation, to being incorporated into the training of the machine translation system, and Rheme be taken into account when studying the textual coherence of both MT and HT.

Keywords: coherence, corpus-based, literary translation, machine translation, Theme-Rheme structure

Procedia PDF Downloads 195
27383 Patronage Network and Ideological Manipulations in Translation of Literary Texts: A Case Study of George Orwell's “1984” in Persian Translation in the Period 1980 to 2015

Authors: Masoud Hassanzade Novin, Bahloul Salmani

Abstract:

The process of the translation is not merely the linguistic aspects. It is also considered in the cultural framework of both the source and target text cultures. The translation process and translated texts are confronted the new aspect in 20th century which is considered mostly in the patronage framework and ideological grillwork of the target language. To have these factors scrutinized in the process of the translation both micro-element factors and macro-element factors can be taken into consideration. For the purpose of this study through a qualitative type of research based on critical discourse analysis approach, the case study of the novel “1984” written by George Orwell was chosen as the corpus of the study to have the contrastive analysis by its Persian translated texts. Results of the study revealed some distortions embedded in the target texts which were overshadowed by ideological aspect and patronage network. The outcomes of the manipulated terms were different in various categories which revealed the manipulation aspects in the texts translated.

Keywords: critical discourse analysis, ideology, patronage network, translated texts

Procedia PDF Downloads 313
27382 Hierarchical Tree Long Short-Term Memory for Sentence Representations

Authors: Xiuying Wang, Changliang Li, Bo Xu

Abstract:

A fixed-length feature vector is required for many machine learning algorithms in NLP field. Word embeddings have been very successful at learning lexical information. However, they cannot capture the compositional meaning of sentences, which prevents them from a deeper understanding of language. In this paper, we introduce a novel hierarchical tree long short-term memory (HTLSTM) model that learns vector representations for sentences of arbitrary syntactic type and length. We propose to split one sentence into three hierarchies: short phrase, long phrase and full sentence level. The HTLSTM model gives our algorithm the potential to fully consider the hierarchical information and long-term dependencies of language. We design the experiments on both English and Chinese corpus to evaluate our model on sentiment analysis task. And the results show that our model outperforms several existing state of the art approaches significantly.

Keywords: deep learning, hierarchical tree long short-term memory, sentence representation, sentiment analysis

Procedia PDF Downloads 347
27381 Political Communication in Twitter Interactions between Government, News Media and Citizens in Mexico

Authors: Jorge Cortés, Alejandra Martínez, Carlos Pérez, Anaid Simón

Abstract:

The presence of government, news media, and general citizenry in social media allows considering interactions between them as a form of political communication (i.e. the public exchange of contradictory discourses about politics). Twitter’s asymmetrical following model (users can follow, mention or reply to other users that do not follow them) could foster alternative democratic practices and have an impact on Mexican political culture, which has been marked by a lack of direct communication channels between these actors. The research aim is to assess Twitter’s role in political communication practices through the analysis of interaction dynamics between government, news media, and citizens by extracting and visualizing data from Twitter’s API to observe general behavior patterns. The hypothesis is that regardless the fact that Twitter’s features enable direct and horizontal interactions between actors, users repeat traditional dynamics of interaction, without taking full advantage of the possibilities of this medium. Through an interdisciplinary team including Communication Strategies, Information Design, and Interaction Systems, the activity on Twitter generated by the controversy over the presence of Uber in Mexico City was analysed; an issue of public interest, involving aspects such as public opinion, economic interests and a legal dimension. This research includes techniques from social network analysis (SNA), a methodological approach focused on the comprehension of the relationships between actors through the visual representation and measurement of network characteristics. The analysis of the Uber event comprised data extraction, data categorization, corpus construction, corpus visualization and analysis. On the recovery stage TAGS, a Google Sheet template, was used to extract tweets that included the hashtags #UberSeQueda and #UberSeVa, posts containing the string Uber and tweets directed to @uber_mx. Using scripts written in Python, the data was filtered, discarding tweets with no interaction (replies, retweets or mentions) and locations outside of México. Considerations regarding bots and the omission of anecdotal posts were also taken into account. The utility of graphs to observe interactions of political communication in general was confirmed by the analysis of visualizations generated with programs such as Gephi and NodeXL. However, some aspects require improvements to obtain more useful visual representations for this type of research. For example, link¬crossings complicates following the direction of an interaction forcing users to manipulate the graph to see it clearly. It was concluded that some practices prevalent in political communication in Mexico are replicated in Twitter. Media actors tend to group together instead of interact with others. The political system tends to tweet as an advertising strategy rather than to generate dialogue. However, some actors were identified as bridges establishing communication between the three spheres, generating a more democratic exercise and taking advantage of Twitter’s possibilities. Although interactions in Twitter could become an alternative to political communication, this potential depends on the intentions of the participants and to what extent they are aiming for collaborative and direct communications. Further research is needed to get a deeper understanding on the political behavior of Twitter users and the possibilities of SNA for its analysis.

Keywords: interaction, political communication, social network analysis, Twitter

Procedia PDF Downloads 217
27380 Alignment and Antagonism in Flux: A Diachronic Sentiment Analysis of Attitudes towards the Chinese Mainland in the Hong Kong Press

Authors: William Feng, Qingyu Gao

Abstract:

Despite the extensive discussions about Hong Kong’s sentiments towards the Chinese Mainland since the sovereignty transfer in 1997, there has been no large-scale empirical analysis of the changing attitudes in the mainstream media, which both reflect and shape sentiments in the society. To address this gap, the present study uses an optimised semantic-based automatic sentiment analysis method to examine a corpus of news about China from 1997 to 2020 in three main Chinese-language newspapers in Hong Kong, namely Apple Daily, Ming Pao, and Oriental Daily News. The analysis shows that although the Hong Kong press had a positive emotional tone toward China in general, the overall trend of sentiment was becoming increasingly negative. Meanwhile, the alignment and antagonism toward China have both increased, providing empirical evidence of attitudinal polarisation in the Hong Kong society. Specifically, Apple Daily’s depictions of China have become increasingly negative, though with some positive turns before 2008, whilst Oriental Daily News has consistently expressed more favourable sentiments. Ming Pao maintained an impartial stance toward China through an increased but balanced representation of positive and negative sentiments, with its subjectivity and sentiment intensity growing to an industry-standard level. The results provide new insights into the complexity of sentiments towards China in the Hong Kong press and media attitudes in general in terms of the “us” and “them” positioning by explicating the cross-newspaper and cross-period variations using an enhanced sentiment analysis method which incorporates sentiment-oriented and semantic role analysis techniques.

Keywords: media attitude, sentiment analysis, Hong Kong press, one country two systems

Procedia PDF Downloads 105
27379 Cognitive Translation and Conceptual Wine Tasting Metaphors: A Corpus-Based Research

Authors: Christine Demaecker

Abstract:

Many researchers have underlined the importance of metaphors in specialised language. Their use of specific domains helps us understand the conceptualisations used to communicate new ideas or difficult topics. Within the wide area of specialised discourse, wine tasting is a very specific example because it is almost exclusively metaphoric. Wine tasting metaphors express various conceptualisations. They are not linguistic but rather conceptual, as defined by Lakoff & Johnson. They correspond to the linguistic expression of a mental projection from a well-known or more concrete source domain onto the target domain, which is the taste of wine. But unlike most specialised terminologies, the vocabulary is never clearly defined. When metaphorical terms are listed in dictionaries, their definitions remain vague, unclear, and circular. They cannot be replaced by literal linguistic expressions. This makes it impossible to transfer them into another language with the traditional linguistic translation methods. Qualitative research investigates whether wine tasting metaphors could rather be translated with the cognitive translation process, as well described by Nili Mandelblit (1995). The research is based on a corpus compiled from two high-profile wine guides; the Parker’s Wine Buyer’s Guide and its translation into French and the Guide Hachette des Vins and its translation into English. In this small corpus with a total of 68,826 words, 170 metaphoric expressions have been identified in the original English text and 180 in the original French text. They have been selected with the MIPVU Metaphor Identification Procedure developed at the Vrije Universiteit Amsterdam. The selection demonstrates that both languages use the same set of conceptualisations, which are often combined in wine tasting notes, creating conceptual integrations or blends. The comparison of expressions in the source and target texts also demonstrates the use of the cognitive translation approach. In accordance with the principle of relevance, the translation always uses target language conceptualisations, but compared to the original, the highlighting of the projection is often different. Also, when original metaphors are complex with a combination of conceptualisations, at least one element of the original metaphor underlies the target expression. This approach perfectly integrates into Lederer’s interpretative model of translation (2006). In this triangular model, the transfer of conceptualisation could be included at the level of ‘deverbalisation/reverbalisation’, the crucial stage of the model, where the extraction of meaning combines with the encyclopedic background to generate the target text.

Keywords: cognitive translation, conceptual integration, conceptual metaphor, interpretative model of translation, wine tasting metaphor

Procedia PDF Downloads 128
27378 Transitivity Analysis in Reading Passage of English Text Book for Senior High School

Authors: Elitaria Bestri Agustina Siregar, Boni Fasius Siregar

Abstract:

The paper concerned with the transitivity in the reading passage of English textbook for Senior High School. The six types of process were occurred in the passages with percentage as follows: Material Process is 166 (42%), Relational Process is 155 (39%), Mental Process is 39 (10%), Verbal Process is 21 (5%), Existential Process is 13 (3), and Behavioral Process is 5 (1%). The material processes were found to be the most frequently used process type in the samples in our corpus (41,60 %). This indicates that the twenty reading passages are centrally concerned with action and events. Related to developmental psychology theory, this book fits the needs of students of this age.

Keywords: transitivity, types of processes, reading passages, developmental psycholoy

Procedia PDF Downloads 403
27377 Time and Cost Prediction Models for Language Classification Over a Large Corpus on Spark

Authors: Jairson Barbosa Rodrigues, Paulo Romero Martins Maciel, Germano Crispim Vasconcelos

Abstract:

This paper presents an investigation of the performance impacts regarding the variation of five factors (input data size, node number, cores, memory, and disks) when applying a distributed implementation of Naïve Bayes for text classification of a large Corpus on the Spark big data processing framework. Problem: The algorithm's performance depends on multiple factors, and knowing before-hand the effects of each factor becomes especially critical as hardware is priced by time slice in cloud environments. Objectives: To explain the functional relationship between factors and performance and to develop linear predictor models for time and cost. Methods: the solid statistical principles of Design of Experiments (DoE), particularly the randomized two-level fractional factorial design with replications. This research involved 48 real clusters with different hardware arrangements. The metrics were analyzed using linear models for screening, ranking, and measurement of each factor's impact. Results: Our findings include prediction models and show some non-intuitive results about the small influence of cores and the neutrality of memory and disks on total execution time, and the non-significant impact of data input scale on costs, although notably impacts the execution time.

Keywords: big data, design of experiments, distributed machine learning, natural language processing, spark

Procedia PDF Downloads 110
27376 How Unicode Glyphs Revolutionized the Way We Communicate

Authors: Levi Corallo

Abstract:

Typed language made by humans on computers and cell phones has made a significant distinction from previous modes of written language exchanges. While acronyms remain one of the most predominant markings of typed language, another and perhaps more recent revolution in the way humans communicate has been with the use of symbols or glyphs, primarily Emojis—globally introduced on the iPhone keyboard by Apple in 2008. This paper seeks to analyze the use of symbols in typed communication from both a linguistic and machine learning perspective. The Unicode system will be explored and methods of encoding will be juxtaposed with the current machine and human perception. Topics in how typed symbol usage exists in conversation will be explored as well as topics across current research methods dealing with Emojis like sentiment analysis, predictive text models, and so on. This study proposes that sequential analysis is a significant feature for analyzing unicode characters in a corpus with machine learning. Current models that are trying to learn or translate the meaning of Emojis should be starting to learn using bi- and tri-grams of Emoji, as well as observing the relationship between combinations of different Emoji in tandem. The sociolinguistics of an entire new vernacular of language referred to here as ‘typed language’ will also be delineated across my analysis with unicode glyphs from both a semantic and technical perspective.

Keywords: unicode, text symbols, emojis, glyphs, communication

Procedia PDF Downloads 190
27375 The Role and Effects of Communication on Occupational Safety: A Review

Authors: Pieter A. Cornelissen, Joris J. Van Hoof

Abstract:

The interest in improving occupational safety started almost simultaneously with the beginning of the Industrial Revolution. Yet, it was not until the late 1970’s before the role of communication was considered in scientific research regarding occupational safety. In recent years the importance of communication as a means to improve occupational safety has increased. Not only as communication might have a direct effect on safety performance and safety outcomes, but also as it can be viewed as a major component of other important safety-related elements (e.g., training, safety meetings, leadership). And while safety communication is an increasingly important topic in research, its operationalization is often vague and differs among studies. This is not only problematic when comparing results, but also in applying these results to practice and the work floor. By means of an in-depth analysis—building on an existing dataset—this review aims to overcome these problems. The initial database search yielded 25.527 articles, which was reduced to a research corpus of 176 articles. Focusing on the 37 articles of this corpus that addressed communication (related to safety outcomes and safety performance), the current study will provide a comprehensive overview of the role and effects of safety communication and outlines the conditions under which communication contributes to a safer work environment. The study shows that in literature a distinction is commonly made between safety communication (i.e., the exchange or dissemination of safety-related information) and feedback (i.e. a reactive form of communication). And although there is a consensus among researchers that both communication and feedback positively affect safety performance, there is a debate about the directness of this relationship. Whereas some researchers assume a direct relationship between safety communication and safety performance, others state that this relationship is mediated by safety climate. One of the key findings is that despite the strongly present view that safety communication is a formal and top-down safety management tool, researchers stress the importance of open communication that encourages and allows employees to express their worries, experiences, views, and share information. This raises questions with regard to other directions (e.g., bottom-up, horizontal) and forms of communication (e.g., informal). The current review proposes a framework to overcome the often vague and different operationalizations of safety communication. The proposed framework can be used to characterize safety communication in terms of stakeholders, direction, and characteristics of communication (e.g., medium usage).

Keywords: communication, feedback, occupational safety, review

Procedia PDF Downloads 297
27374 The Influence of Screen Translation on Creative Audiovisual Writing: A Corpus-Based Approach

Authors: John D. Sanderson

Abstract:

The popularity of American cinema worldwide has contributed to the development of sociolects related to specific film genres in other cultural contexts by means of screen translation, in many cases eluding norms of usage in the target language, a process whose result has come to be known as 'dubbese'. A consequence for the reception in countries where local audiovisual fiction consumption is far lower than American imported productions is that this linguistic construct is preferred, even though it differs from common everyday speech. The iconography of film genres such as science-fiction, western or sword-and-sandal films, for instance, generates linguistic expectations in international audiences who will accept more easily the sociolects assimilated by the continuous reception of American productions, even if the themes, locations, characters, etc., portrayed on screen may belong in origin to other cultures. And the non-normative language (e.g., calques, semantic loans) used in the preferred mode of linguistic transfer, whether it is translation for dubbing or subtitling, has diachronically evolved in many cases into a status of canonized sociolect, not only accepted but also required, by foreign audiences of American films. However, a remarkable step forward is taken when this typology of artificial linguistic constructs starts being used creatively by nationals of these target cultural contexts. In the case of Spain, the success of American sitcoms such as Friends in the 1990s led Spanish television scriptwriters to include in national productions lexical and syntactical indirect borrowings (Anglicisms not formally identifiable as such because they include elements from their own language) in order to target audiences of the former. However, this commercial strategy had already taken place decades earlier when Spain became a favored location for the shooting of foreign films in the early 1960s. The international popularity of the then newly developed sub-genre known as Spaghetti-Western encouraged Spanish investors to produce their own movies, and local scriptwriters made use of the dubbese developed nationally since the advent of sound in film instead of using normative language. As a result, direct Anglicisms, as well as lexical and syntactical borrowings made up the creative writing of these Spanish productions, which also became commercially successful. Interestingly enough, some of these films were even marketed in English-speaking countries as original westerns (some of the names of actors and directors were anglified to that purpose) dubbed into English. The analysis of these 'back translations' will also foreground some semantic distortions that arose in the process. In order to perform the research on these issues, a wide corpus of American films has been used, which chronologically range from Stagecoach (John Ford, 1939) to Django Unchained (Quentin Tarantino, 2012), together with a shorter corpus of Spanish films produced during the golden age of Spaghetti Westerns, from una tumba para el sheriff (Mario Caiano; in English lone and angry man, William Hawkins) to tu fosa será la exacta, amigo (Juan Bosch, 1972; in English my horse, my gun, your widow, John Wood). The methodology of analysis and the conclusions reached could be applied to other genres and other cultural contexts.

Keywords: dubbing, film genre, screen translation, sociolect

Procedia PDF Downloads 163
27373 Language as an Instrument of Manipulation and Political Control in Nigeria: The 2015 Presidential Election in Perspective

Authors: Abdulmalik Adamu

Abstract:

This study is premised on the assumption that language, particularly, English plays a significant role in the acquisition of power in Nigeria. This is against the backdrop of the fact that for the first time in the political history of Nigeria, an opposition party succeeded in dethroning an incumbent President and ruling political party in an election. Therefore the main objective was to investigate the role of language, particularly English in the acquisition of political power in Nigeria. The corpus generated for this study consisted of excerpts from the media exchange between the spokespersons of the two dominant political parties at the time of the elections in 2015; Olisa Metuh of the Peoples Democratic Party (PDP) and Lai Mohammed of the All Progressive Party (APC). The excerpts were analysed using Critical Discourse Analysis (CDA) as a research tool. The findings revealed the acceptance of the first proposition that English facilitates the acquisition of political power in Nigeria and the rejection of the second proposition that English is an instrument for the exclusion of the populist from political events in Nigeria. The study, therefore, concluded that language, particularly English played a significant role in the acquisition of political power in Nigeria.

Keywords: language, power, politics, Critical Discourse Analysis (CDA)

Procedia PDF Downloads 391
27372 Understanding Consumer Recycling Behavior: A Literature Review of Motivational and Behavioral Aspects

Authors: Karin Johansson, Ola Johansson

Abstract:

Recycling is an important aspect of a sustainable society and depends to a large extent on consumers’ willingness to provide the voluntary work needed to take the first critical step in many return logistics systems. Based on a systematic review of articles on recycling behavior, this paper presents and discusses the findings in relation to Fogg’s Behavioral Model (FBM). Through the analysis of a corpus of 72 articles, the most important research contributions on recycling behavior are summarized and discussed. The choice of using FBM as a framework provides a new way of viewing previous research findings, and aids in identifying knowledge gaps. Based on the review, this work identifies and discusses four areas of potential interest for future research.

Keywords: recycling, reverse logistics, solid waste management, sustainability

Procedia PDF Downloads 139