Search results for: corpus analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26877

Search results for: corpus analysis

26727 Finding Related Scientific Documents Using Formal Concept Analysis

Authors: Nadeem Akhtar, Hira Javed

Abstract:

An important aspect of research is literature survey. Availability of a large amount of literature across different domains triggers the need for optimized systems which provide relevant literature to researchers. We propose a search system based on keywords for text documents. This experimental approach provides a hierarchical structure to the document corpus. The documents are labelled with keywords using KEA (Keyword Extraction Algorithm) and are automatically organized in a lattice structure using Formal Concept Analysis (FCA). This groups the semantically related documents together. The hierarchical structure, based on keywords gives out only those documents which precisely contain them. This approach open doors for multi-domain research. The documents across multiple domains which are indexed by similar keywords are grouped together. A hierarchical relationship between keywords is obtained. To signify the effectiveness of the approach, we have carried out the experiment and evaluation on Semeval-2010 Dataset. Results depict that the presented method is considerably successful in indexing of scientific papers.

Keywords: formal concept analysis, keyword extraction algorithm, scientific documents, lattice

Procedia PDF Downloads 298
26726 Investigating Translations of Websites of Pakistani Public Offices

Authors: Sufia Maroof

Abstract:

This empirical study investigated the web-translations of five Pakistani public offices (FPSC, FIA, HEC, USB, and Ministry of Finance) offering Urdu tab as an option to access information on their official websites. Triangulation of quantitative and qualitative research design informed the researcher of the semantic, lexical and syntactic caveats in these translations. The study hypothesized that majority of the Pakistani population is oblivious of the Supreme Court’s amendments in language policy concerning national and official language; hence, Urdu web-translations of the public departments have not been accessed effectively. Firstly, the researcher conducted an online survey, comprising of two sections, close ended and short answer based questions. Secondly, the researcher compiled corpus of the five selected websites in a tabular form to compare the data. Thirdly, the administrators of the departments had been contacted regarding the methods of translation and the expertise of the personnel involved. The corpus was assessed for TQA after examining the lexical, semantic, syntactical and technical alignment inaccuracies and imperfections. The study suggests the public offices to invest in their Urdu webs by either hiring expert translators or engaging expertise of a translation agency for this project to offer quality translation to public.

Keywords: machine translations, public offices, Urdu translations, websites

Procedia PDF Downloads 94
26725 A Corpus-based Study of Adjuncts in Colombian English as a Second Language (ESL) Argumentative Essays

Authors: E. Velasco

Abstract:

Meeting high standards of writing in a Second Language (L2) is extremely important for many students who wish to undertake studies at universities in both English and non-English speaking countries. University lecturers in English speaking countries continue to express dissatisfaction with the apparent poor quality of essay writing skills displayed by English as a Second Language (ESL) students, whose essays are often criticised for their lack of cohesion and coherence. These critiques have extended to contexts such as Colombia, where many ESL students are criticised for their inability to write high-quality academic texts in L2-English, particularly at the tertiary level. If Colombian ESL students are expected to meet high standards of writing when studying locally and abroad, it makes sense to carry out specific research that can perhaps lead to recommendations to support their quest for improving argumentative strategies. Employing Corpus Linguistics methods within a Learner Corpus Research framework, and a combination of Log-Likelihood and Bayes Factor measures, this paper investigated argumentative essays written by Colombian ESL students. The study specifically aimed to analyse conjunctive adjuncts in argumentative essays to find out how Colombian ESL students connect their ideas in discourse. Results suggest that a) Colombian ESL learners need explicit instruction on specific areas of conjunctive adjuncts to counteract overuse, underuse and misuse; b) underuse of endophoric and evidential adjuncts highlights gaps between IELTS-like essays and good quality tertiary-level essays and published papers, and these gaps are linked to prior knowledge brought into writing task, rhetorical functions in writing, and research processes before writing takes place; c) both Colombian ESL learners and L1-English writers (in a reference corpus) overuse some adjuncts and underuse endophoric and evidential adjuncts, when compared to skilled L1-English and L2-English writers, so differences in frequencies of adjuncts has little to do with the writers’ L1, and differences are rather linked to types of essays writers produce (e.g. ESL vs. university essays). Ender Velasco: The pedagogical recommendations deriving from the study are that: a) Colombian ESL learners need to be shown that overuse is not the only way of giving cohesion to argumentative essays and there are other alternatives to cohesion (e.g., implicit adjuncts, lexical chains and collocations); b) syllabi and classroom input need to raise awareness of gaps in writing skills between IELTS-like and tertiary-level argumentative essays, and of how endophoric and evidential adjuncts are used to refer to anaphoric and cataphoric sections of essays, and to other people’s work or ideas; c) syllabi and classroom input need to include essay-writing tasks based on previous research/reading which learners need to incorporate into their arguments, and tasks that raise awareness of referencing systems (e.g., APA); d) classroom input needs to include explicit instruction on use of punctuation, functions and/or syntax with specific conjunctive adjuncts such as for example, for that reason, although, despite and nevertheless.

Keywords: argumentative essays, colombian english as a second language (esl) learners, conjunctive adjuncts, corpus linguistics

Procedia PDF Downloads 38
26724 Neural Machine Translation for Low-Resource African Languages: Benchmarking State-of-the-Art Transformer for Wolof

Authors: Cheikh Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, Siley O. Ba

Abstract:

In this paper, we propose two neural machine translation (NMT) systems (French-to-Wolof and Wolof-to-French) based on sequence-to-sequence with attention and transformer architectures. We trained our models on a parallel French-Wolof corpus of about 83k sentence pairs. Because of the low-resource setting, we experimented with advanced methods for handling data sparsity, including subword segmentation, back translation, and the copied corpus method. We evaluate the models using the BLEU score and find that transformer outperforms the classic seq2seq model in all settings, in addition to being less sensitive to noise. In general, the best scores are achieved when training the models on word-level-based units. For subword-level models, using back translation proves to be slightly beneficial in low-resource (WO) to high-resource (FR) language translation for the transformer (but not for the seq2seq) models. A slight improvement can also be observed when injecting copied monolingual text in the target language. Moreover, combining the copied method data with back translation leads to a substantial improvement of the translation quality.

Keywords: backtranslation, low-resource language, neural machine translation, sequence-to-sequence, transformer, Wolof

Procedia PDF Downloads 109
26723 Variation in Italian Specialized Economic Texts

Authors: Abdelmagid Basyouny Sakr

Abstract:

Terminological variation is a reality and it is now recognized by terminologists. This paper investigates the terminological variation in the context of specialized economic texts in Italian. It aims to find whether certain patterns or tendencies can be derived from the analysis of these texts. Term variants pose two different kinds of difficulties. The first one is being able to recognize linguistic expressions that denote the same concept in running text. Another one lies in knowing which variant should be considered and for what purpose. This would help to differentiate between variants that could be candidates for inclusion in terminological resources and the ones which are synonyms or contextual variants. New insights about terminological variation in specialized texts could contribute to improve specialized dictionaries which will better account for the different ways in which a given thought is expressed.

Keywords: corpus linguistics, specialized communication, terms and concepts, terminological variation

Procedia PDF Downloads 120
26722 A Pragmatic Approach of Memes Created in Relation to the COVID-19 Pandemic

Authors: Alexandra-Monica Toma

Abstract:

Internet memes are an element of computer mediated communication and an important part of online culture that combines text and image in order to generate meaning. This term coined by Richard Dawkings refers to more than a mere way to briefly communicate ideas or emotions, thus naming a complex and an intensely perpetuated phenomenon in the virtual environment. This paper approaches memes as a cultural artefact and a virtual trope that mirrors societal concerns and issues, and analyses the pragmatics of their use. Memes have to be analysed in series, usually relating to some image macros, which is proof of the interplay between imitation and creativity in the memes’ writing process. We believe that their potential to become viral relates to three key elements: adaptation to context, reference to a successful meme series, and humour (jokes, irony, sarcasm), with various pragmatic functions. The study also uses the concept of multimodality and stresses how the memes’ text interacts with the image, discussing three types of relations: symmetry, amplification, and contradiction. Moreover, the paper proves that memes could be employed as speech acts with illocutionary force, when the interaction between text and image is enriched through the connection to a specific situation. The features mentioned above are analysed in a corpus that consists of memes related to the COVID-19 pandemic. This corpus shows them to be highly adaptable to context, which helps build the feeling of connection and belonging in an otherwise tremendously fragmented world. Some of them are created based on well-known image macros, and their humour results from an intricate dialogue between texts and contexts. Memes created in relation to the COVID-19 pandemic can be considered speech acts and are often used as such, as proven in the paper. Consequently, this paper tackles the key features of memes, makes a thorough analysis of the memes sociocultural, linguistic, and situational context, and emphasizes their intertextuality, with special accent on their illocutionary potential.

Keywords: context, memes, multimodality, speech acts

Procedia PDF Downloads 159
26721 A Corpus-Based Diachronic Study on Indefinite Pronominal Anaphora in English

Authors: Qiong Hu

Abstract:

From old English to modern English, the gender category has changed from grammatical gender system to natural gender system. The word classes that reflected gender has changed from pronouns, adjectives, and numerals in old English to only pronouns in modern English. In present-day English, the third person singular pronouns are the only paradigm that keeps an intact gender. 'He' and 'they' used as epicene pronouns are one of the two commonest phenomena of gender disagreement (the other being those against the natural gender). Considering the convenience of corpus concordance, epicene pronoun usage is selected in this study in which the anaphors are restricted to possessives (eg. his, their), and the antecedents are restricted to compound indefinite pronouns (eg. someone, somebody). Factors like writing form (eg. someone vs. some one), the semantics of the prefixes (eg. some- vs. any-), and suffixes (eg. -one vs. -body), as well as frequency, are taken into consideration. Statistics indicate that 'their' is increasingly used as the epicene pronoun compared with the decline of 'his' (when both writing forms are considered). This is influenced by social factors such as feminist movement, as well as the semantics and frequency of antecedents. Their (plural) used in anaphoric reference to various indefinite pronouns (singular in form) can also be treated as number variation in third person pronouns, and the trend that 'their' in place of his can also be treated as a change in number category. Among different candidates for the gender-neutral function, 'their' is proven to be the most promising one based on the diachronic data. This does not reject any new competitors in the future which still remains to be seen.

Keywords: language variation and change, epicene pronouns, gender, number

Procedia PDF Downloads 154
26720 Modeling False Statements in Texts

Authors: Francielle A. Vargas, Thiago A. S. Pardo

Abstract:

According to the standard philosophical definition, lying is saying something that you believe to be false with the intent to deceive. For deception detection, the FBI trains its agents in a technique named statement analysis, which attempts to detect deception based on parts of speech (i.e., linguistics style). This method is employed in interrogations, where the suspects are first asked to make a written statement. In this poster, we model false statements using linguistics style. In order to achieve this, we methodically analyze linguistic features in a corpus of fake news in the Portuguese language. The results show that they present substantial lexical, syntactic and semantic variations, as well as punctuation and emotion distinctions.

Keywords: deception detection, linguistics style, computational linguistics, natural language processing

Procedia PDF Downloads 177
26719 Learning Vocabulary with SkELL: Developing a Methodology with University Students in Japan Using Action Research

Authors: Henry R. Troy

Abstract:

Corpora are becoming more prevalent in the language classroom, especially in the development of dictionaries and course materials. Nevertheless, corpora are still perceived by many educators as difficult to use directly in the classroom, a process which is also known as “data-driven learning” (DDL). Action research has been identified as a method by which DDL’s efficiency can be increased, but it is also an approach few studies on DDL have employed. Studies into the effectiveness of DDL in language education in Japan are also rare, and investigations focused more on student and teacher reactions rather than pre and post-test scores are rarer still. This study investigates the student and teacher reactions to the use of SkELL, a free online corpus designed to be user-friendly, for vocabulary learning at a university in Japan. Action research is utilized to refine the teaching methodology, with changes to the method based on student and teacher feedback received via surveys submitted after each of the four implementations of DDL. After some training, the students used tablets to study the target vocabulary autonomously in pairs and groups, with the teacher acting as facilitator. The results show that the students enjoyed using SkELL and felt it was effective for vocabulary learning, while the teaching methodology grew in efficiency throughout the course. These findings suggest that action research can be a successful method for increasing the efficacy of DDL in the language classroom, especially with teachers and students who are new to the practice.

Keywords: action research, corpus linguistics, data-driven learning, vocabulary learning

Procedia PDF Downloads 202
26718 Patronage Network and Ideological Manipulations in Translation of Literary Texts: A Case Study of George Orwell's “1984” in Persian Translation in the Period 1980 to 2015

Authors: Masoud Hassanzade Novin, Bahloul Salmani

Abstract:

The process of the translation is not merely the linguistic aspects. It is also considered in the cultural framework of both the source and target text cultures. The translation process and translated texts are confronted the new aspect in 20th century which is considered mostly in the patronage framework and ideological grillwork of the target language. To have these factors scrutinized in the process of the translation both micro-element factors and macro-element factors can be taken into consideration. For the purpose of this study through a qualitative type of research based on critical discourse analysis approach, the case study of the novel “1984” written by George Orwell was chosen as the corpus of the study to have the contrastive analysis by its Persian translated texts. Results of the study revealed some distortions embedded in the target texts which were overshadowed by ideological aspect and patronage network. The outcomes of the manipulated terms were different in various categories which revealed the manipulation aspects in the texts translated.

Keywords: critical discourse analysis, ideology, patronage network, translated texts

Procedia PDF Downloads 289
26717 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 301
26716 How Is a Machine-Translated Literary Text Organized in Coherence? An Analysis Based upon Theme-Rheme Structure

Authors: Jiang Niu, Yue Jiang

Abstract:

With the ultimate goal to automatically generate translated texts with high quality, machine translation has made tremendous improvements. However, its translations of literary works are still plagued with problems in coherence, esp. the translation between distant language pairs. One of the causes of the problems is probably the lack of linguistic knowledge to be incorporated into the training of machine translation systems. In order to enable readers to better understand the problems of machine translation in coherence, to seek out the potential knowledge to be incorporated, and thus to improve the quality of machine translation products, this study applies Theme-Rheme structure to examine how a machine-translated literary text is organized and developed in terms of coherence. Theme-Rheme structure in Systemic Functional Linguistics is a useful tool for analysis of textual coherence. Theme is the departure point of a clause and Rheme is the rest of the clause. In a text, as Themes and Rhemes may be connected with each other in meaning, they form thematic and rhematic progressions throughout the text. Based on this structure, we can look into how a text is organized and developed in terms of coherence. Methodologically, we chose Chinese and English as the language pair to be studied. Specifically, we built a comparable corpus with two modes of English translations, viz. machine translation (MT) and human translation (HT) of one Chinese literary source text. The translated texts were annotated with Themes, Rhemes and their progressions throughout the texts. The annotated texts were analyzed from two respects, the different types of Themes functioning differently in achieving coherence, and the different types of thematic and rhematic progressions functioning differently in constructing texts. By analyzing and contrasting the two modes of translations, it is found that compared with the HT, 1) the MT features “pseudo-coherence”, with lots of ill-connected fragments of information using “and”; 2) the MT system produces a static and less interconnected text that reads like a list; these two points, in turn, lead to the less coherent organization and development of the MT than that of the HT; 3) novel to traditional and previous studies, Rhemes do contribute to textual connection and coherence though less than Themes do and thus are worthy of notice in further studies. Hence, the findings suggest that Theme-Rheme structure be applied to measuring and assessing the coherence of machine translation, to being incorporated into the training of the machine translation system, and Rheme be taken into account when studying the textual coherence of both MT and HT.

Keywords: coherence, corpus-based, literary translation, machine translation, Theme-Rheme structure

Procedia PDF Downloads 170
26715 Investigating Iraqi EFL University Students' Productive Knowledge of Grammatical Collocations in English

Authors: Adnan Z. Mkhelif

Abstract:

Grammatical collocations (GCs) are word combinations containing a preposition or a grammatical structure, such as an infinitive (e.g. smile at, interested in, easy to learn, etc.). Such collocations tend to be difficult for Iraqi EFL university students (IUS) to master. To help address this problem, it is important to identify the factors causing it. This study aims at investigating the effects of L2 proficiency, frequency of GCs and their transparency on IUSs’ productive knowledge of GCs. The study involves 112 undergraduate participants with different proficiency levels, learning English in formal contexts in Iraq. The data collection instruments include (but not limited to) a productive knowledge test (designed by the researcher using the British National Corpus (BNC)), as well as the grammar part of the Oxford Placement Test (OPT). The study findings have shown that all the above-mentioned factors have significant effects on IUSs’ productive knowledge of GCs. In addition to establishing evidence of which factors of L2 learning might be relevant to learning GCs, it is hoped that the findings of the present study will contribute to more effective methods of teaching that can better address and help overcome the problems IUSs encounter in learning GCs. The study is thus hoped to have significant theoretical and pedagogical implications for researchers, syllabus designers as well as teachers of English as a foreign/second language.

Keywords: corpus linguistics, frequency, grammatical collocations, L2 vocabulary learning, productive knowledge, proficiency, transparency

Procedia PDF Downloads 218
26714 A Study of the Use of Arguments in Nominalizations as Instanciations of Grammatical Metaphors Finished in -TION in Academic Texts of Native Speakers

Authors: Giovana Perini-Loureiro

Abstract:

The purpose of this research was to identify whether the nominalizations terminating in -TION in the academic discourse of native English speakers contain the arguments required by their input verbs. In the perspective of functional linguistics, ideational metaphors, with nominalization as their most pervasive realization, are lexically dense, and therefore frequent in formal texts. Ideational metaphors allow the academic genre to instantiate objectification, de-personalization, and the ability to construct a chain of arguments. The valence of those nouns present in nominalizations tends to maintain the same elements of the valence from its original verbs, but these arguments are not always expressed. The initial hypothesis was that these arguments would also be present alongside the nominalizations, through anaphora or cataphora. In this study, a qualitative analysis of the occurrences of the five more frequent nominalized terminations in -TION in academic texts was accomplished, and thus a verification of the occurrences of the arguments required by the original verbs. The assembling of the concordance lines was done through COCA (Corpus of Contemporary American English). After identifying the five most frequent nominalizations (attention, action, participation, instruction, intervention), the concordance lines were selected at random to be analyzed, assuring the representativeness and reliability of the sample. It was possible to verify, in all the analyzed instances, the presence of arguments. In most instances, the arguments were not expressed, but recoverable, either in the context or in the shared knowledge among the interactants. It was concluded that the realizations of the arguments which were not expressed alongside the nominalizations are part of a continuum, starting from the immediate context with anaphora and cataphora; up to a knowledge shared outside the text, such as specific area knowledge. The study also has implications for the teaching of academic writing, especially with regards to the impact of nominalizations on the thematic and informational flow of the text. Grammatical metaphors are essential to academic writing, hence acknowledging the occurrence of its arguments is paramount to achieve linguistic awareness and the writing prestige required by the academy.

Keywords: corpus, functional linguistics, grammatical metaphors, nominalizations, academic English

Procedia PDF Downloads 120
26713 Verbal Prefix Selection in Old Japanese: A Corpus-Based Study

Authors: Zixi You

Abstract:

There are a number of verbal prefixes in Old Japanese. However, the selection or the compatibility of verbs and verbal prefixes is among the least investigated topics on Old Japanese language. Unlike other types of prefixes, verbal prefixes in dictionaries are more often than not listed with very brief information such as ‘unknown meaning’ or ‘rhythmic function only’. To fill in a part of this knowledge gap, this paper presents an exhaustive investigation based on the newly developed ‘Oxford Corpus of Old Japanese’ (OCOJ), which included nearly all existing resource of Old Japanese language, with detailed linguistics information in TEI-XML tags. In this paper, we propose the possibility that the following three prefixes, i-, sa-, ta- (with ta- being considered as a variation of sa-), are relevant to split intransitivity in Old Japanese, with evidence that unergative verbs favor i- and that unergative verbs favor sa-(ta-). This might be undermined by the fact that transitives are also found to follow i-. However, with several manifestations of split intransitivity in Old Japanese discussed, the behavior of transitives in verbal prefix selection is no longer as surprising as it may seem to be when one look at the selection of verbal prefix in isolation. It is possible that there are one or more features that played essential roles in determining the selection of i-, and the attested transitive verbs happen to have these features. The data suggest that this feature is a sense of ‘change’ of location or state involved in the event donated by the verb, which is a feature of typical unaccusatives. This is further discussed in the ‘affectedness’ hierarchy. The presentation of this paper, which includes a brief demonstration of the OCOJ, is expected to be of the interest of both specialists and general audiences.

Keywords: old Japanese, split intransitivity, unaccusatives, unergatives, verbal prefix selection

Procedia PDF Downloads 381
26712 Alignment and Antagonism in Flux: A Diachronic Sentiment Analysis of Attitudes towards the Chinese Mainland in the Hong Kong Press

Authors: William Feng, Qingyu Gao

Abstract:

Despite the extensive discussions about Hong Kong’s sentiments towards the Chinese Mainland since the sovereignty transfer in 1997, there has been no large-scale empirical analysis of the changing attitudes in the mainstream media, which both reflect and shape sentiments in the society. To address this gap, the present study uses an optimised semantic-based automatic sentiment analysis method to examine a corpus of news about China from 1997 to 2020 in three main Chinese-language newspapers in Hong Kong, namely Apple Daily, Ming Pao, and Oriental Daily News. The analysis shows that although the Hong Kong press had a positive emotional tone toward China in general, the overall trend of sentiment was becoming increasingly negative. Meanwhile, the alignment and antagonism toward China have both increased, providing empirical evidence of attitudinal polarisation in the Hong Kong society. Specifically, Apple Daily’s depictions of China have become increasingly negative, though with some positive turns before 2008, whilst Oriental Daily News has consistently expressed more favourable sentiments. Ming Pao maintained an impartial stance toward China through an increased but balanced representation of positive and negative sentiments, with its subjectivity and sentiment intensity growing to an industry-standard level. The results provide new insights into the complexity of sentiments towards China in the Hong Kong press and media attitudes in general in terms of the “us” and “them” positioning by explicating the cross-newspaper and cross-period variations using an enhanced sentiment analysis method which incorporates sentiment-oriented and semantic role analysis techniques.

Keywords: media attitude, sentiment analysis, Hong Kong press, one country two systems

Procedia PDF Downloads 72
26711 Hierarchical Tree Long Short-Term Memory for Sentence Representations

Authors: Xiuying Wang, Changliang Li, Bo Xu

Abstract:

A fixed-length feature vector is required for many machine learning algorithms in NLP field. Word embeddings have been very successful at learning lexical information. However, they cannot capture the compositional meaning of sentences, which prevents them from a deeper understanding of language. In this paper, we introduce a novel hierarchical tree long short-term memory (HTLSTM) model that learns vector representations for sentences of arbitrary syntactic type and length. We propose to split one sentence into three hierarchies: short phrase, long phrase and full sentence level. The HTLSTM model gives our algorithm the potential to fully consider the hierarchical information and long-term dependencies of language. We design the experiments on both English and Chinese corpus to evaluate our model on sentiment analysis task. And the results show that our model outperforms several existing state of the art approaches significantly.

Keywords: deep learning, hierarchical tree long short-term memory, sentence representation, sentiment analysis

Procedia PDF Downloads 324
26710 Political Communication in Twitter Interactions between Government, News Media and Citizens in Mexico

Authors: Jorge Cortés, Alejandra Martínez, Carlos Pérez, Anaid Simón

Abstract:

The presence of government, news media, and general citizenry in social media allows considering interactions between them as a form of political communication (i.e. the public exchange of contradictory discourses about politics). Twitter’s asymmetrical following model (users can follow, mention or reply to other users that do not follow them) could foster alternative democratic practices and have an impact on Mexican political culture, which has been marked by a lack of direct communication channels between these actors. The research aim is to assess Twitter’s role in political communication practices through the analysis of interaction dynamics between government, news media, and citizens by extracting and visualizing data from Twitter’s API to observe general behavior patterns. The hypothesis is that regardless the fact that Twitter’s features enable direct and horizontal interactions between actors, users repeat traditional dynamics of interaction, without taking full advantage of the possibilities of this medium. Through an interdisciplinary team including Communication Strategies, Information Design, and Interaction Systems, the activity on Twitter generated by the controversy over the presence of Uber in Mexico City was analysed; an issue of public interest, involving aspects such as public opinion, economic interests and a legal dimension. This research includes techniques from social network analysis (SNA), a methodological approach focused on the comprehension of the relationships between actors through the visual representation and measurement of network characteristics. The analysis of the Uber event comprised data extraction, data categorization, corpus construction, corpus visualization and analysis. On the recovery stage TAGS, a Google Sheet template, was used to extract tweets that included the hashtags #UberSeQueda and #UberSeVa, posts containing the string Uber and tweets directed to @uber_mx. Using scripts written in Python, the data was filtered, discarding tweets with no interaction (replies, retweets or mentions) and locations outside of México. Considerations regarding bots and the omission of anecdotal posts were also taken into account. The utility of graphs to observe interactions of political communication in general was confirmed by the analysis of visualizations generated with programs such as Gephi and NodeXL. However, some aspects require improvements to obtain more useful visual representations for this type of research. For example, link¬crossings complicates following the direction of an interaction forcing users to manipulate the graph to see it clearly. It was concluded that some practices prevalent in political communication in Mexico are replicated in Twitter. Media actors tend to group together instead of interact with others. The political system tends to tweet as an advertising strategy rather than to generate dialogue. However, some actors were identified as bridges establishing communication between the three spheres, generating a more democratic exercise and taking advantage of Twitter’s possibilities. Although interactions in Twitter could become an alternative to political communication, this potential depends on the intentions of the participants and to what extent they are aiming for collaborative and direct communications. Further research is needed to get a deeper understanding on the political behavior of Twitter users and the possibilities of SNA for its analysis.

Keywords: interaction, political communication, social network analysis, Twitter

Procedia PDF Downloads 188
26709 How Unicode Glyphs Revolutionized the Way We Communicate

Authors: Levi Corallo

Abstract:

Typed language made by humans on computers and cell phones has made a significant distinction from previous modes of written language exchanges. While acronyms remain one of the most predominant markings of typed language, another and perhaps more recent revolution in the way humans communicate has been with the use of symbols or glyphs, primarily Emojis—globally introduced on the iPhone keyboard by Apple in 2008. This paper seeks to analyze the use of symbols in typed communication from both a linguistic and machine learning perspective. The Unicode system will be explored and methods of encoding will be juxtaposed with the current machine and human perception. Topics in how typed symbol usage exists in conversation will be explored as well as topics across current research methods dealing with Emojis like sentiment analysis, predictive text models, and so on. This study proposes that sequential analysis is a significant feature for analyzing unicode characters in a corpus with machine learning. Current models that are trying to learn or translate the meaning of Emojis should be starting to learn using bi- and tri-grams of Emoji, as well as observing the relationship between combinations of different Emoji in tandem. The sociolinguistics of an entire new vernacular of language referred to here as ‘typed language’ will also be delineated across my analysis with unicode glyphs from both a semantic and technical perspective.

Keywords: unicode, text symbols, emojis, glyphs, communication

Procedia PDF Downloads 167
26708 Transitivity Analysis in Reading Passage of English Text Book for Senior High School

Authors: Elitaria Bestri Agustina Siregar, Boni Fasius Siregar

Abstract:

The paper concerned with the transitivity in the reading passage of English textbook for Senior High School. The six types of process were occurred in the passages with percentage as follows: Material Process is 166 (42%), Relational Process is 155 (39%), Mental Process is 39 (10%), Verbal Process is 21 (5%), Existential Process is 13 (3), and Behavioral Process is 5 (1%). The material processes were found to be the most frequently used process type in the samples in our corpus (41,60 %). This indicates that the twenty reading passages are centrally concerned with action and events. Related to developmental psychology theory, this book fits the needs of students of this age.

Keywords: transitivity, types of processes, reading passages, developmental psycholoy

Procedia PDF Downloads 368
26707 Language as an Instrument of Manipulation and Political Control in Nigeria: The 2015 Presidential Election in Perspective

Authors: Abdulmalik Adamu

Abstract:

This study is premised on the assumption that language, particularly, English plays a significant role in the acquisition of power in Nigeria. This is against the backdrop of the fact that for the first time in the political history of Nigeria, an opposition party succeeded in dethroning an incumbent President and ruling political party in an election. Therefore the main objective was to investigate the role of language, particularly English in the acquisition of political power in Nigeria. The corpus generated for this study consisted of excerpts from the media exchange between the spokespersons of the two dominant political parties at the time of the elections in 2015; Olisa Metuh of the Peoples Democratic Party (PDP) and Lai Mohammed of the All Progressive Party (APC). The excerpts were analysed using Critical Discourse Analysis (CDA) as a research tool. The findings revealed the acceptance of the first proposition that English facilitates the acquisition of political power in Nigeria and the rejection of the second proposition that English is an instrument for the exclusion of the populist from political events in Nigeria. The study, therefore, concluded that language, particularly English played a significant role in the acquisition of political power in Nigeria.

Keywords: language, power, politics, Critical Discourse Analysis (CDA)

Procedia PDF Downloads 366
26706 Cognitive Translation and Conceptual Wine Tasting Metaphors: A Corpus-Based Research

Authors: Christine Demaecker

Abstract:

Many researchers have underlined the importance of metaphors in specialised language. Their use of specific domains helps us understand the conceptualisations used to communicate new ideas or difficult topics. Within the wide area of specialised discourse, wine tasting is a very specific example because it is almost exclusively metaphoric. Wine tasting metaphors express various conceptualisations. They are not linguistic but rather conceptual, as defined by Lakoff & Johnson. They correspond to the linguistic expression of a mental projection from a well-known or more concrete source domain onto the target domain, which is the taste of wine. But unlike most specialised terminologies, the vocabulary is never clearly defined. When metaphorical terms are listed in dictionaries, their definitions remain vague, unclear, and circular. They cannot be replaced by literal linguistic expressions. This makes it impossible to transfer them into another language with the traditional linguistic translation methods. Qualitative research investigates whether wine tasting metaphors could rather be translated with the cognitive translation process, as well described by Nili Mandelblit (1995). The research is based on a corpus compiled from two high-profile wine guides; the Parker’s Wine Buyer’s Guide and its translation into French and the Guide Hachette des Vins and its translation into English. In this small corpus with a total of 68,826 words, 170 metaphoric expressions have been identified in the original English text and 180 in the original French text. They have been selected with the MIPVU Metaphor Identification Procedure developed at the Vrije Universiteit Amsterdam. The selection demonstrates that both languages use the same set of conceptualisations, which are often combined in wine tasting notes, creating conceptual integrations or blends. The comparison of expressions in the source and target texts also demonstrates the use of the cognitive translation approach. In accordance with the principle of relevance, the translation always uses target language conceptualisations, but compared to the original, the highlighting of the projection is often different. Also, when original metaphors are complex with a combination of conceptualisations, at least one element of the original metaphor underlies the target expression. This approach perfectly integrates into Lederer’s interpretative model of translation (2006). In this triangular model, the transfer of conceptualisation could be included at the level of ‘deverbalisation/reverbalisation’, the crucial stage of the model, where the extraction of meaning combines with the encyclopedic background to generate the target text.

Keywords: cognitive translation, conceptual integration, conceptual metaphor, interpretative model of translation, wine tasting metaphor

Procedia PDF Downloads 95
26705 Time and Cost Prediction Models for Language Classification Over a Large Corpus on Spark

Authors: Jairson Barbosa Rodrigues, Paulo Romero Martins Maciel, Germano Crispim Vasconcelos

Abstract:

This paper presents an investigation of the performance impacts regarding the variation of five factors (input data size, node number, cores, memory, and disks) when applying a distributed implementation of Naïve Bayes for text classification of a large Corpus on the Spark big data processing framework. Problem: The algorithm's performance depends on multiple factors, and knowing before-hand the effects of each factor becomes especially critical as hardware is priced by time slice in cloud environments. Objectives: To explain the functional relationship between factors and performance and to develop linear predictor models for time and cost. Methods: the solid statistical principles of Design of Experiments (DoE), particularly the randomized two-level fractional factorial design with replications. This research involved 48 real clusters with different hardware arrangements. The metrics were analyzed using linear models for screening, ranking, and measurement of each factor's impact. Results: Our findings include prediction models and show some non-intuitive results about the small influence of cores and the neutrality of memory and disks on total execution time, and the non-significant impact of data input scale on costs, although notably impacts the execution time.

Keywords: big data, design of experiments, distributed machine learning, natural language processing, spark

Procedia PDF Downloads 82
26704 The Influence of Screen Translation on Creative Audiovisual Writing: A Corpus-Based Approach

Authors: John D. Sanderson

Abstract:

The popularity of American cinema worldwide has contributed to the development of sociolects related to specific film genres in other cultural contexts by means of screen translation, in many cases eluding norms of usage in the target language, a process whose result has come to be known as 'dubbese'. A consequence for the reception in countries where local audiovisual fiction consumption is far lower than American imported productions is that this linguistic construct is preferred, even though it differs from common everyday speech. The iconography of film genres such as science-fiction, western or sword-and-sandal films, for instance, generates linguistic expectations in international audiences who will accept more easily the sociolects assimilated by the continuous reception of American productions, even if the themes, locations, characters, etc., portrayed on screen may belong in origin to other cultures. And the non-normative language (e.g., calques, semantic loans) used in the preferred mode of linguistic transfer, whether it is translation for dubbing or subtitling, has diachronically evolved in many cases into a status of canonized sociolect, not only accepted but also required, by foreign audiences of American films. However, a remarkable step forward is taken when this typology of artificial linguistic constructs starts being used creatively by nationals of these target cultural contexts. In the case of Spain, the success of American sitcoms such as Friends in the 1990s led Spanish television scriptwriters to include in national productions lexical and syntactical indirect borrowings (Anglicisms not formally identifiable as such because they include elements from their own language) in order to target audiences of the former. However, this commercial strategy had already taken place decades earlier when Spain became a favored location for the shooting of foreign films in the early 1960s. The international popularity of the then newly developed sub-genre known as Spaghetti-Western encouraged Spanish investors to produce their own movies, and local scriptwriters made use of the dubbese developed nationally since the advent of sound in film instead of using normative language. As a result, direct Anglicisms, as well as lexical and syntactical borrowings made up the creative writing of these Spanish productions, which also became commercially successful. Interestingly enough, some of these films were even marketed in English-speaking countries as original westerns (some of the names of actors and directors were anglified to that purpose) dubbed into English. The analysis of these 'back translations' will also foreground some semantic distortions that arose in the process. In order to perform the research on these issues, a wide corpus of American films has been used, which chronologically range from Stagecoach (John Ford, 1939) to Django Unchained (Quentin Tarantino, 2012), together with a shorter corpus of Spanish films produced during the golden age of Spaghetti Westerns, from una tumba para el sheriff (Mario Caiano; in English lone and angry man, William Hawkins) to tu fosa será la exacta, amigo (Juan Bosch, 1972; in English my horse, my gun, your widow, John Wood). The methodology of analysis and the conclusions reached could be applied to other genres and other cultural contexts.

Keywords: dubbing, film genre, screen translation, sociolect

Procedia PDF Downloads 130
26703 Understanding Consumer Recycling Behavior: A Literature Review of Motivational and Behavioral Aspects

Authors: Karin Johansson, Ola Johansson

Abstract:

Recycling is an important aspect of a sustainable society and depends to a large extent on consumers’ willingness to provide the voluntary work needed to take the first critical step in many return logistics systems. Based on a systematic review of articles on recycling behavior, this paper presents and discusses the findings in relation to Fogg’s Behavioral Model (FBM). Through the analysis of a corpus of 72 articles, the most important research contributions on recycling behavior are summarized and discussed. The choice of using FBM as a framework provides a new way of viewing previous research findings, and aids in identifying knowledge gaps. Based on the review, this work identifies and discusses four areas of potential interest for future research.

Keywords: recycling, reverse logistics, solid waste management, sustainability

Procedia PDF Downloads 113
26702 The Role and Effects of Communication on Occupational Safety: A Review

Authors: Pieter A. Cornelissen, Joris J. Van Hoof

Abstract:

The interest in improving occupational safety started almost simultaneously with the beginning of the Industrial Revolution. Yet, it was not until the late 1970’s before the role of communication was considered in scientific research regarding occupational safety. In recent years the importance of communication as a means to improve occupational safety has increased. Not only as communication might have a direct effect on safety performance and safety outcomes, but also as it can be viewed as a major component of other important safety-related elements (e.g., training, safety meetings, leadership). And while safety communication is an increasingly important topic in research, its operationalization is often vague and differs among studies. This is not only problematic when comparing results, but also in applying these results to practice and the work floor. By means of an in-depth analysis—building on an existing dataset—this review aims to overcome these problems. The initial database search yielded 25.527 articles, which was reduced to a research corpus of 176 articles. Focusing on the 37 articles of this corpus that addressed communication (related to safety outcomes and safety performance), the current study will provide a comprehensive overview of the role and effects of safety communication and outlines the conditions under which communication contributes to a safer work environment. The study shows that in literature a distinction is commonly made between safety communication (i.e., the exchange or dissemination of safety-related information) and feedback (i.e. a reactive form of communication). And although there is a consensus among researchers that both communication and feedback positively affect safety performance, there is a debate about the directness of this relationship. Whereas some researchers assume a direct relationship between safety communication and safety performance, others state that this relationship is mediated by safety climate. One of the key findings is that despite the strongly present view that safety communication is a formal and top-down safety management tool, researchers stress the importance of open communication that encourages and allows employees to express their worries, experiences, views, and share information. This raises questions with regard to other directions (e.g., bottom-up, horizontal) and forms of communication (e.g., informal). The current review proposes a framework to overcome the often vague and different operationalizations of safety communication. The proposed framework can be used to characterize safety communication in terms of stakeholders, direction, and characteristics of communication (e.g., medium usage).

Keywords: communication, feedback, occupational safety, review

Procedia PDF Downloads 265
26701 Machine Learning-Based Workflow for the Analysis of Project Portfolio

Authors: Jean Marie Tshimula, Atsushi Togashi

Abstract:

We develop a data-science approach for providing an interactive visualization and predictive models to find insights into the projects' historical data in order for stakeholders understand some unseen opportunities in the African market that might escape them behind the online project portfolio of the African Development Bank. This machine learning-based web application identifies the market trend of the fastest growing economies across the continent as well skyrocketing sectors which have a significant impact on the future of business in Africa. Owing to this, the approach is tailored to predict where the investment needs are the most required. Moreover, we create a corpus that includes the descriptions of over more than 1,200 projects that approximately cover 14 sectors designed for some of 53 African countries. Then, we sift out this large amount of semi-structured data for extracting tiny details susceptible to contain some directions to follow. In the light of the foregoing, we have applied the combination of Latent Dirichlet Allocation and Random Forests at the level of the analysis module of our methodology to highlight the most relevant topics that investors may focus on for investing in Africa.

Keywords: machine learning, topic modeling, natural language processing, big data

Procedia PDF Downloads 148
26700 Rethinking Literary Language: A Philsophicus-Logico Approach. The Novel ‘’ Sympathizer ‘’ as a Case Study

Authors: Oublal Ali

Abstract:

Due scholarly attention given to Ludwig Wittgenstein since the appearance of Tractatus is resulted from revolutionary shift he has made in the conception of language. True, his first and foremost concern was to solve the issue of language philosophers failed to recognize. Not only Tracturain’s approach to language that argues for philosophers failure of understanding the logic of language, but also his later conception which is developed in philosophical investigations and the reminder of all his remarks. On such a basis, it is claimed that Wittgenstein’s theory of language should not be confined to the language within philosophical streams with this premise we therefore propose to analytically read one of the literary propositions in the sympathizer as linguistic corpus. Our investigation of the literary proposition weaves us into claiming that Wittgenstein’s language games -later philosophy- is apposite to the analysis of literary works thanks to the shift Wittgenstein has made from demarcated use of language to the multiplicity and non-uniformity of its use.

Keywords: language, context, use, language games, literary propositions

Procedia PDF Downloads 81
26699 Synaesthetic Metaphors in Persian: a Cognitive Corpus Based and Comparative Perspective

Authors: A. Afrashi

Abstract:

Introduction: Synaesthesia is a term denoting the perception or description of the perception of one sense modality in terms of another. In literature, synaesthesia refers to a technique adopted by writers to present ideas, characters or places in such a manner that they appeal to more than one sense like hearing, seeing, smell etc. at a given time. In everyday language too we find many examples of synaesthesia. We commonly hear phrases like ‘loud colors’, ‘frozen silence’ and ‘warm colors’, ‘bitter cold’ etc. Empirical cognitive studies have proved that synaesthetic representations both in literature and everyday languages are constrained ie. they do not map randomly among sensory domains. From the beginning of the 20th century Synaesthesia has been a research domain both in literature and structural linguistics. However the exploration of cognitive mechanisms motivating synaesthesia, have made it an important topic in 21st century cognitive linguistics and literary studies. Synaesthetic metaphors are linguistic representations of those mental mechanisms, the study of which reveals invaluable facts about perception, cognition and conceptualization. According to the main tenets of cognitive approach to language and literature, unified and similar cognitive mechanisms are active both in everyday language and literature, and synaesthesia is one of those cognitive mechanisms. Main objective of the present research is to answer the following questions: What types of sense transfers are accessible in Persian synaesthetic metaphors. How are these types of sense transfers cognitively explained. What are the results of cross-linguistic comparative study of synaestetic metaphors based on the existing observations? Methodology: The present research employs a cognitive - corpus based method, and the theoretical framework adopted to analyze linguistic synaesthesia is the contemporary theory of metaphor, where conceptual metaphor is the result of systemic mappings across cognitive domains. Persian Language Data- base (PLDB) in the Institute for Humanities and Cultural Studies which consists mainly of Persian modern prose, is searched for synaesthetic metaphors. Then for each metaphorical structure, the source and target domains are determined. Then sense transfers are identified and the types of synaesthetic metaphors recognized. Findings: Persian synaesthetic metaphors conform to the hierarchical distribution principle, according to which transfers tend to go from touch to taste to smell to sound and to sight, not vice versa. In other words mapping from more accessible or basic concepts onto less accessible or less basic ones seems more natural. Furthermore the most frequent target domain in Persian synaesthetic metaphors is sound. Certain characteristics of Persian synaesthetic metaphors are comparable with existing related researches carried on English, French, Hungarian and Chinese synaesthetic metaphors. Conclusion: Cognitive corpus based approaches to linguistic synaesthesia, are applicable to stylistics and literary criticism and this recent research domain is an efficient approach to study cross linguistic variations to find out which of the five senses is dominant cross linguistically and cross culturally as the target domain in metaphorical mappings , and so forth receiving dominance in conceptualizations.

Keywords: cognitive semantics, conceptual metaphor, synaesthesia, corpus based approach

Procedia PDF Downloads 534
26698 Terrorism in German and Italian Press Headlines: A Cognitive Linguistic Analysis of Conceptual Metaphors

Authors: Silvia Sommella

Abstract:

Islamic terrorism has gained a lot of media attention in the last years also because of the striking increase of terror attacks since 2014. The main aim of this paper is to illustrate the phenomenon of Islamic terrorism by applying frame semantics and metaphor analysis to German and Italian press headlines of the two online weekly publications Der Spiegel and L’Espresso between 2014 and 2019. This study focuses on how media discourse – through the use of conceptual metaphors – let arise in people a particular reception of the phenomenon of Islamic terrorism and accept governmental strategies and policies, perceiving terrorists as evildoers, as the members of an uncivilised group ‘other’ opposed to the civilised group ‘we’: two groups that are perceived as opposed. The press headlines are analyzed on the basis of the cognitive linguistics, namely Lakoff and Johnson’s conceptualization of metaphor to distinguish between abstract conceptual metaphors and specific metaphorical expressions. The study focuses on the contexts, frames, and metaphors. The method adopted in this study is Konerding’s frame semantics (1993). Konerding carried out on the basis of dictionaries – in particular of the Duden Deutsches Universalwörterbuch (Duden Universal German Dictionary) – in a pilot study of a lexicological work hyperonym reduction of substantives, working exclusively with nouns because hyperonyms usually occur in the dictionary meaning explanations as for the main elements of nominal phrases. The results of Konerding’s hyperonym type reduction is a small set of German nouns and they correspond to the highest hyperonyms, the so-called categories, matrix frames: ‘object’, ‘organism’, ‘person/actant’, ‘event’, ‘action/interaction/communication’, ‘institution/social group’, ‘surroundings’, ‘part/piece’, ‘totality/whole’, ‘state/property’. The second step of Konerding’s pilot study consists in determining the potential reference points of each category so that conventionally expectable routinized predications arise as predictors. Konerding found out which predicators the ascertained noun types can be linked to. For the purpose of this study, metaphorical expressions will be listed and categorized in conceptual metaphors and under the matrix frames that correspond to the particular conceptual metaphor. All of the corpus analyses are carried out using Ant Conc corpus software. The research will verify some previously analyzed metaphors such as TERRORISM AS WAR, A CRIME, A NATURAL EVENT, A DISEASE and will identify new conceptualizations and metaphors about Islamic terrorism, especially in the Italian language like TERRORISM AS A GAME, WARES, A DRAMATIC PLAY. Through the identification of particular frames and their construction, the research seeks to understand the public reception and the way to handle the discourse about Islamic terrorism in the above mentioned online weekly publications under a contrastive analysis in the German and in the Italian language.

Keywords: cognitive linguistics, frame semantics, Islamic terrorism, media

Procedia PDF Downloads 143