Search results for: spoken corpus
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 592

Search results for: spoken corpus

502 Track and Evaluate Cortical Responses Evoked by Electrical Stimulation

Authors: Kyosuke Kamada, Christoph Kapeller, Michael Jordan, Mostafa Mohammadpour, Christy Li, Christoph Guger

Abstract:

Cortico-cortical evoked potentials (CCEP) refer to responses generated by cortical electrical stimulation at distant brain sites. These responses provide insights into the functional networks associated with language or motor functions, and in the context of epilepsy, they can reveal pathological networks. Locating the origin and spread of seizures within the cortex is crucial for pre-surgical planning. This process can be enhanced by employing cortical stimulation at the seizure onset zone (SOZ), leading to the generation of CCEPs in remote brain regions that may be targeted for disconnection. In the case of a 24-year-old male patient suffering from intractable epilepsy, corpus callosotomy was performed as part of the treatment. DTI-MRI imaging, conducted using a 3T MRI scanner for fiber tracking, along with CCEP, is used as part of an assessment for surgical planning. Stimulation of the SOZ, with alternating monophasic pulses of 300µs duration and 15mA current intensity, resulted in CCEPs on the contralateral frontal cortex, reaching a peak amplitude of 206µV with a latency of 31ms, specifically in the left pars triangularis. The related fiber tracts were identified with a two-tensor unscented Kalman filter (UKF) technique, showing transversal fibers through the corpus callosum. The CCEPs were monitored through the progress of the surgery. Notably, the SOZ-associated CCEPs exhibited a reduction following the resection of the anterior portion of the corpus callosum, reaching the identified connecting fibers. This intervention demonstrated a potential strategy for mitigating the impact of intractable epilepsy through targeted disconnection of identified cortical regions.

Keywords: CCEP, SOZ, Corpus callosotomy, DTI

Procedia PDF Downloads 36
501 The Development of Chinese-English Homophonic Word Pairs Databases for English Teaching and Learning

Authors: Yuh-Jen Wu, Chun-Min Lin

Abstract:

Homophonic words are common in Mandarin Chinese which belongs to the tonal language family. Using homophonic cues to study foreign languages is one of the learning techniques of mnemonics that can aid the retention and retrieval of information in the human memory. When learning difficult foreign words, some learners transpose them with words in a language they are familiar with to build an association and strengthen working memory. These phonological clues are beneficial means for novice language learners. In the classroom, if mnemonic skills are used at the appropriate time in the instructional sequence, it may achieve their maximum effectiveness. For Chinese-speaking students, proper use of Chinese-English homophonic word pairs may help them learn difficult vocabulary. In this study, a database program is developed by employing Visual Basic. The database contains two corpora, one with Chinese lexical items and the other with English ones. The Chinese corpus contains 59,053 Chinese words that were collected by a web crawler. The pronunciations of this group of words are compared with words in an English corpus based on WordNet, a lexical database for the English language. Words in both databases with similar pronunciation chunks and batches are detected. A total of approximately 1,000 Chinese lexical items are located in the preliminary comparison. These homophonic word pairs can serve as a valuable tool to assist Chinese-speaking students in learning and memorizing new English vocabulary.

Keywords: Chinese, corpus, English, homophonic words, vocabulary

Procedia PDF Downloads 157
500 Laundering vs. Blanqueo: Translating Financial Crime Metaphors From English to Spanish

Authors: Stephen Gerome

Abstract:

This study examines the translation and use of metaphors in the realm of public safety discourse and intends to shed light on a continuing problem in cross-cultural communication. Metaphors can cause problems not only within languages but also in interlingual communication. The use and misuse of metaphors may hinder the ability to adequately communicate prevention efforts and, in some cases, facilitate and allow financial crime to go undetected. The use of lexicalized metaphors in communications by political entities, journalists, and legal agents in communications regarding law, policy making, compliance monitoring and enforcement as well as in adjudication can have negative consequences if misconstrued. This study provides examples of metaphor usage in published documents in a corpus linguistic study that compares the use of lexicalized metaphors in this discourse to shed light on possible unexpected consequences as well as counterproductive ones.

Keywords: translation, legal, corpus linguistics, financial

Procedia PDF Downloads 86
499 The Mirage of Progress? a Longitudinal Study of Japanese Students’ L2 Oral Grammar

Authors: Robert Long, Hiroaki Watanabe

Abstract:

This longitudinal study examines the grammatical errors of Japanese university students’ dialogues with a native speaker over an academic year. The L2 interactions of 15 Japanese speakers were taken from the JUSFC2018 corpus (April/May 2018) and the JUSFC2019 corpus (January/February). The corpora were based on a self-introduction monologue and a three-question dialogue; however, this study examines the grammatical accuracy found in the dialogues. Research questions focused on a possible significant difference in grammatical accuracy from the first interview session in 2018 and the second one the following year, specifically regarding errors in clauses per 100 words, global errors and local errors, and with specific errors related to parts of speech. The investigation also focused on which forms showed the least improvement or had worsened? Descriptive statistics showed that error-free clauses/errors per 100 words decreased slightly while clauses with errors/100 words increased by one clause. Global errors showed a significant decline, while local errors increased from 97 to 158 errors. For errors related to parts of speech, a t-test confirmed there was a significant difference between the two speech corpora with more error frequency occurring in the 2019 corpus. This data highlights the difficulty in having students self-edit themselves.

Keywords: clause analysis, global vs. local errors, grammatical accuracy, L2 output, longitudinal study

Procedia PDF Downloads 111
498 Morpho-Syntactic Pattern in Maithili Urdu

Authors: Mohammad Jahangeer Warsi

Abstract:

This is, perhaps, the first linguistic study of Maithili Urdu, a dialect of Urdu language of Indo-Aryan family, spoken by around four million speakers in Darbhanga, Samastipur, Begusarai, Madhubani, and Muzafarpur districts of Bihar. It has the subject–verb–object (SOV) word order and it lacks script and literature. Needless to say, this work is an attempt to document this dialect so that it should contribute to the field of descriptive linguistics. Besides, it is also spoken by majority of Maithili diaspora community. Maithili Urdu does not have its own script or literature, yet it has maintained an oral history of over many centuries. It has contributed to enriching the Maithili, Hindi and Urdu languages and literature very profoundly. Dialects are the contact languages of particular regions, and they have a deep impact on their cultural heritage. Slowly with time, these dialects begin to take shape of languages. The convergence of a dialect into a language is a symbol and pride of the people who speak it. Although, confined to the five districts of northern Bihar, yet highly popular among the natives, it is the primary mode of communication of the local Muslims. The paper will focus on the structure of expressions about Maithili Urdu that include the structure of words, phrases, clauses, and sentences. There are clear differences in linguistic features of Maithili Urdu vis-à-vis Urdu, Maithili and Hindi. Though being a dialect of Urdu, interestingly, there is only one second person pronoun tu and lack of agentive marker –ne. Although being spoken in the vicinity of Hindi, Urdu and Maithili, it undoubtedly has its own linguistic features, of them, verb conjugation is remarkably unique. Because of the oral tradition of this link language, intonation has become significantly prominent. This paper will discuss the morpho-syntactic pattern of Maithili Urdu and will go through a sample text to authenticate the findings.

Keywords: cultural heritage, morpho-syntactic pattern, Maithili Urdu, verb conjugation

Procedia PDF Downloads 185
497 Tracing the Developmental Repertoire of the Progressive: Evidence from L2 Construction Learning

Authors: Tianqi Wu, Min Wang

Abstract:

Research investigating language acquisition from a constructionist perspective has demonstrated that language is learned as constructions at various linguistic levels, which is related to factors of frequency, semantic prototypicality, and form-meaning contingency. However, previous research on construction learning tended to focus on clause-level constructions such as verb argument constructions but few attempts were made to study morpheme-level constructions such as the progressive construction, which is regarded as a source of acquisition problems for English learners from diverse L1 backgrounds, especially for those whose L1 do not have an equivalent construction such as German and Chinese. To trace the developmental trajectory of Chinese EFL learners’ use of the progressive with respect to verb frequency, verb-progressive contingency, and verbal prototypicality and generality, a learner corpus consisting of three sub-corpora representing three different English proficiency levels was extracted from the Chinese Learners of English Corpora (CLEC). As the reference point, a native speakers’ corpus extracted from the Louvain Corpus of Native English Essays was also established. All the texts were annotated with C7 tagset by part-of-speech tagging software. After annotation all valid progressive hits were retrieved with AntConc 3.4.3 followed by a manual check. Frequency-related data showed that from the lowest to the highest proficiency level, (1) the type token ratio increased steadily from 23.5% to 35.6%, getting closer to 36.4% in the native speakers’ corpus, indicating a wider use of verbs in the progressive; (2) the normalized entropy value rose from 0.776 to 0.876, working towards the target score of 0.886 in native speakers’ corpus, revealing that upper-intermediate learners exhibited a more even distribution and more productive use of verbs in the progressive; (3) activity verbs (i.e., verbs with prototypical progressive meanings like running and singing) dropped from 59% to 34% but non-prototypical verbs such as state verbs (e.g., being and living) and achievement verbs (e.g., dying and finishing) were increasingly used in the progressive. Apart from raw frequency analyses, collostructional analyses were conducted to quantify verb-progressive contingency and to determine what verbs were distinctively associated with the progressive construction. Results were in line with raw frequency findings, which showed that contingency between the progressive and non-prototypical verbs represented by light verbs (e.g., going, doing, making, and coming) increased as English proficiency proceeded. These findings altogether suggested that beginning Chinese EFL learners were less productive in using the progressive construction: they were constrained by a small set of verbs which had concrete and typical progressive meanings (e.g., the activity verbs). But with English proficiency increasing, their use of the progressive began to spread to marginal members such as the light verbs.

Keywords: Construction learning, Corpus-based, Progressives, Prototype

Procedia PDF Downloads 109
496 Dialogism in Research Article Introductions Written by Iranian Non-Native and English Native Speaking Writers

Authors: Moharram Sharifi

Abstract:

Despite a growing interest in the study of the introduction section of Research Articles (RA), there have been few studies to investigate how academic writers engage with other voices and alternative positions in this academic genre. Therefore, the purpose of this study was to show how Native Speaker (NS) and (Non-Native Speaker (NNS) writers take positions and stances in research article introductions. For this purpose, Engagement resources based on the appraisal framework were investigated in sixty articles written by English NS and Iranian NNS published in applied linguistics journals. It was found that the mean occurrences of heteroglossic items in both corpora were larger than those of monoglossic items, but comparing the means of monoglossic engagements between the two corpora, it was revealed that NS writers’ corpus had larger mean occurrences of monoglossic engagements than NNS writers’ corpus implying the native’s stronger authorial stance in the texts. The results also revealed that there was no significant difference in the use of contractive and expansive engagements by NS writers (t (29) = -0.995, p>0.05), indicating a balanced use between the two options. However, the higher mean occurrences of expansive options compared with contractive options in the NNS corpus may suggest that NN writers open up more dialogic room for alternative positions in the RA introductions. The findings of this study may help writers to better perceive the creation of a strong authorial position using appropriate engagement resources in RA introductions.

Keywords: engagement, heteroglossic, monoglossic, introduction

Procedia PDF Downloads 28
495 Lennox-gastaut Syndrome Associated with Dysgenesis of Corpus Callosum

Authors: A. Bruce Janati, Muhammad Umair Khan, Naif Alghassab, Ibrahim Alzeir, Assem Mahmoud, M. Sammour

Abstract:

Rationale: Lennox-Gastaut syndrome(LGS) is an electro-clinical syndrome composed of the triad of mental retardation, multiple seizure types, and the characteristic generalized slow spike-wave complexes in the EEG. In this article, we report on two patients with LGS whose brain MRI showed dysgenesis of corpus callosum(CC). We review the literature and stress the role of CC in the genesis of secondary bilateral synchrony(SBS). Method: This was a clinical study conducted at King Khalid Hospital. Results: The EEG was consistent with LGS in patient 1 and unilateral slow spike-wave complexes in patient 2. The MRI showed hypoplasia of the splenium of CC in patient 1, and global hypoplasia of CC combined with Joubert syndrome in patient 2. Conclusion: Based on the data, we proffer the following hypotheses: 1-Hypoplasia of CC interferes with functional integrity of this structure. 2-The genu of CC plays a pivotal role in the genesis of secondary bilateral synchrony. 3-Electrodecremental seizures in LGS emanate from pacemakers generated in the brain stem, in particular the mesencephalon projecting abnormal signals to the cortex via thalamic nuclei. 4-Unilateral slow spike-wave complexes in the context of mental retardation and multiple seizure types may represent a variant of LGS, justifying neuroimaging studies.

Keywords: EEG, Lennox-Gastaut syndrome, corpus callosum , MRI

Procedia PDF Downloads 418
494 Translation Quality Assessment in Fansubbed English-Chinese Swearwords: A Corpus-Based Study of the Big Bang Theory

Authors: Qihang Jiang

Abstract:

Fansubbing, the combination of fan and subtitling, is one of the main branches of Audiovisual Translation (AVT) having kindled more and more interest of researchers into the AVT field in recent decades. In particular, the quality of so-called non-professional translation seems questionable due to the non-transparent qualification of subtitlers in a huge community network. This paper attempts to figure out how YYeTs aka 'ZiMuZu', the largest fansubbing group in China, translates swearwords from English to Chinese for its fans of the prevalent American sitcom The Big Bang Theory, taking cultural, social and political elements into account in the context of China. By building a bilingual corpus containing both the source and target texts, this paper found that most of the original swearwords were translated in a toned-down manner, probably due to Chinese audiences’ cultural and social network features as well as the strict censorship under the Chinese government. Additionally, House (2015)’s newly revised model of Translation Quality Assessment (TQA) was applied and examined. Results revealed that most of the subtitled swearwords achieved their pragmatic functions and exerted a communicative effect for audiences. In conclusion, this paper enriches the empirical research concerning House’s new TQA model, gives a full picture of the subtitling of swearwords in AVT field and provides a practical guide for the practitioners in their career of subtitling.

Keywords: corpus-based approach, fansubbing, pragmatic functions, swearwords, translation quality assessment

Procedia PDF Downloads 128
493 The Diminished Online Persona: A Semantic Change of Chinese Classifier Mei on Weibo

Authors: Hui Shi

Abstract:

This study investigates a newly emerged usage of Chinese numeral classifier mei (枚) in the cyberspace. In modern Chinese grammar, mei as a classifier should occupy the pre-nominal position, and its valid accompanying nouns are restricted to small, flat, fragile inanimate objects rather than humans. To examine the semantic change of mei, two types of data from Weibo.com were collected. First, 500 mei-included Weibo posts constructed a corpus for analyzing this classifier's word order distribution (post-nominal or pre-nominal) as well as its accompanying nouns' semantics (inanimate or human). Second, considering that mei accompanies a remarkable number of human nouns in the first corpus, the second corpus is composed of mei-involved Weibo IDs from users located in first and third-tier cities (n=8 respectively). The findings show that in the cyber community, mei frequently classifies human-related neologisms at the archaic post-normal position. Besides, the 23 to 29-year-old females as well as Weibo users from third-tier cities are the major populations who adopt mei in their user IDs for self-description and identity expression. This paper argues that the creative usage of mei gains popularity in the Chinese internet due to a humor effect. The marked word order switch and semantic misapplication combined to trigger incongruity and jocularity. This study has significance for research on Chinese cyber neologism. It may also lay a foundation for further studies on Chinese classifier change and Chinese internet communication.

Keywords: Chinese classifier, humor, neologism, semantic change

Procedia PDF Downloads 234
492 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of variation in the register in three major text domains, namely, social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. This present study uses a corpus of a moderate amount of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control their identity but also mark their uniqueness across the domains. At first, the subject domains of the texts are classified into two parameters namely, ‘Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties like both function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains are quite dicey in nature as their lexicological identity does not have any bearing in the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, lexical collocation. In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation with regard to the domains they should actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategy for language learners. The availability of huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational to specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 156
491 The Diary of Dracula, by Marin Mincu: Inquiries into a Romanian 'Book of Wisdom' as a Fictional Counterpart for Corpus Hermeticum

Authors: Lucian Vasile Bagiu, Paraschiva Bagiu

Abstract:

The novel written in Italian and published in Italy in 1992 by the Romanian scholar Marin Mincu is meant for the foreign reader, aiming apparently at a better knowledge of the historical character of Vlad the Empalor (Vlad Dracul), within the European cultural, political and historical context of 1463. Throughout the very well written tome, one comes to realize that one of the underlining levels of the fiction is the exposing of various fundamental features of the Romanian culture and civilization. The author of the diary, Dracula, makes mention of Corpus Hermeticum no less than fifteen times, suggesting his own diary is some sort of a philosophical counterpart. The essay focuses on several ‘truths’ and ‘wisdom’ revealed in the fictional teachings of Dracula. The boycott of History by the Romanians is identified as an echo of the philosophical approach of the famous Romanian scholar and writer Lucian Blaga. The orality of the Romanian culture is a landmark opposed to written culture of the Western Europe. The religion of the ancient Dacian God Zalmoxis is seen as the basis for the Romanian existential and/or metaphysical ethnic philosophy (a feature tackled by the famous Romanian historian of religion Mircea Eliade), with a suggestion that Hermes Trismegistus may have written his Corpus Hermeticum being influenced by Zalmoxis. The historical figure of the last Dacian king Decebalus (death 106 AD) is a good pretext for a tantalizing Indo-European suggestion that the prehistoric Thraco-Dacian people may have been the ancestors of the first Romans settled in Latium. The lost diary of the Emperor Trajan The Bello Dacico may have proved that the unknown language of the Dacians was very much alike Latin language (a secret well hidden by the Vatican). The attitude towards death of the Dacians, as described by Herodotus, may have later inspired Pitagora, Socrates, the Eleusinian and Orphic Mysteries, etc. All of these within the Humanistic and Renascentist European context of the epoch, Dracula having a close relationship with scholars such as Nicolaus Cusanus, Cosimo de Medici, Marsilio Ficino, Pope Pius II, etc. Thus The Diary of Dracula turns out as exciting and stupefying as Corpus Hermeticum, a book impossible to assimilate entirely, yet a reference not wise to be ignored.

Keywords: Corpus Hermeticum, Dacians, Dracula, Zalmoxis

Procedia PDF Downloads 141
490 Use of Ing-Formed and Derived Verbal Nominalization in American English: A Survey Applied to Native American English Speakers

Authors: Yujia Sun

Abstract:

Research on nominalizations in English can be traced back to at least the 1960s and even centered in the field nowadays. At the very beginning, the discussion was about the relationship between verbs and nouns, but then it moved to the distinct senses embodied in different forms of nominals, namely, various types of nominalizations. This paper tries to address the issue that how speakers perceive different forms of verbal nouns, and what might influence their perceptions. The data are collected through a self-designed questionnaire targeted at native speakers of American English, and the employment of the Corpus of Contemporary American English (COCA). The results show that semantic differences between different forms of nominals do play a role in people’s preference to certain form than another. But it still awaits more explorations to see how the frequency of usage is interrelates to this issue.

Keywords: corpus of contemporary American English, derived nominalization, frequency of usage, ing-formed nominalization

Procedia PDF Downloads 157
489 Direct Translation vs. Pivot Language Translation for Persian-Spanish Low-Resourced Statistical Machine Translation System

Authors: Benyamin Ahmadnia, Javier Serrano

Abstract:

In this paper we compare two different approaches for translating from Persian to Spanish, as a language pair with scarce parallel corpus. The first approach involves direct transfer using an statistical machine translation system, which is available for this language pair. The second approach involves translation through English, as a pivot language, which has more translation resources and more advanced translation systems available. The results show that, it is possible to achieve better translation quality using English as a pivot language in either approach outperforms direct translation from Persian to Spanish. Our best result is the pivot system which scores higher than direct translation by (1.12) BLEU points.

Keywords: statistical machine translation, direct translation approach, pivot language translation approach, parallel corpus

Procedia PDF Downloads 464
488 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition

Authors: Qin Long, Li Xiaoge

Abstract:

The Knowledge Graphs have now become a new form of knowledge representation. However, there is no consensus in regard to a plausible and definition of entities and relationships in the domain-specific knowledge graph. Further, in conjunction with several limitations and deficiencies, various domain-specific entities and relationships recognition approaches are far from perfect. Specifically, named entity recognition in Chinese domain is a critical task for the natural language process applications. However, a bottleneck problem with Chinese named entity recognition in new domains is the lack of annotated data. To address this challenge, a domain distant supervised named entity recognition framework is proposed. The framework is divided into two stages: first, the distant supervised corpus is generated based on the entity linking model of graph attention neural network; secondly, the generated corpus is trained as the input of the distant supervised named entity recognition model to train to obtain named entities. The link model is verified in the ccks2019 entity link corpus, and the F1 value is 2% higher than that of the benchmark method. The re-pre-trained BERT language model is added to the benchmark method, and the results show that it is more suitable for distant supervised named entity recognition tasks. Finally, it is applied in the computer field, and the results show that this framework can obtain domain named entities.

Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network

Procedia PDF Downloads 73
487 Variables, Annotation, and Metadata Schemas for Early Modern Greek

Authors: Eleni Karantzola, Athanasios Karasimos, Vasiliki Makri, Ioanna Skouvara

Abstract:

Historical linguistics unveils the historical depth of languages and traces variation and change by analyzing linguistic variables over time. This field of linguistics usually deals with a closed data set that can only be expanded by the (re)discovery of previously unknown manuscripts or editions. In some cases, it is possible to use (almost) the entire closed corpus of a language for research, as is the case with the Thesaurus Linguae Graecae digital library for Ancient Greek, which contains most of the extant ancient Greek literature. However, concerning ‘dynamic’ periods when the production and circulation of texts in printed as well as manuscript form have not been fully mapped, representative samples and corpora of texts are needed. Such material and tools are utterly lacking for Early Modern Greek (16th-18th c.). In this study, the principles of the creation of EMoGReC, a pilot representative corpus of Early Modern Greek (16th-18th c.) are presented. Its design follows the fundamental principles of historical corpora. The selection of texts aims to create a representative and balanced corpus that gives insight into diachronic, diatopic and diaphasic variation. The pilot sample includes data derived from fully machine-readable vernacular texts, which belong to 4-5 different textual genres and come from different geographical areas. We develop a hierarchical linguistic annotation scheme, further customized to fit the characteristics of our text corpus. Regarding variables and their variants, we use as a point of departure the bundle of twenty-four features (or categories of features) for prose demotic texts of the 16th c. Tags are introduced bearing the variants [+old/archaic] or [+novel/vernacular]. On the other hand, further phenomena that are underway (cf. The Cambridge Grammar of Medieval and Early Modern Greek) are selected for tagging. The annotated texts are enriched with metalinguistic and sociolinguistic metadata to provide a testbed for the development of the first comprehensive set of tools for the Greek language of that period. Based on a relational management system with interconnection of data, annotations, and their metadata, the EMoGReC database aspires to join a state-of-the-art technological ecosystem for the research of observed language variation and change using advanced computational approaches.

Keywords: early modern Greek, variation and change, representative corpus, diachronic variables.

Procedia PDF Downloads 39
486 An Automatic Speech Recognition Tool for the Filipino Language Using the HTK System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

This paper presents the development of a Filipino speech recognition tool using the HTK System. The system was trained from a subset of the Filipino Speech Corpus developed by the DSP Laboratory of the University of the Philippines-Diliman. The speech corpus was both used in training and testing the system by estimating the parameters for phonetic HMM-based (Hidden-Markov Model) acoustic models. Experiments on different mixture-weights were incorporated in the study. The phoneme-level word-based recognition of a 5-state HMM resulted in an average accuracy rate of 80.13 for a single-Gaussian mixture model, 81.13 after implementing a phoneme-alignment, and 87.19 for the increased Gaussian-mixture weight model. The highest accuracy rate of 88.70% was obtained from a 5-state model with 6 Gaussian mixtures.

Keywords: Filipino language, Hidden Markov Model, HTK system, speech recognition

Procedia PDF Downloads 450
485 Instructional Consequences of the Transiency of Spoken Words

Authors: Slava Kalyuga, Sujanya Sombatteera

Abstract:

In multimedia learning, written text is often transformed into spoken (narrated) text. This transient information may overwhelm limited processing capacity of working memory and inhibit learning instead of improving it. The paper reviews recent empirical studies in modality and verbal redundancy effects within a cognitive load framework and outlines conditions under which negative effects of transiency may occur. According to the modality effect, textual information accompanying pictures should be presented in an auditory rather than visual form in order to engage two available channels of working memory – auditory and visual - instead of only one of them. However, some studies failed to replicate the modality effect and found differences opposite to those expected. Also, according to the multimedia redundancy effect, the same information should not be presented simultaneously in different modalities to avoid unnecessary cognitive load imposed by the integration of redundant sources of information. However, a few studies failed to replicate the multimedia redundancy effect too. Transiency of information is used to explain these controversial results.

Keywords: cognitive load, transient information, modality effect, verbal redundancy effect

Procedia PDF Downloads 358
484 A Study on Sentiment Analysis Using Various ML/NLP Models on Historical Data of Indian Leaders

Authors: Sarthak Deshpande, Akshay Patil, Pradip Pandhare, Nikhil Wankhede, Rushali Deshmukh

Abstract:

Among the highly significant duties for any language most effective is the sentiment analysis, which is also a key area of NLP, that recently made impressive strides. There are several models and datasets available for those tasks in popular and commonly used languages like English, Russian, and Spanish. While sentiment analysis research is performed extensively, however it is lagging behind for the regional languages having few resources such as Hindi, Marathi. Marathi is one of the languages that included in the Indian Constitution’s 8th schedule and is the third most widely spoken language in the country and primarily spoken in the Deccan region, which encompasses Maharashtra and Goa. There isn’t sufficient study on sentiment analysis methods based on Marathi text due to lack of available resources, information. Therefore, this project proposes the use of different ML/NLP models for the analysis of Marathi data from the comments below YouTube content, tweets or Instagram posts. We aim to achieve a short and precise analysis and summary of the related data using our dataset (Dates, names, root words) and lexicons to locate exact information.

Keywords: multilingual sentiment analysis, Marathi, natural language processing, text summarization, lexicon-based approaches

Procedia PDF Downloads 46
483 Phraseologisms With The Spices And Food Additives Component In Polish And Russian. Lexical And Semantic Aspects

Authors: Oliwia Bator

Abstract:

The subject of this description is phraseologisms with the component “spices and food additives component" in Polish and Russian. The purpose of the study is to analyze the phraseologisms from the point of view of lexis and semantics. The material for analysis was extracted from Phraseological Dictionaries of Polish and Russian. The phraseologisms were considered from the lexical point of view, taking into account the name of the " spices and food additives" component, which forms them. From the semantic point of view, 12 semantic groups of phraseologisms were separated in Polish, while 9 semantic groups were separated in Russian. In addition is shown their functioning in the contexts of contemporary Polish and Russian. The contexts were taken from the National Corpus of the Polish Language and the National Corpus of the Russian Language.

Keywords: phraseology, language, slavic studies, linguistics

Procedia PDF Downloads 4
482 The Power of Words: A Corpus Analysis of Campaign Speeches of President Donald J. Trump

Authors: Aiza Dalman

Abstract:

Words are powerful when these are used wisely and strategically. In this study, twelve (12) campaign speeches of President Donald J. Trump were analyzed as to frequently used words and ethos, pathos and logos being employed. The speeches were read thoroughly, analyzed and interpreted. With the use of Word Counter Tool and Text Analyzer software accessible online, it was found out that the word ‘will’ has the highest frequency of 121, followed by Hillary (58), American (38), going (35), plan and Clinton (32), illegal (30), government (28), corruption (26) and criminal (24). When the speeches were analyzed as to ethos, pathos and logos, on the other hand, it revealed that these were all employed in his speeches. The statements under these pointed out against Hillary or in his favor. The unique strategy of President Donald J. Trump as to frequently used words and ethos, pathos and logos in persuading people perhaps lead the way to his victory.

Keywords: campaign speeches, corpus analysis, ethos, logos and pathos, power of words

Procedia PDF Downloads 257
481 A Corpus Study of English Verbs in Chinese EFL Learners’ Academic Writing Abstracts

Authors: Shuaili Ji

Abstract:

The correct use of verbs is an important element of high-quality research articles, and thus for Chinese EFL learners, it is significant to master characteristics of verbs and to precisely use verbs. However, some researches have shown that there are differences in using verbs between learners and native speakers and learners have difficulty in using English verbs. This corpus-based quantitative research can enhance learners’ knowledge of English verbs and promote the quality of research article abstracts even of the whole academic writing. The aim of this study is to find the differences between learners’ and native speakers’ use of verbs and to study the factors that contribute to those differences. To this end, the research question is as follows: What are the differences between most frequently used verbs by learners and those by native speakers? The research question is answered through a study that uses corpus-based data-driven approach to analyze the verbs used by learners in their abstract writings in terms of collocation, colligation and semantic prosody. The results show that: (1) EFL learners obviously overused ‘be, can, find, make’ and underused ‘investigate, examine, may’. As to modal verbs, learners obviously overused ‘can’ while underused ‘may’. (2) Learners obviously overused ‘we find + object clauses’ while underused ‘nouns (results, findings, data) + suggest/indicate/reveal + object clauses’ when expressing research results. (3) Learners tended to transfer the collocation, colligation and semantic prosody of shǐ and zuò to make. (4) Learners obviously overused ‘BE+V-ed’ and used BE as the main verb. They also obviously overused the basic forms of BE such as be, is, are, while obviously underused its inflections (was, were). These results manifested learners’ lack of accuracy and idiomatic property in verb usage. Due to the influence of the concept transfer of Chinese, the verbs in learners’ abstracts showed obvious transfer of mother language. In addition, learners have not fully mastered the use of verbs, avoiding using complex colligations to prevent errors. Based on these findings, the present study has implications for English teaching, seeking to have implications for English academic abstract writing in China. Further research could be undertaken to study the use of verbs in the whole dissertation to find out whether the characteristic of the verbs in abstracts can apply in the whole dissertation or not.

Keywords: academic writing abstracts, Chinese EFL learners, corpus-based, data-driven, verbs

Procedia PDF Downloads 309
480 Absence of Developmental Change in Epenthetic Vowel Duration in Japanese Speakers’ English

Authors: Takayuki Konishi, Kakeru Yazawa, Mariko Kondo

Abstract:

This study examines developmental change in the production of epenthetic vowels by Japanese learners of English in relation to acquisition of L2 English speech rhythm. Seventy-two Japanese learners of English in the J-AESOP corpus were divided into lower- and higher-level learners according to their proficiency score and the frequency of vowel epenthesis. Three learners were excluded because no vowel epenthesis was observed in their utterances. The analysis of their read English speech data showed no statistical difference between lower- and higher-level learners, implying the absence of any developmental change in durations of epenthetic vowels. This result, together with the findings of previous studies, will be discussed in relation to the transfer of L1 phonology and manifestation of L2 English rhythm.

Keywords: vowel epenthesis, Japanese learners of English, L2 speech corpus, speech rhythm

Procedia PDF Downloads 247
479 Technological Tool-Use as an Online Learner Strategy in a Synchronous Speaking Task

Authors: J. Knight, E. Barberà

Abstract:

Language learning strategies have been defined as thoughts and actions, consciously chosen and operationalized by language learners, to help them in carrying out a multiplicity of tasks from the very outset of learning to the most advanced levels of target language performance. While research in the field of Second Language Acquisition has focused on ‘good’ language learners, the effectiveness of strategy-use and orchestration by effective learners in face-to-face classrooms much less research has attended to learner strategies in online contexts, particular strategies in relation to technological tool use which can be part of a task design. In addition, much research on learner strategies and strategy use has been explored focusing on cognitive, attitudinal and metacognitive behaviour with less research focusing on the social aspect of strategies. This study focuses on how learners mediate with a technological tool designed to support synchronous spoken interaction and how this shape their spoken interaction in the opening of their talk. A case study approach is used incorporating notions from communities of practice theory to analyse and understand learner strategies of dyads carrying out a role play task. The study employs analysis of transcripts of spoken interaction in the openings of the talk along with log files of tool use. The study draws on results of previous studies pertaining to the same tool as a form of triangulation. Findings show how learners gain pre-task planning time through technological tool control. The strategies involving learners’ choices to enter and exit the tool shape their spoken interaction qualitatively, with some cases demonstrating long silences whilst others appearing to start the pedagogical task immediately. Who/what learners orientate to in the openings of the talk: an audience (i.e. the teacher), each other and/or screen-based signifiers in the opening moments of the talk also becomes a focus. The study highlights how tool use as a social practice should be considered a learning strategy in online contexts whereby different usages may be understood in the light of the more usual asynchronous social practices of the online community. The teachers’ role in the community is also problematised as the evaluator of the practices of that community. Results are pertinent for task design for synchronous speaking tasks. The use of community of practice theory supports an understanding of strategy use that involves both metacognition alongside social context revealing how tool-use strategies may need to be orally (socially) negotiated by learners and may also differ from an online language community.

Keywords: learner strategy, tool use, community of practice, speaking task

Procedia PDF Downloads 319
478 MicroRNA in Bovine Corpus Luteum during Early Pregnancy

Authors: Rreze Gecaj, Corina Schanzenbach, Benedikt Kirchner, Michael Pfaffl, Bajram Berisha

Abstract:

The maintenance of corpus lutem (CL) during early pregnancy in cattle is a critical and multifarious process. A luteotrophic mechanism originating from the embryo is widely accepted as the triggering signal for the CL maintenance. In the cattle, it is the interferon-tau (IFNT) secretion form conceptus that prevents CL regression and ensures progesterone production for the establishment of pregnancy. In addition to endocrine and paracrine signals, microRNA (miRNA) can also support CL sustainability during early pregnancy. MiRNA are small non-coding nucleic acids that regulate gene expression post-transcriptionally and are shown to be involved in the modulation of CL function. However, the examination of miRNAs in corpus luteum function at the early pregnancy still remains largely uncovered. This study aims at profiling the expression of miRNA in CL during the early pregnancy in cattle by comparing it with the CL form late cycle and with the regressed CL. Corpora lutea were assigned in two different groups during the cycle (C13 group, late CL: days 13-18 and C18, regressed CL group: day >18) and during the early pregnancy (group P: 1-2 month). The estrous cycle was determined by macroscopic examination and to age the fetus crown-rump length measurement was applied. A total of 9 corpora lutea from individual animals were included in the study, three corpora lutea for each group. MiRNAs population was profiled using small RNA next-generation sequencing and biologically significant miRNAs were evaluated for their differential expression using the DESeq2-methodology. We show that 6 differentially expressed miRNAs (bta-mir-2890, -2332, -2441-3p, -148b, -1248 and -29c) are common to both comparisons, P vs C13 and P vs C18. While for each stage individually we have identified unique miRNAs differentially expressed only for the given comparison. bta-miR-23a and -769 were unique miRNAs differentially expressed in P vs C13, whereas forty-four unique miRNAs were identified as differentially expressed in P vs C18. These data confirm that miRNAs are highly abundant in luteal tissue during early pregnancy and potentially regulate the CL maintenance at this stage of fetus development.

Keywords: bovine, corpus luteum, microRNA, pregnancy, RNA-Seq

Procedia PDF Downloads 235
477 VIAN-DH: Computational Multimodal Conversation Analysis Software and Infrastructure

Authors: Teodora Vukovic, Christoph Hottiger, Noah Bubenhofer

Abstract:

The development of VIAN-DH aims at bridging two linguistic approaches: conversation analysis/interactional linguistics (IL), so far a dominantly qualitative field, and computational/corpus linguistics and its quantitative and automated methods. Contemporary IL investigates the systematic organization of conversations and interactions composed of speech, gaze, gestures, and body positioning, among others. These highly integrated multimodal behaviour is analysed based on video data aimed at uncovering so called “multimodal gestalts”, patterns of linguistic and embodied conduct that reoccur in specific sequential positions employed for specific purposes. Multimodal analyses (and other disciplines using videos) are so far dependent on time and resource intensive processes of manual transcription of each component from video materials. Automating these tasks requires advanced programming skills, which is often not in the scope of IL. Moreover, the use of different tools makes the integration and analysis of different formats challenging. Consequently, IL research often deals with relatively small samples of annotated data which are suitable for qualitative analysis but not enough for making generalized empirical claims derived quantitatively. VIAN-DH aims to create a workspace where many annotation layers required for the multimodal analysis of videos can be created, processed, and correlated in one platform. VIAN-DH will provide a graphical interface that operates state-of-the-art tools for automating parts of the data processing. The integration of tools that already exist in computational linguistics and computer vision, facilitates data processing for researchers lacking programming skills, speeds up the overall research process, and enables the processing of large amounts of data. The main features to be introduced are automatic speech recognition for the transcription of language, automatic image recognition for extraction of gestures and other visual cues, as well as grammatical annotation for adding morphological and syntactic information to the verbal content. In the ongoing instance of VIAN-DH, we focus on gesture extraction (pointing gestures, in particular), making use of existing models created for sign language and adapting them for this specific purpose. In order to view and search the data, VIAN-DH will provide a unified format and enable the import of the main existing formats of annotated video data and the export to other formats used in the field, while integrating different data source formats in a way that they can be combined in research. VIAN-DH will adapt querying methods from corpus linguistics to enable parallel search of many annotation levels, combining token-level and chronological search for various types of data. VIAN-DH strives to bring crucial and potentially revolutionary innovation to the field of IL, (that can also extend to other fields using video materials). It will allow the processing of large amounts of data automatically and, the implementation of quantitative analyses, combining it with the qualitative approach. It will facilitate the investigation of correlations between linguistic patterns (lexical or grammatical) with conversational aspects (turn-taking or gestures). Users will be able to automatically transcribe and annotate visual, spoken and grammatical information from videos, and to correlate those different levels and perform queries and analyses.

Keywords: multimodal analysis, corpus linguistics, computational linguistics, image recognition, speech recognition

Procedia PDF Downloads 84
476 The Analysis of Deceptive and Truthful Speech: A Computational Linguistic Based Method

Authors: Seham El Kareh, Miramar Etman

Abstract:

Recently, detecting liars and extracting features which distinguish them from truth-tellers have been the focus of a wide range of disciplines. To the author’s best knowledge, most of the work has been done on facial expressions and body gestures but only few works have been done on the language used by both liars and truth-tellers. This paper sheds light on four axes. The first axis copes with building an audio corpus for deceptive and truthful speech for Egyptian Arabic speakers. The second axis focuses on examining the human perception of lies and proving our need for computational linguistic-based methods to extract features which characterize truthful and deceptive speech. The third axis is concerned with building a linguistic analysis program that could extract from the corpus the inter- and intra-linguistic cues for deceptive and truthful speech. The program built here is based on selected categories from the Linguistic Inquiry and Word Count program. Our results demonstrated that Egyptian Arabic speakers on one hand preferred to use first-person pronouns and present tense compared to the past tense when lying and their lies lacked of second-person pronouns, and on the other hand, when telling the truth, they preferred to use the verbs related to motion and the nouns related to time. The results also showed that there is a need for bigger data to prove the significance of words related to emotions and numbers.

Keywords: Egyptian Arabic corpus, computational analysis, deceptive features, forensic linguistics, human perception, truthful features

Procedia PDF Downloads 185
475 Corpus-Based Neural Machine Translation: Empirical Study Multilingual Corpus for Machine Translation of Opaque Idioms - Cloud AutoML Platform

Authors: Khadija Refouh

Abstract:

Culture bound-expressions have been a bottleneck for Natural Language Processing (NLP) and comprehension, especially in the case of machine translation (MT). In the last decade, the field of machine translation has greatly advanced. Neural machine translation NMT has recently achieved considerable development in the quality of translation that outperformed previous traditional translation systems in many language pairs. Neural machine translation NMT is an Artificial Intelligence AI and deep neural networks applied to language processing. Despite this development, there remain some serious challenges that face neural machine translation NMT when translating culture bounded-expressions, especially for low resources language pairs such as Arabic-English and Arabic-French, which is not the case with well-established language pairs such as English-French. Machine translation of opaque idioms from English into French are likely to be more accurate than translating them from English into Arabic. For example, Google Translate Application translated the sentence “What a bad weather! It runs cats and dogs.” to “يا له من طقس سيء! تمطر القطط والكلاب” into the target language Arabic which is an inaccurate literal translation. The translation of the same sentence into the target language French was “Quel mauvais temps! Il pleut des cordes.” where Google Translate Application used the accurate French corresponding idioms. This paper aims to perform NMT experiments towards better translation of opaque idioms using high quality clean multilingual corpus. This Corpus will be collected analytically from human generated idiom translation. AutoML translation, a Google Neural Machine Translation Platform, is used as a custom translation model to improve the translation of opaque idioms. The automatic evaluation of the custom model will be compared to the Google NMT using Bilingual Evaluation Understudy Score BLEU. BLEU is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Human evaluation is integrated to test the reliability of the Blue Score. The researcher will examine syntactical, lexical, and semantic features using Halliday's functional theory.

Keywords: multilingual corpora, natural language processing (NLP), neural machine translation (NMT), opaque idioms

Procedia PDF Downloads 114
474 A Theoretical and Corpus-Based Analysis of English and Spanish Syntax Derived from Método de Los Relojes Verb Types According to Systemic-Functional Grammar as a Foundation for Methodological Adaption

Authors: Timothy William Lawrence

Abstract:

The goal of this paper is to research and categorize the four basic verb types found in the Spanish descriptive grammar book Método de los Relojes using verb clauses as representation as found in M.A.K. Halliday's Systemic-Functional Grammar with the purpose of establishing theoretical along with syntactical parallels and deviations between English and Spanish. Results confirm theoretical correlations exist therefore leading to an analysis of English grammar syntax resulting in delineating commonalities and differences from Spanish. Corpora searches were carried out on different patterns of syntactical structures confirming divergences in verb syntax, making it possible to establish parameters to adapt English verbs to the criteria of the four basic Método de los Relojes verb types.

Keywords: corpus studies, Método de los Relojes, structural-functional grammar, verb syntax

Procedia PDF Downloads 167
473 A Corpus-Linguistic Analysis of Online Iranian News Coverage on Syrian Revolution

Authors: Amaal Ali Al-Gamde

Abstract:

The Syrian revolution is a major issue in the Middle East, which draws in world powers and receives a great focus in international mass media since 2011. The heavy global reliance on cyber news and digital sources plays a key role in conveying a sense of bias to a wide range of online readers. Thus, based on the assumption that media discourse possesses ideological implications, this study investigates the representation of Syrian revolution in online media. The paper explores the discursive constructions of anti and pro-government powers in Syrian revolution in 1000,000-word corpus of Fars online reports (an Iranian news agency), issued between 2013 and 2015. Taking a corpus assisted discourse analysis approach, the analysis investigates three types of lexicosemantic relations, the semantic macrostructures within which the two social actors are framed, the lexical collocations characterizing the news discourse and the discourse prosodies they tell about the two sides of the conflict. The study utilizes computer-based approaches, sketch engine and AntConc software to minimize the bias of the subjective analysis. The analysis moves from the insights of lexical frequencies and keyness scores to examine themes and the collocational patterns. The findings reveal the Fars agency’s ideological mode of representations in reporting events of Syrian revolution in two ways. The first is by stereotyping the opposition groups under the umbrella of terrorism, using words such as (law breakers, foreign-backed groups, militant groups, terrorists) to legitimize the atrocities of security forces against protesters and enhance horror among civilians. The second is through emphasizing the power of the government and depicting it as the defender of the Arab land by foregrounding the discourse of international conspiracy against Syria. The paper concludes discussing the potential importance of triangulating corpus linguistic tools with critical discourse analysis to elucidate more about discourses and reality.

Keywords: discourse prosody, ideology, keyness, semantic macrostructure

Procedia PDF Downloads 112