Search results for: learner corpus
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 742

622 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of register variation in three major text domains, namely, social, cultural, and political texts collected from a corpus of Bangla printed mass media texts. The study uses a moderately sized corpus of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control its identity but also mark its uniqueness across the domains. At first, the subject domains of the texts are classified according to two parameters, namely, 'Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties of both function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains is an unreliable indicator, as its lexicological identity does not have any bearing on the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in the Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, and lexical collocation. 
In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation to the domains to which they actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing a text identification strategy for language learners. The availability of a huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational in specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 151
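
The word-frequency comparison across domains described in the abstract above can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the ratio threshold, and the English toy data are all assumptions made for illustration. The idea is that words shared evenly across domains (the "common lexical stock") are filtered out, while words concentrated in one domain are kept as candidate markers.

```python
from collections import Counter

def relative_frequencies(tokens):
    """Return each word's share of the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def domain_specific_words(domains, min_ratio=2.0):
    """For each domain, list words whose relative frequency is at least
    min_ratio times their highest relative frequency in any other domain.
    min_ratio is a hypothetical threshold, not a value from the paper."""
    freqs = {name: relative_frequencies(toks) for name, toks in domains.items()}
    result = {}
    for name, table in freqs.items():
        markers = []
        for word, f in table.items():
            other = max(
                (freqs[o].get(word, 0.0) for o in freqs if o != name),
                default=0.0,
            )
            # Words absent elsewhere, or clearly more frequent here, are markers.
            if other == 0.0 or f / other >= min_ratio:
                markers.append(word)
        result[name] = sorted(markers)
    return result

# Hypothetical toy data (English stand-ins, not taken from the Bangla corpus).
toy = {
    "political": "the minister said the vote is the key".split(),
    "cultural": "the festival and the music of the city".split(),
}
markers = domain_specific_words(toy)
```

In this toy run, "the" occurs at the same rate in both domains and is therefore dropped, which mirrors the abstract's point that shared vocabulary says little about the subject domain.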
621 The Diary of Dracula, by Marin Mincu: Inquiries into a Romanian 'Book of Wisdom' as a Fictional Counterpart for Corpus Hermeticum

Authors: Lucian Vasile Bagiu, Paraschiva Bagiu

Abstract:

The novel written in Italian and published in Italy in 1992 by the Romanian scholar Marin Mincu is meant for the foreign reader, aiming apparently at a better knowledge of the historical character of Vlad the Impaler (Vlad Dracul), within the European cultural, political and historical context of 1463. Throughout the very well written tome, one comes to realize that one of the underlying levels of the fiction is the exposing of various fundamental features of Romanian culture and civilization. The author of the diary, Dracula, mentions Corpus Hermeticum no less than fifteen times, suggesting his own diary is some sort of a philosophical counterpart. The essay focuses on several ‘truths’ and ‘wisdom’ revealed in the fictional teachings of Dracula. The boycott of History by the Romanians is identified as an echo of the philosophical approach of the famous Romanian scholar and writer Lucian Blaga. The orality of Romanian culture is a landmark opposed to the written culture of Western Europe. The religion of the ancient Dacian God Zalmoxis is seen as the basis for the Romanian existential and/or metaphysical ethnic philosophy (a feature tackled by the famous Romanian historian of religion Mircea Eliade), with a suggestion that Hermes Trismegistus may have written his Corpus Hermeticum under the influence of Zalmoxis. The historical figure of the last Dacian king Decebalus (died 106 AD) is a good pretext for a tantalizing Indo-European suggestion that the prehistoric Thraco-Dacian people may have been the ancestors of the first Romans settled in Latium. The lost diary of the Emperor Trajan, De Bello Dacico, may have proved that the unknown language of the Dacians was very much like the Latin language (a secret well hidden by the Vatican). The attitude of the Dacians towards death, as described by Herodotus, may have later inspired Pythagoras, Socrates, the Eleusinian and Orphic Mysteries, etc. 
All of this is set within the Humanist and Renaissance European context of the epoch, Dracula having a close relationship with scholars such as Nicolaus Cusanus, Cosimo de Medici, Marsilio Ficino, Pope Pius II, etc. Thus The Diary of Dracula turns out to be as exciting and stupefying as Corpus Hermeticum, a book impossible to assimilate entirely, yet a reference that it would be unwise to ignore.

Keywords: Corpus Hermeticum, Dacians, Dracula, Zalmoxis

Procedia PDF Downloads 131
620 Use of Ing-Formed and Derived Verbal Nominalization in American English: A Survey Applied to Native American English Speakers

Authors: Yujia Sun

Abstract:

Research on nominalizations in English can be traced back to at least the 1960s and remains central to the field today. At the very beginning, the discussion was about the relationship between verbs and nouns, but then it moved to the distinct senses embodied in different forms of nominals, namely, various types of nominalizations. This paper addresses the issue of how speakers perceive different forms of verbal nouns, and what might influence their perceptions. The data are collected through a self-designed questionnaire targeted at native speakers of American English and through the Corpus of Contemporary American English (COCA). The results show that semantic differences between different forms of nominals do play a role in people’s preference for one form over another. However, more exploration is needed to see how frequency of usage is interrelated with this issue.

Keywords: corpus of contemporary American English, derived nominalization, frequency of usage, ing-formed nominalization

Procedia PDF Downloads 152
619 Direct Translation vs. Pivot Language Translation for Persian-Spanish Low-Resourced Statistical Machine Translation System

Authors: Benyamin Ahmadnia, Javier Serrano

Abstract:

In this paper, we compare two different approaches for translating from Persian to Spanish, a language pair with scarce parallel corpora. The first approach involves direct transfer using a statistical machine translation system, which is available for this language pair. The second approach involves translation through English as a pivot language, which has more translation resources and more advanced translation systems available. The results show that it is possible to achieve better translation quality using English as a pivot language: the pivot approach outperforms direct translation from Persian to Spanish. Our best result is the pivot system, which scores higher than direct translation by 1.12 BLEU points.

Keywords: statistical machine translation, direct translation approach, pivot language translation approach, parallel corpus

Procedia PDF Downloads 459
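
The pivot idea in the abstract above amounts to composing two translation steps, source to pivot and then pivot to target. The sketch below shows only that composition. It uses word-by-word dictionary lookup in place of a statistical machine translation system, and the lexicons, transliterated Persian words, and function names are hypothetical.

```python
def translate(sentence, table):
    """Word-by-word lookup; unknown words are passed through unchanged.
    This stands in for a real translation system."""
    return " ".join(table.get(word, word) for word in sentence.split())

def pivot_translate(sentence, src_to_pivot, pivot_to_tgt):
    """Translate source -> pivot language, then pivot -> target language."""
    pivot_sentence = translate(sentence, src_to_pivot)
    return translate(pivot_sentence, pivot_to_tgt)

# Hypothetical toy lexicons (transliterated Persian -> English -> Spanish).
fa_en = {"ketab": "book", "khub": "good"}
en_es = {"book": "libro", "good": "bueno"}

result = pivot_translate("ketab khub", fa_en, en_es)
```

In a real pipeline, each `translate` call would be replaced by a trained system, and errors made in the first step would propagate into the second, which is the main cost of pivoting.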
618 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition

Authors: Qin Long, Li Xiaoge

Abstract:

Knowledge graphs have now become a new form of knowledge representation. However, there is no consensus regarding a plausible definition of entities and relationships in the domain-specific knowledge graph. Further, owing to several limitations and deficiencies, existing approaches to domain-specific entity and relationship recognition are far from perfect. Specifically, named entity recognition in Chinese domains is a critical task for natural language processing applications. However, a bottleneck for Chinese named entity recognition in new domains is the lack of annotated data. To address this challenge, a domain distant supervised named entity recognition framework is proposed. The framework is divided into two stages: first, the distant supervised corpus is generated based on an entity linking model built on a graph attention neural network; second, the generated corpus is used as input to train the distant supervised named entity recognition model, which then extracts named entities. The linking model is verified on the ccks2019 entity link corpus, and the F1 value is 2% higher than that of the benchmark method. The re-pre-trained BERT language model is added to the benchmark method, and the results show that it is more suitable for distant supervised named entity recognition tasks. Finally, the framework is applied in the computer field, and the results show that it can obtain domain named entities.

Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network

Procedia PDF Downloads 68
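
To make the notion of a distant supervised corpus concrete, here is a minimal sketch of automatic labeling. Note the substitution: the paper generates labels with an entity linking model based on a graph attention network, whereas this example uses plain longest-match dictionary lookup, a simpler and commonly used form of distant supervision. The dictionary, entity types, and function name are hypothetical.

```python
def distant_label(tokens, entity_dict):
    """Assign BIO tags by longest-match lookup against an entity dictionary.

    entity_dict maps a tuple of tokens to an entity type. No human
    annotation is used, so the resulting labels may be noisy.
    """
    tags = ["O"] * len(tokens)
    max_len = max((len(key) for key in entity_dict), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in entity_dict:
                etype = entity_dict[span]
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{etype}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags

# Hypothetical dictionary for the computer domain.
entities = {("neural", "network"): "TECH", ("BERT",): "MODEL"}
tokens = "BERT is a neural network model".split()
tags = distant_label(tokens, entities)
```

The automatically tagged sentences would then serve as training data for a sequence labeling model, as in the second stage described in the abstract.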
617 Variables, Annotation, and Metadata Schemas for Early Modern Greek

Authors: Eleni Karantzola, Athanasios Karasimos, Vasiliki Makri, Ioanna Skouvara

Abstract:

Historical linguistics unveils the historical depth of languages and traces variation and change by analyzing linguistic variables over time. This field of linguistics usually deals with a closed data set that can only be expanded by the (re)discovery of previously unknown manuscripts or editions. In some cases, it is possible to use (almost) the entire closed corpus of a language for research, as is the case with the Thesaurus Linguae Graecae digital library for Ancient Greek, which contains most of the extant ancient Greek literature. However, concerning ‘dynamic’ periods when the production and circulation of texts in printed as well as manuscript form have not been fully mapped, representative samples and corpora of texts are needed. Such material and tools are utterly lacking for Early Modern Greek (16th-18th c.). In this study, the principles of the creation of EMoGReC, a pilot representative corpus of Early Modern Greek (16th-18th c.) are presented. Its design follows the fundamental principles of historical corpora. The selection of texts aims to create a representative and balanced corpus that gives insight into diachronic, diatopic and diaphasic variation. The pilot sample includes data derived from fully machine-readable vernacular texts, which belong to 4-5 different textual genres and come from different geographical areas. We develop a hierarchical linguistic annotation scheme, further customized to fit the characteristics of our text corpus. Regarding variables and their variants, we use as a point of departure the bundle of twenty-four features (or categories of features) for prose demotic texts of the 16th c. Tags are introduced bearing the variants [+old/archaic] or [+novel/vernacular]. On the other hand, further phenomena that are underway (cf. The Cambridge Grammar of Medieval and Early Modern Greek) are selected for tagging. 
The annotated texts are enriched with metalinguistic and sociolinguistic metadata to provide a testbed for the development of the first comprehensive set of tools for the Greek language of that period. Based on a relational management system with interconnection of data, annotations, and their metadata, the EMoGReC database aspires to join a state-of-the-art technological ecosystem for the research of observed language variation and change using advanced computational approaches.

Keywords: early modern Greek, variation and change, representative corpus, diachronic variables.

Procedia PDF Downloads 31
616 Parental Monitoring of Learners’ Cell Phone Use in the Eastern Cape, South Africa

Authors: Melikhaya Skhephe, Robert Mawuli Kwasi Boadzo, Zanoxolo Berington Gobingca

Abstract:

This research study sought to examine parental monitoring of learners’ cell phone use in the Eastern Cape, South Africa. To this end, the researchers employed a quantitative approach. Data were obtained through questionnaires, with a sample of 15 parents having been purposively selected. The findings revealed that parents are unaware that they have to monitor the learner’s cell phone. Another finding was that parents in the 21st century did not support the use of mobile phones in education. The researchers recommend that parents’ discussion forums be created to educate parents on how a cell phone can be used in education. Cell phone companies need to be encouraged to educate parents on how they can monitor cell phones used by learners. Another recommendation was that network providers need to restrict access to internet searches according to age.

Keywords: parental monitoring, app blocking services, learner’s cell phone use, cell phone

Procedia PDF Downloads 124
615 An Automatic Speech Recognition Tool for the Filipino Language Using the HTK System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

This paper presents the development of a Filipino speech recognition tool using the HTK System. The system was trained on a subset of the Filipino Speech Corpus developed by the DSP Laboratory of the University of the Philippines-Diliman. The speech corpus was used in both training and testing the system by estimating the parameters for phonetic HMM-based (Hidden Markov Model) acoustic models. Experiments on different mixture weights were incorporated in the study. The phoneme-level word-based recognition of a 5-state HMM resulted in an average accuracy rate of 80.13% for a single-Gaussian mixture model, 81.13% after implementing phoneme alignment, and 87.19% for the increased Gaussian-mixture weight model. The highest accuracy rate of 88.70% was obtained from a 5-state model with 6 Gaussian mixtures.

Keywords: Filipino language, Hidden Markov Model, HTK system, speech recognition

Procedia PDF Downloads 441
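
For readers unfamiliar with how HTK scores recognition output, the sketch below computes the two figures reported by HTK's HResults tool: percent correct, H/N, and accuracy, (H - I)/N, where N is the number of reference labels, H = N - D - S is the number of hits, and D, S, I are deletions, substitutions, and insertions. It is an assumption that the accuracy rates in the abstract above are figures of this kind, and the counts used here are invented for illustration.

```python
def htk_scores(n_labels, deletions, substitutions, insertions):
    """Return (percent_correct, accuracy) as defined for HTK's HResults."""
    hits = n_labels - deletions - substitutions
    percent_correct = 100.0 * hits / n_labels
    # Accuracy additionally penalizes inserted labels.
    accuracy = 100.0 * (hits - insertions) / n_labels
    return percent_correct, accuracy

# Hypothetical error counts, not taken from the paper.
correct, acc = htk_scores(n_labels=1000, deletions=40, substitutions=60, insertions=20)
```

Accuracy is never higher than percent correct, and the two coincide only when the recognizer inserts no spurious labels.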
614 Understanding the Multilingualism of the Mauritian Multilingual Primary School Learner and Translanguaging: A Linguistic Ethnographic Study

Authors: Yesha Devi Mahadeo-Doorgakant

Abstract:

The Mauritian landscape is well known for its multilingualism, with daily interaction among the languages used on the island, namely Kreol Morisien, the European languages (English and French) and the Oriental/Asian languages (Hindi, Arabic/Urdu, Tamil, Telegu, Marathi, Mandarin, etc.). However, within Mauritius’ multilingual educational system, English is the official medium of instruction, while French is taught as a compulsory subject until upper secondary and oriental languages are offered as optional languages at primary level. Usually, when they start their primary schooling, Mauritians choose one oriental language to learn as an additional language, based on their ethnic/religious identity. In January 2012, Kreol Morisien, which is considered the language of daily interaction of the majority of Mauritians, was introduced as an optional subject at primary level, taught at the same time as the oriental languages. The introduction of Kreol Morisien has spurred linguistic debates about the issue of multilingualism within the curriculum. Taking this into account, researchers have started reflecting on the multilingual educational system of the country and questioning whether the current language curriculum caters for the complex everyday linguistic reality of the multilingual Mauritian learner, given that most learners are embedded within an environment where the different languages interact with each other daily. This paper, therefore, proposes translanguaging as a more befitting theoretical lens through which the multilingualism and the linguistic repertoire of Mauritian learners can best be understood.

Keywords: multilingualism, translanguaging, multilingual learner, linguistic ethnography

Procedia PDF Downloads 146
613 The Power of Words: A Corpus Analysis of Campaign Speeches of President Donald J. Trump

Authors: Aiza Dalman

Abstract:

Words are powerful when they are used wisely and strategically. In this study, twelve (12) campaign speeches of President Donald J. Trump were analyzed for frequently used words and for the use of ethos, pathos and logos. The speeches were read thoroughly, analyzed and interpreted. With the use of the Word Counter Tool and Text Analyzer software accessible online, it was found that the word ‘will’ has the highest frequency of 121, followed by Hillary (58), American (38), going (35), plan and Clinton (32), illegal (30), government (28), corruption (26) and criminal (24). When the speeches were analyzed for ethos, pathos and logos, on the other hand, the analysis revealed that all three were employed in his speeches. The statements in these categories were directed against Hillary or spoke in his favor. The unique strategy of President Donald J. Trump with regard to frequently used words and ethos, pathos and logos in persuading people perhaps led the way to his victory.

Keywords: campaign speeches, corpus analysis, ethos, logos and pathos, power of words

Procedia PDF Downloads 244
612 Technological Tool-Use as an Online Learner Strategy in a Synchronous Speaking Task

Authors: J. Knight, E. Barberà

Abstract:

Language learning strategies have been defined as thoughts and actions, consciously chosen and operationalized by language learners, to help them in carrying out a multiplicity of tasks from the very outset of learning to the most advanced levels of target language performance. While research in the field of Second Language Acquisition has focused on ‘good’ language learners and on the effectiveness of strategy use and orchestration by effective learners in face-to-face classrooms, much less research has attended to learner strategies in online contexts, particularly strategies relating to technological tool use, which can be part of a task design. In addition, much research on learner strategies and strategy use has focused on cognitive, attitudinal and metacognitive behaviour, with less research focusing on the social aspect of strategies. This study focuses on how learners mediate with a technological tool designed to support synchronous spoken interaction and how this shapes their spoken interaction in the opening of their talk. A case study approach is used, incorporating notions from communities of practice theory to analyse and understand the learner strategies of dyads carrying out a role play task. The study employs analysis of transcripts of spoken interaction in the openings of the talk along with log files of tool use. The study draws on results of previous studies pertaining to the same tool as a form of triangulation. Findings show how learners gain pre-task planning time through technological tool control. The strategies involving learners’ choices to enter and exit the tool shape their spoken interaction qualitatively, with some cases demonstrating long silences whilst others appear to start the pedagogical task immediately. Who or what learners orientate to in the opening moments of the talk (an audience, i.e. the teacher, each other and/or screen-based signifiers) also becomes a focus. 
The study highlights how tool use as a social practice should be considered a learning strategy in online contexts whereby different usages may be understood in the light of the more usual asynchronous social practices of the online community. The teachers’ role in the community is also problematised as the evaluator of the practices of that community. Results are pertinent for task design for synchronous speaking tasks. The use of community of practice theory supports an understanding of strategy use that involves both metacognition alongside social context revealing how tool-use strategies may need to be orally (socially) negotiated by learners and may also differ from an online language community.

Keywords: learner strategy, tool use, community of practice, speaking task

Procedia PDF Downloads 315
611 A Corpus Study of English Verbs in Chinese EFL Learners’ Academic Writing Abstracts

Authors: Shuaili Ji

Abstract:

The correct use of verbs is an important element of high-quality research articles, and thus for Chinese EFL learners, it is important to master the characteristics of verbs and to use verbs precisely. However, some research has shown that there are differences in verb use between learners and native speakers and that learners have difficulty in using English verbs. This corpus-based quantitative research can enhance learners’ knowledge of English verbs and improve the quality of research article abstracts and even of academic writing as a whole. The aim of this study is to find the differences between learners’ and native speakers’ use of verbs and to study the factors that contribute to those differences. To this end, the research question is as follows: What are the differences between the verbs most frequently used by learners and those most frequently used by native speakers? The research question is answered through a study that uses a corpus-based, data-driven approach to analyze the verbs used by learners in their abstract writing in terms of collocation, colligation and semantic prosody. The results show that: (1) EFL learners clearly overused ‘be, can, find, make’ and underused ‘investigate, examine, may’. As to modal verbs, learners clearly overused ‘can’ while underusing ‘may’. (2) Learners clearly overused ‘we find + object clauses’ while underusing ‘nouns (results, findings, data) + suggest/indicate/reveal + object clauses’ when expressing research results. (3) Learners tended to transfer the collocation, colligation and semantic prosody of shǐ and zuò to make. (4) Learners clearly overused ‘BE+V-ed’ and used BE as the main verb. They also clearly overused the basic forms of BE such as be, is, are, while clearly underusing its past-tense forms (was, were). These results reveal learners’ lack of accuracy and idiomaticity in verb usage. Due to the influence of conceptual transfer from Chinese, the verbs in learners’ abstracts showed obvious transfer from the mother tongue. 
In addition, learners have not fully mastered the use of verbs, avoiding using complex colligations to prevent errors. Based on these findings, the present study has implications for English teaching, seeking to have implications for English academic abstract writing in China. Further research could be undertaken to study the use of verbs in the whole dissertation to find out whether the characteristic of the verbs in abstracts can apply in the whole dissertation or not.

Keywords: academic writing abstracts, Chinese EFL learners, corpus-based, data-driven, verbs

Procedia PDF Downloads 303
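
Claims of overuse and underuse such as those in the abstract above are commonly checked by comparing a word's frequency in a learner corpus against a reference corpus with the log-likelihood statistic. The abstract does not state which test was used, so the sketch below is an assumption about method, and the counts and corpus sizes are invented. Values above 3.84 correspond to p < 0.05 at one degree of freedom.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one word's frequency in two corpora."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2

def direction(freq_a, size_a, freq_b, size_b):
    """Report whether corpus A overuses or underuses the word relative to B."""
    rel_a, rel_b = freq_a / size_a, freq_b / size_b
    if rel_a > rel_b:
        return "overuse"
    if rel_a < rel_b:
        return "underuse"
    return "equal"

# Hypothetical counts for 'can': learner corpus (A) vs native corpus (B).
ll = log_likelihood(300, 100_000, 150, 100_000)
trend = direction(300, 100_000, 150, 100_000)
```

With these invented counts the statistic is far above the 3.84 threshold, so the difference would be treated as a significant overuse by learners.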
610 Absence of Developmental Change in Epenthetic Vowel Duration in Japanese Speakers’ English

Authors: Takayuki Konishi, Kakeru Yazawa, Mariko Kondo

Abstract:

This study examines developmental change in the production of epenthetic vowels by Japanese learners of English in relation to acquisition of L2 English speech rhythm. Seventy-two Japanese learners of English in the J-AESOP corpus were divided into lower- and higher-level learners according to their proficiency score and the frequency of vowel epenthesis. Three learners were excluded because no vowel epenthesis was observed in their utterances. The analysis of their read English speech data showed no statistical difference between lower- and higher-level learners, implying the absence of any developmental change in durations of epenthetic vowels. This result, together with the findings of previous studies, will be discussed in relation to the transfer of L1 phonology and manifestation of L2 English rhythm.

Keywords: vowel epenthesis, Japanese learners of English, L2 speech corpus, speech rhythm

Procedia PDF Downloads 242
609 MicroRNA in Bovine Corpus Luteum during Early Pregnancy

Authors: Rreze Gecaj, Corina Schanzenbach, Benedikt Kirchner, Michael Pfaffl, Bajram Berisha

Abstract:

The maintenance of the corpus luteum (CL) during early pregnancy in cattle is a critical and multifarious process. A luteotrophic mechanism originating from the embryo is widely accepted as the triggering signal for CL maintenance. In cattle, it is interferon-tau (IFNT) secretion from the conceptus that prevents CL regression and ensures progesterone production for the establishment of pregnancy. In addition to endocrine and paracrine signals, microRNA (miRNA) can also support CL sustainability during early pregnancy. MiRNAs are small non-coding nucleic acids that regulate gene expression post-transcriptionally and have been shown to be involved in the modulation of CL function. However, the role of miRNAs in corpus luteum function during early pregnancy remains largely unexplored. This study aims at profiling the expression of miRNA in the CL during early pregnancy in cattle by comparing it with the CL from the late cycle and with the regressed CL. Corpora lutea were assigned to two different groups during the cycle (C13 group, late CL: days 13-18; C18 group, regressed CL: day >18) and one group during early pregnancy (group P: 1-2 months). The stage of the estrous cycle was determined by macroscopic examination, and crown-rump length measurement was used to age the fetus. A total of 9 corpora lutea from individual animals were included in the study, three corpora lutea for each group. The miRNA population was profiled using small RNA next-generation sequencing, and biologically significant miRNAs were evaluated for differential expression using the DESeq2 methodology. We show that 6 differentially expressed miRNAs (bta-mir-2890, -2332, -2441-3p, -148b, -1248 and -29c) are common to both comparisons, P vs C13 and P vs C18. For each comparison individually, we also identified unique miRNAs that were differentially expressed only in that comparison. 
bta-miR-23a and -769 were unique miRNAs differentially expressed in P vs C13, whereas forty-four unique miRNAs were identified as differentially expressed in P vs C18. These data confirm that miRNAs are highly abundant in luteal tissue during early pregnancy and potentially regulate the CL maintenance at this stage of fetus development.

Keywords: bovine, corpus luteum, microRNA, pregnancy, RNA-Seq

Procedia PDF Downloads 230
608 Evaluation of a Driver Training Intervention for People on the Autism Spectrum: A Multi-Site Randomized Control Trial

Authors: P. Vindin, R. Cordier, N. J. Wilson, H. Lee

Abstract:

Engagement in community-based activities such as education, employment, and social relationships can improve the quality of life for individuals with Autism Spectrum Disorder (ASD). Community mobility is vital to attaining independence for individuals with ASD. Learning to drive and gaining a driver’s license is a critical link to community mobility; however, for individuals with ASD acquiring safe driving skills can be a challenging process. Issues related to anxiety, executive function, and social communication may affect driving behaviours. Driving training and education aimed at addressing barriers faced by learner drivers with ASD can help them improve their driving performance. A multi-site randomized controlled trial (RCT) was conducted to evaluate the effectiveness of an autism-specific driving training intervention for improving the on-road driving performance of learner drivers with ASD. The intervention was delivered via a training manual and interactive website consisting of five modules covering varying driving environments starting with a focus on off-road preparations and progressing through basic to complex driving skill mastery. Seventy-two learner drivers with ASD aged 16 to 35 were randomized using a blinded group allocation procedure into either the intervention or control group. The intervention group received 10 driving lessons with the instructors trained in the use of an autism-specific driving training protocol, whereas the control group received 10 driving lessons as usual. Learner drivers completed a pre- and post-observation drive using a standardized driving route to measure driving performance using the Driving Performance Checklist (DPC). They also completed anxiety, executive function, and social responsiveness measures. The findings showed that there were significant improvements in driving performance for both the intervention (d = 1.02) and the control group (d = 1.15). 
However, the differences were not significant between groups (p = 0.614) or study sites (p = 0.842). None of the potential moderator variables (anxiety, cognition, social responsiveness, and driving instructor experience) influenced driving performance. This study is an important step toward improving community mobility for individuals with ASD, showing that an autism-specific driving training intervention can improve the driving performance of learner drivers with ASD. It also highlighted the complexity of conducting a multi-site design even when sites were matched according to geography and traffic conditions. Driving instructors also need more and clearer information on how to communicate with learner drivers with restricted verbal expression.

Keywords: autism spectrum disorder, community mobility, driving training, transportation

Procedia PDF Downloads 102
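
The effect sizes quoted in the abstract above (d = 1.02 and d = 1.15) are Cohen's d values. The abstract does not say which variant was computed, so the sketch below shows one common form, the mean difference divided by the pooled standard deviation, applied to invented pre- and post-drive scores.

```python
import math
from statistics import mean, stdev

def cohens_d(pre, post):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(pre), len(post)
    s1, s2 = stdev(pre), stdev(post)
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(post) - mean(pre)) / pooled

# Hypothetical scores, not data from the trial.
pre = [10, 12, 14, 16, 18]
post = [14, 16, 18, 20, 22]
d = cohens_d(pre, post)
```

By the usual convention, values of d around 0.8 or higher are read as large effects, which is why both groups in the trial can be said to have improved substantially even though they did not differ from each other.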
607 The Analysis of Deceptive and Truthful Speech: A Computational Linguistic Based Method

Authors: Seham El Kareh, Miramar Etman

Abstract:

Recently, detecting liars and extracting features which distinguish them from truth-tellers have been the focus of a wide range of disciplines. To the best of the authors’ knowledge, most of the work has been done on facial expressions and body gestures, but only a few studies have examined the language used by both liars and truth-tellers. This paper sheds light on four axes. The first axis deals with building an audio corpus of deceptive and truthful speech for Egyptian Arabic speakers. The second axis focuses on examining the human perception of lies and demonstrating the need for computational linguistic-based methods to extract features which characterize truthful and deceptive speech. The third axis is concerned with building a linguistic analysis program that could extract from the corpus the inter- and intra-linguistic cues for deceptive and truthful speech. The program built here is based on selected categories from the Linguistic Inquiry and Word Count program. Our results demonstrated that, on the one hand, Egyptian Arabic speakers preferred to use first-person pronouns and the present tense rather than the past tense when lying, and their lies lacked second-person pronouns; on the other hand, when telling the truth, they preferred to use verbs related to motion and nouns related to time. The results also showed that more data are needed to establish the significance of words related to emotions and numbers.

Keywords: Egyptian Arabic corpus, computational analysis, deceptive features, forensic linguistics, human perception, truthful features

Procedia PDF Downloads 180
606 Corpus-Based Neural Machine Translation: Empirical Study Multilingual Corpus for Machine Translation of Opaque Idioms - Cloud AutoML Platform

Authors: Khadija Refouh

Abstract:

Culture-bound expressions have been a bottleneck for Natural Language Processing (NLP) and comprehension, especially in the case of machine translation (MT). In the last decade, the field of machine translation has greatly advanced. Neural machine translation (NMT) has recently achieved considerable gains in translation quality, outperforming previous traditional translation systems in many language pairs. NMT applies Artificial Intelligence (AI) and deep neural networks to language processing. Despite this development, NMT still faces serious challenges when translating culture-bound expressions, especially for low-resource language pairs such as Arabic-English and Arabic-French, which is not the case with well-established language pairs such as English-French. Machine translation of opaque idioms from English into French is likely to be more accurate than translating them from English into Arabic. For example, the Google Translate application translated the sentence “What a bad weather! It runs cats and dogs.” into the target language Arabic as “يا له من طقس سيء! تمطر القطط والكلاب”, which is an inaccurate literal translation. The translation of the same sentence into the target language French was “Quel mauvais temps! Il pleut des cordes.”, where the Google Translate application used the accurate corresponding French idiom. This paper aims to perform NMT experiments towards better translation of opaque idioms using a high-quality, clean multilingual corpus. This corpus will be collected analytically from human-generated idiom translations. AutoML Translation, a Google neural machine translation platform, is used as a custom translation model to improve the translation of opaque idioms. The automatic evaluation of the custom model will be compared to the Google NMT using the Bilingual Evaluation Understudy (BLEU) score. BLEU is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Human evaluation is integrated to test the reliability of the BLEU score. The researcher will examine syntactical, lexical, and semantic features using Halliday's functional theory.
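As an illustration of how a BLEU-style score is computed, the following is a minimal sketch, not the evaluation pipeline used in the paper. It assumes whitespace tokenization, a single reference translation, n-grams up to length 4, and no smoothing; the function names `ngrams` and `sentence_bleu` and the sample candidate sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU sketch (hypothetical helper).

    Combines clipped n-gram precisions (n = 1..max_n) by geometric mean
    and multiplies by the brevity penalty.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "il pleut des cordes"
    print(sentence_bleu("il pleut des cordes", reference))
    print(sentence_bleu("il pleut des chats", reference))
```

Because the sketch is unsmoothed, a literal rendering that shares no 4-gram with the reference scores zero, which is one reason sentence-level BLEU is usually smoothed or reported at corpus level and complemented by human evaluation.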

Keywords: multilingual corpora, natural language processing (NLP), neural machine translation (NMT), opaque idioms

Procedia PDF Downloads 107
605 A Theoretical and Corpus-Based Analysis of English and Spanish Syntax Derived from Método de Los Relojes Verb Types According to Systemic-Functional Grammar as a Foundation for Methodological Adaption

Authors: Timothy William Lawrence

Abstract:

The goal of this paper is to research and categorize the four basic verb types found in the Spanish descriptive grammar book Método de los Relojes, using verb clauses as representation as found in M.A.K. Halliday's Systemic-Functional Grammar, with the purpose of establishing theoretical and syntactical parallels and deviations between English and Spanish. Results confirm that theoretical correlations exist, leading to an analysis of English syntax that delineates commonalities with and differences from Spanish. Corpus searches were carried out on different patterns of syntactical structures, confirming divergences in verb syntax and making it possible to establish parameters for adapting English verbs to the criteria of the four basic Método de los Relojes verb types.

Keywords: corpus studies, Método de los Relojes, structural-functional grammar, verb syntax

Procedia PDF Downloads 158
604 A Conundrum of Teachability and Learnability of Deaf Adult English as Second Language Learners in Pakistani Mainstream Classrooms: Integration or Elimination

Authors: Amnah Moghees, Saima Abbas Dar, Muniba Saeed

Abstract:

Teaching a second language to deaf learners has always been a challenge in Pakistan. Different approaches and strategies have been followed, but they have resulted in partial or complete failure. The study aims to investigate the language problems faced by adult deaf learners of English as a second language in mainstream classrooms. Moreover, the study also determines the factors involved in language teaching and learning in mainstream classes. To investigate the language problems, data will be collected through writing samples of ten deaf adult learners and ten normal ESL learners of the same class, whereas observation in inclusive language teaching classrooms and interviews with five ESL teachers in inclusive classes will be conducted to identify the factors which are directly or indirectly involved in inclusive language education. For this study, a qualitative research paradigm will be applied to analyse the corpus. The study finds that deaf ESL learners face severe language issues, such as odd sentence structures, subject-verb agreement violations, and misuse of verb forms and tenses, as compared to normal ESL learners. The study also predicts that in mainstream classrooms multiple factors affect the smoothness of the teaching and learning procedure: the role of the mediator, the level of the deaf learners, the empathy of normal learners towards deaf learners, and the language teacher’s training.

Keywords: deaf English language learner, empathy, mainstream classrooms, previous language knowledge of learners, role of mediator, language teachers' training

Procedia PDF Downloads 140
603 The Autonomy Use of Preparatory School Students to Learn English Language

Authors: Mi̇hri̇ban Müge Aras

Abstract:

The present study aims to investigate the learner autonomy of prep school students. This research focuses on the prep school students' autonomy habits according to their self-regulated studies, age and duration of learning English. The research also analyzes whether prep school students have strong autonomy to learn the English language or depend on teachers and English classes only. The participants of the study consisted of 32 prep school students. The Likert-type questionnaire was adopted by the researcher from the survey of Dede (2017). The scale was a one-dimensional 4-point Likert type, with the options 1 = never, 2 = sometimes, 3 = often, and 4 = always. There are 19 questions in the questionnaire to understand the autonomy of students when they try to learn English. Descriptive statistics and one-way ANOVA were used to analyze the data. The results of the study showed that there is no significant correlation between their ages and their duration of learning English according to their autonomy studies for English.

Keywords: learner autonomy, self-regulated learning, independent learning, English language learning, prep school students

Procedia PDF Downloads 195
602 A Corpus-Linguistic Analysis of Online Iranian News Coverage on Syrian Revolution

Authors: Amaal Ali Al-Gamde

Abstract:

The Syrian revolution is a major issue in the Middle East, which has drawn in world powers and received great attention in international mass media since 2011. The heavy global reliance on cyber news and digital sources plays a key role in conveying a sense of bias to a wide range of online readers. Thus, based on the assumption that media discourse possesses ideological implications, this study investigates the representation of the Syrian revolution in online media. The paper explores the discursive constructions of anti- and pro-government powers in the Syrian revolution in a 1,000,000-word corpus of Fars online reports (an Iranian news agency), issued between 2013 and 2015. Taking a corpus-assisted discourse analysis approach, the analysis investigates three types of lexicosemantic relations: the semantic macrostructures within which the two social actors are framed, the lexical collocations characterizing the news discourse, and the discourse prosodies they convey about the two sides of the conflict. The study utilizes computer-based approaches, namely the Sketch Engine and AntConc software, to minimize the bias of subjective analysis. The analysis moves from the insights of lexical frequencies and keyness scores to examine themes and collocational patterns. The findings reveal the Fars agency’s ideological mode of representation in reporting events of the Syrian revolution in two ways. The first is by stereotyping the opposition groups under the umbrella of terrorism, using words such as (law breakers, foreign-backed groups, militant groups, terrorists) to legitimize the atrocities of security forces against protesters and enhance horror among civilians. The second is through emphasizing the power of the government and depicting it as the defender of the Arab land by foregrounding the discourse of international conspiracy against Syria. The paper concludes by discussing the potential importance of triangulating corpus linguistic tools with critical discourse analysis to elucidate more about discourses and reality.

Keywords: discourse prosody, ideology, keyness, semantic macrostructure

Procedia PDF Downloads 108
601 Number Variation of the Personal Pronoun We in American Spoken English

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage of a language community, which might become the developmental trend of that language. The personal pronoun we is prescribed as a plural pronoun in grammar, but its number value is more flexible in actual use. Based on the homemade Friends corpus, the present research explores the number value of the first-person pronoun we in present-day American spoken English. With consideration of the subjectivity of we, this paper used ‘we + PCU (perception-cognition-utterance) verbs’ collocations and ‘we + plural categories’ as the parameters. Results from corpus data and manual annotation show that: 1) the overall frequency of we has been increasing; 2) we has been increasingly used with other plural categories, indicating a weakening of its plural reference; and 3) we has been increasingly used with PCU (perception-cognition-utterance) verbs of strong subjectivity, indicating a strengthening of its singular reference. All these seem to support our hypothesis that we is undergoing a process of further grammaticalization towards a singular reference, though further evidence is needed to confirm this bold prediction.

Keywords: number, PCU verbs, personal pronoun we

Procedia PDF Downloads 203
600 The Construction of Malaysian Airline Tragedies in Malaysian and British Online News: A Multidisciplinary Study

Authors: Theng Theng Ong

Abstract:

This study adopts a multidisciplinary method by combining the corpus-based discourse analysis study and language attitude study to explore the construction of Malaysia airline tragedies: MH370, MH17 and QZ8501 in the selected Malaysian and United Kingdom (UK) online news. The study aims to determine the ways in which Malaysian Airline tragedies MH370, MH17 and QZ8501 are linguistically defined and constructed in terms of keyword and collocation. The study also seeks to identify the types of discourse that are presented in the new articles. The differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media will also be examined. Finally, the language attitude study will be carried out to examine the Malaysia and UK university students’ attitudes toward the keywords, topics or issues covered by the selected Malaysian and UK news media pertaining to Malaysian Airline tragedies MH370, MH17 and QZ8501. The analysis is divided into two parts with the first part focusing on corpus-based discourse analysis on the media text. The second part of the study is to investigate Malaysians and UK news readers’ attitudes towards the online news being reported by the Malaysian and UK news media pertaining to the Airline tragedies. The main findings of corpus-based discourse analysis are essential in designing the questions in the questionnaires and interview and therefore led to the identification of the attitudes among Malaysian and UK news.

Keywords: corpus linguistics, critical discourse analysis, news media, tragedies study

Procedia PDF Downloads 313
599 The Relationship between Iranian EFL Learners' Multiple Intelligences and Their Performance on Grammar Tests

Authors: Rose Shayeghi, Pejman Hosseinioun

Abstract:

The Multiple Intelligences theory characterizes human intelligence as a multifaceted entity that exists in all human beings to varying degrees. The most important contribution of this theory to the field of English Language Teaching (ELT) is its role in identifying individual differences and designing more learner-centered programs. The present study aims at investigating the relationship between different elements of multiple intelligence and grammar scores. To this end, 63 female Iranian EFL learners selected from among intermediate students participated in the study. The instruments employed were a Nelson English language test, the Michigan Grammar Test, and the Teele Inventory for Multiple Intelligences (TIMI). The results of Pearson Product-Moment Correlation revealed a significant positive correlation between grammatical accuracy and linguistic as well as interpersonal intelligence. The results of Stepwise Multiple Regression indicated that linguistic intelligence contributed to the prediction of grammatical accuracy.

Keywords: multiple intelligence, grammar, ELT, EFL, TIMI

Procedia PDF Downloads 461
598 Identifying the Mindset of Deaf Benildean Students in Learning Anatomy and Physiology

Authors: Joanne Rieta Miranda

Abstract:

Learning anatomy and physiology is a challenge for Deaf non-science major students. They hold the mindset that Anatomy and Physiology are difficult and very technical. In this study, nine (9) deaf students who are business majors were considered. Non-conventional teaching strategies and classroom activities were employed, such as cooperative learning, virtual lab, Facebook live, big sky, blood typing, mind mapping, reflections, etc. Of all the activities, the deaf students ranked cooperative learning as the best learning activity. This is where they played doctors. They measured the pulse rate, heart rate and blood pressure of their partner classmate. In terms of mindset, 2 out of 9 students have a growth mindset with some fixed ideas, while 7 have a fixed mindset with some growth ideas. All the students passed the course. Three out of nine students got a grade of 90% and above. The teacher was evaluated by the deaf students as very satisfactory, with a mean score of 3.54. This means that the learner-centered practices in the classroom are manifested to a great extent.

Keywords: deaf students, learning anatomy and physiology, teaching strategies, learner-centered practices

Procedia PDF Downloads 201
597 Developing the Skills of Reading Comprehension of Learners of English as a Second Language

Authors: Indu Gamage

Abstract:

Though commonly utilized as a language improvement technique, reading has not been fully employed by either language teachers or learners to develop reading comprehension skills in English as a second language. In a Sri Lankan context, this area has to be delved deep into as the learners show more propensity to analyze. Reading comprehension is an area that most language teachers and learners struggle with, though it appears easy. Most ESL learners engage in reading tasks without being properly aware of the objective of doing reading comprehension. It is observed that when doing reading tasks, the language learners’ concern is more with the meanings of individual words than with the overall comprehension of the given text. The passiveness with which ESL learners engage in reading comprehension makes reading a tedious task for the learner, thereby giving the learner a sense of disappointment at the end. Certain reading tasks take the form of translations. The active cognitive participation of the learner, in the form of using productive strategies for predicting, employing schemata and using contextual clues, seems quite limited. It was hypothesized that the learners’ lack of knowledge of the productive strategies of reading was the major obstacle that makes reading comprehension a tedious task for them. This study is based on a group of 30 tertiary students who read English only as a fundamental requirement for their degree. They belonged to the Faculty of Humanities and Social Sciences of the University of Ruhuna, Sri Lanka. Almost all learners hailed from areas where English was hardly used in day-to-day conversations. The study was carried out by means of a questionnaire to check their opinions on reading and a test to check whether the learners use productive strategies of reading when doing reading comprehension tasks. The test comprised reading questions covering major productive strategies for reading. Then the results were analyzed to see the degree of their active engagement in comprehending the text. The findings supported the hypothesis as an explanation for the difficulties related to reading comprehension.

Keywords: reading, comprehension, skills, reading strategies

Procedia PDF Downloads 148
596 Learning-by-Heart vs. Learning by Thinking: Fostering Thinking in Foreign Language Learning A Comparison of Two Approaches

Authors: Danijela Vranješ, Nataša Vukajlović

Abstract:

Turning to learner-centered teaching instead of the teacher-centered approach brought a whole new perspective into the process of teaching and learning and set a new goal for improving the educational process itself. However, recently a tremendous decline in students’ performance on various standardized tests can be observed, above all on the PISA test. Learner-centeredness on its own is not enough anymore: the students’ ability to think is deteriorating. Especially in foreign language learning, one can encounter a lot of learning by heart: whether it is grammar or vocabulary, teachers often seem to judge the students’ success merely on how well they can recall a specific word, phrase, or grammar rule, but they rarely aim to foster their ability to think. Convinced that foreign language teaching can do both, this research aims to discover how two different approaches to teaching a foreign language foster the students’ ability to think, as well as to what degree they help students reach the state-determined level of foreign language at the end of the semester as defined in the Common European Framework. For this purpose, two different curricula were developed: one is a traditional, learner-centered foreign language curriculum that aims at teaching the four competences as defined in the Common European Framework and serves as a control variable, whereas the second one has been enriched with various thinking routines and aims at teaching the foreign language as a means to communicate ideas and thoughts rather than reducing it to the four competences. Moreover, two types of tests were created for each approach, each based on the content taught during the semester. One aims to test the students’ competences as defined in the CEFR, and the other aims to test the ability of students to draw on the knowledge gained and come to their own conclusions based on the content taught during the semester. As it is an ongoing study, the results are yet to be interpreted.

Keywords: Common European Framework of Reference, foreign language learning, foreign language teaching, testing and assignment

Procedia PDF Downloads 71
595 We Are the 99 percent – the Occupy-Movement in Social Media

Authors: Wolfram Karg

Abstract:

The Occupy-Movement came into existence in 2011 in the US as a reaction to one of the worst economic crises since World War II. With cuts in benefits and social services and people being evicted from their homes on the one hand, and high bonuses granted to the managers of the very same companies on the other, a strong feeling of injustice besieged people in the US and caused them to voice their anger peacefully in social media and on the streets. Thanks to the World Wide Web, users all around the world read about this movement and recognized the same injustice in their own countries, making Occupy a global movement. The vast array of topics covered by Occupy offers a unique chance to carry out a corpus-based discourse analysis based on the DIMEAN-Model. The focus of this paper is limited to two aspects of DIMEAN: intertextual references and the use of connectors in texts. Because the discourse is to a large extent carried out via posts in blogs, online articles and comments, the paper also analyses how far, in modern (i.e. computer-based) media, there is a correlation between the use of connectors in the different communicative types used by the Occupy-Movement.

Keywords: discourse, new media, occupy, corpus analysis

Procedia PDF Downloads 468
594 The Effectiveness of Computerized Dynamic Listening Assessment Informed by Attribute-Based Mediation Model

Authors: Yaru Meng

Abstract:

The study contributes to the small but growing literature around computerized approaches to dynamic assessment (C-DA), wherein individual items are accompanied by mediating prompts. Mediation in the current computerized dynamic listening assessment (CDLA) was informed by an attribute-based mediation model (AMM) that identified the underlying L2 listening cognitive abilities and associated descriptors. The AMM served to focus mediation during C-DA on particular cognitive abilities with a goal of specifying areas of learner difficulty. 86 low-intermediate L2 English learners from a university in China completed three listening assessments, with an experimental group receiving the CDLA system and a control group a non-dynamic assessment. As an assessment, the use of the AMM in C-DA generated detailed diagnoses for each learner. In addition, both within- and between-group repeated-measures ANOVA found greater gains at the level of specific attributes among C-DA learners over the course of a 5-week study. Directions for future research are discussed.

Keywords: computerized dynamic assessment, effectiveness, English as foreign language listening, attribute-based mediation model

Procedia PDF Downloads 178
593 Lexical Collocations in Medical Articles of Non-Native vs Native English-Speaking Researchers

Authors: Waleed Mandour

Abstract:

This study presents a multidimensional scrutiny of Benson et al.’s seven-category taxonomy of lexical collocations as used by Egyptian medical authors and their native English-speaking peers. It investigates 212 medical papers, all published over a span of 6 years (from 2013 to 2018). The comparison is made against medical research articles submitted by native speakers of English (25,238 articles in total with over 103 million words), as derived from the Directory of Open Access Journals (a 2.7 billion-word corpus). The compiled non-native speakers’ corpus was annotated and marked up manually by the researcher according to the standards of Weisser. For the statistical comparisons, conventional frequency-based analysis was deployed alongside relevant criteria such as association measures (AMs), with LogDice used as per the recommendation of Kilgarriff et al. for comparing large corpora. Despite the terminological convergence in the subject corpora, the comparison results confirm the previous literature, in that the non-native speakers’ compositions reveal limited ranges of lexical collocations in terms of their distribution. However, there is a ubiquitous tendency to overuse the NS-high-frequency multi-words in all lexical categories investigated. Furthermore, Egyptian authors, unlike their English-speaking peers, tend to embrace more collocations denoting quantitative rather than qualitative analyses in their papers. This empirical work contributes to the fields of English for Academic Purposes (EAP) and English as a Lingua Franca in Academic settings (ELFA). In addition, there are pedagogical implications that would promote a better quality of medical research papers published in Egyptian universities.
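For readers unfamiliar with the LogDice association measure mentioned in this abstract, the following is a minimal sketch under the commonly cited definition, logDice = 14 + log2(2 * f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of the node and collocate and f_x and f_y are their individual frequencies. The function name `log_dice` and the frequency counts are hypothetical and are not taken from the study.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score for a word pair (hypothetical helper).

    f_xy: co-occurrence frequency of node and collocate
    f_x:  frequency of the node word
    f_y:  frequency of the collocate
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    # The score depends only on the ratio of frequencies, not on corpus size.
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


if __name__ == "__main__":
    # Invented counts for illustration only.
    print(log_dice(f_xy=120, f_x=900, f_y=400))
    # Scaling every count by the same factor leaves the score unchanged.
    print(log_dice(f_xy=1200, f_x=9000, f_y=4000))
```

Since corpus size does not enter the formula, scores from a small corpus and a very large reference corpus can be set side by side, which is the usual motivation given for preferring logDice in comparisons of corpora of different sizes.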

Keywords: corpus linguistics, EAP, ELFA, lexical collocations, medical discourse

Procedia PDF Downloads 103