Search results for: multilingual corpora
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 239

179 Contact Phenomena in Medieval Business Texts

Authors: Carmela Perta

Abstract:

Among the studies that have flourished in the field of historical sociolinguistics, mainly in the strand devoted to the history of English during its medieval and early modern phases, multilingual texts have been analysed using theories and models from contact linguistics, thus applying synchronic models and approaches to the past. This is also true of contact phenomena that transcend the level of writing and involve the language systems implicated in contact processes, to the point that a new variety is perceived. This is the case for medieval administrative-commercial texts in which, according to some scholars, the degree of fusion of Anglo-Norman, Latin and Middle English is so high that a mixed code emerges, with recurrent patterns of mixed forms. Of particular interest is a collection of multilingual business writings by John Balmayn, an Englishman overseeing a large shipment in Tuscany, namely the Cantelowe accounts. These documents display various analogies with multilingual texts written in England in the same period; in fact, the writer seems to make use of the above-mentioned patterns, with Middle English, Latin, Anglo-Norman, and the newly added Italian. Applying an atomistic yet dynamic approach to the study of contact phenomena, we will investigate these documents, exploring the nature of the switching forms they contain from an intra-writer variation perspective. After analysing the accounts and the type of multilingualism in them, we will take stock of the assumed mixed code nature, comparing the characteristics found in this genre with modern assumptions. The aim is to evaluate whether the switching forms can be considered core elements of a mixed code, used as a professional variety among merchant communities, or whether such texts should be analysed from a switching perspective.

Keywords: historical sociolinguistics, historical code switching, letters, medieval England

Procedia PDF Downloads 46
178 Supporting Young Emergent Multilingual Learners in Prekindergarten Classrooms: Policy Implications

Authors: Tiedan Huang, Chun Zhang, Caitlin Coe

Abstract:

This study investigated the quality of instructional support for young Emergent Multilingual Learners (EMLs) in 50 Universal Prekindergarten (UPK) classrooms in New York City (NYC). This is one of the first empirical studies examining instructional support for this student population. We collected data using a mixed-methods design combining structured observations of teacher-child interactions in the 50 classrooms with surveys and interviews with program leaders and teaching teams. We found that NYC’s UPK classrooms offered warm and supportive environments for EMLs. Nevertheless, instructional support was generally relatively low. This study identified large mindset, knowledge, and practice gaps (and a real opportunity) among NYC early childhood professionals, specifically in providing adequate instructional and linguistic support for EMLs and in partnering with families to capture their cultural and home literacy assets. Consistent, rigorous, and meaningful use of data is necessary to support both EMLs’ language and literacy development and teachers’/leaders’ professional development.

Keywords: high quality instruction, culturally and linguistically responsive practices, professional development, workforce development

Procedia PDF Downloads 54
177 MicroRNA in Bovine Corpus Luteum during Early Pregnancy

Authors: Rreze Gecaj, Corina Schanzenbach, Benedikt Kirchner, Michael Pfaffl, Bajram Berisha

Abstract:

The maintenance of the corpus luteum (CL) during early pregnancy in cattle is a critical and multifarious process. A luteotrophic mechanism originating from the embryo is widely accepted as the triggering signal for CL maintenance. In cattle, it is the secretion of interferon-tau (IFNT) from the conceptus that prevents CL regression and ensures progesterone production for the establishment of pregnancy. In addition to endocrine and paracrine signals, microRNA (miRNA) can also support CL sustainability during early pregnancy. MiRNAs are small non-coding nucleic acids that regulate gene expression post-transcriptionally and have been shown to be involved in the modulation of CL function. However, the role of miRNAs in corpus luteum function during early pregnancy remains largely unexplored. This study aims to profile the expression of miRNA in the CL during early pregnancy in cattle by comparing it with the CL from the late cycle and with the regressed CL. Corpora lutea were assigned to two groups during the cycle (C13, late CL: days 13-18; C18, regressed CL: day >18) and one group during early pregnancy (P: 1-2 months). The stage of the estrous cycle was determined by macroscopic examination, and fetal age was estimated by crown-rump length measurement. A total of 9 corpora lutea from individual animals were included in the study, three for each group. The miRNA population was profiled using small RNA next-generation sequencing, and biologically significant miRNAs were evaluated for differential expression using the DESeq2 methodology. We show that 6 differentially expressed miRNAs (bta-mir-2890, -2332, -2441-3p, -148b, -1248 and -29c) are common to both comparisons, P vs C13 and P vs C18. For each stage individually, we also identified unique miRNAs that were differentially expressed only in the given comparison.
bta-miR-23a and -769 were unique miRNAs differentially expressed in P vs C13, whereas forty-four unique miRNAs were identified as differentially expressed in P vs C18. These data confirm that miRNAs are highly abundant in luteal tissue during early pregnancy and potentially regulate CL maintenance at this stage of fetal development.

Keywords: bovine, corpus luteum, microRNA, pregnancy, RNA-Seq

Procedia PDF Downloads 230
176 The Value of Computerized Corpora in EFL Textbook Design: The Case of Modal Verbs

Authors: Lexi Li

Abstract:

This study aims to contribute to research on how computer technology can be exploited to enhance EFL textbook design. Specifically, the study demonstrates how computerized native and learner corpora can be used to enhance modal verb treatment in EFL textbooks. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because the pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of the occurrences of each of the nine modals were sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the “secondary school” section were selected. A series of five secondary coursebooks comprises the textbook corpus. All the data in both the learner and the textbook corpora were retrieved through the concordance functions of WordSmith Tools (version 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNC2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. In the second part, the learner corpus was compared with the textbook corpus in terms of use (distributional features, semantic functions, and co-occurring constructions) in order to examine the degree of influence of the textbook on learners’ use of modal verbs. Moreover, the learner corpus was analyzed for misuse (syntactic errors, e.g., *she can sings) of the nine modal verbs to uncover potential difficulties that confront learners. The results indicate discrepancies between the textbook presentation of modal verbs and authentic modal use in natural discourse in terms of distributions of frequencies, semantic functions, and co-occurring structures.
Furthermore, there are consistent patterns of use between the learner corpus and the textbook corpus with respect to the three above-mentioned aspects, except for could, will and must, partially confirming the correlation between frequency effects and L2 grammar acquisition. Further analysis reveals that the exceptions are caused by both positive and negative L1 transfer, indicating that frequency effects can be intercepted by L1 interference. In addition, error analysis revealed that could, would, should and must are the most difficult for Chinese learners due to both inter-linguistic and intra-linguistic interference. The discrepancies between the textbook corpus and the native corpus point to a need to adjust the presentation of modal verbs in the textbooks in terms of frequencies, different meanings, and verb-phrase structures. Along with the adjustment of modal verb treatment based on authentic use, it is important for textbook writers to take into consideration L1 interference as well as learners’ difficulties in their use of modal verbs. The present study is a methodological showcase of the combination of both native and learner corpora in the enhancement of EFL textbook language authenticity and appropriateness for learners.

Keywords: EFL textbooks, learner corpus, modal verbs, native corpus

Procedia PDF Downloads 95
175 Ultrasonic Assessment of Corpora lutea and Plasma Progesterone Levels in Early Pregnant and Non Pregnant Cows

Authors: Abdurraouf O. Gaja, Salah Y. A. Al-Dahash, Guru Solmon Raju, Chikara Kubota

Abstract:

Corpus luteum cross-sectional area (by ultrasonography) and plasma progesterone (by DELFIA) were estimated in early pregnant and non-pregnant cows on day 14 and on days 20 to 23 post insemination. On day 14, corpus luteum sectional area was 348.43 mm2 in pregnant and 387.84 mm2 in non-pregnant cows. On days 20 to 23, corpus luteum sectional area ranged between 342.06 and 367.90 mm2 in pregnant and between 193.85 and 270.69 mm2 in non-pregnant cows. Plasma progesterone level was 2.43 ng/ml in pregnant and 2.46 ng/ml in non-pregnant cows on day 14, while on days 20 to 23 the level ranged between 2.47 and 2.84 ng/ml in pregnant and between 0.53 and 1.17 ng/ml in non-pregnant cows. Both luteal tissue areas and plasma progesterone levels differed highly significantly (P<0.01) between pregnant and non-pregnant cows on days 20 to 23, but there were no significant differences on day 14. The correlation between CL cross-sectional area and plasma progesterone level was 0.4 in pregnant cows and 0.99 in non-pregnant cows. It is clear from this study that ultrasonic assessment of corpora lutea is a viable alternative to determining plasma progesterone levels for early pregnancy diagnosis in cows.

Keywords: progesterone, ultrasonography, corpus luteum, pregnancy diagnosis, cow

Procedia PDF Downloads 281
174 Corpus Stylistics and Multidimensional Analysis for English for Specific Purposes Teaching and Assessment

Authors: Svetlana Strinyuk, Viacheslav Lanin

Abstract:

Academic English has become the lingua franca of the international scientific community, which stimulates universities to introduce English for Specific Purposes (EAP) courses into the curriculum. Teaching L2 EAP students can be supported by corpus technologies and digital stylistics. Special software was developed to address the manifold task of teaching, assessing and researching the academic writing of L2 students on the basis of digital stylistics and multidimensional analysis. A set of annotations (style markers) was built, covering the grammatical, lexical and syntactic features most characteristic of academic writing. Contrastive comparison of two corpora, a “model corpus” of subject-domain-limited papers published by competent writers in leading academic journals and a “students’ corpus” of subject-domain-limited papers written by final-year students, yields data about the features of academic writing underused or overused by L2 EAP students. Both corpora are tagged with special software created in GATE Developer. Style markers within the framework of the research may be replaced depending on the relevance and validity of the results obtained from the research corpora. Thus, by selecting relevant (high-frequency) style markers and excluding less relevant, i.e. less frequent, annotations, high validity of the model is achieved. The software makes it possible to compare the data obtained from processing the model corpus with those from the students’ corpus and to generate reports which can be used in teaching and assessment. The less students’ writing deviates from the model corpus, the higher their acquisition of academic writing skill. The research showed that several style markers (hedging devices) were underused by L2 EAP students, whereas lexical linking devices were used excessively. The software, implemented in the teaching of EAP courses, serves as a successful visual aid and makes assessment more valid; it is indicative of the degree of writing skill acquisition and provides data for further research.

Keywords: corpus technologies in EAP teaching, multidimensional analysis, GATE Developer, corpus stylistics

Procedia PDF Downloads 161
173 Multilingual Students Acting as Language Brokers in Italy: Their Points of View and Feelings towards This Activity

Authors: Federica Ceccoli

Abstract:

Italy is undergoing one of its largest migratory waves, and Italian schools are reporting the highest numbers of multilingual students coming from immigrant families and speaking minority languages. For these pupils, who have not yet fully acquired their mother tongue, learning a second language may represent a burden on their linguistic development and may have some repercussions on their school performances and relational skills. These are some of the reasons why they have turned out to be those who have the worst grades and the highest school drop-out rates. However, despite these negative outcomes, it has been demonstrated that multilingual immigrant students frequently act as translators or language brokers for their peers or family members who do not speak Italian fluently. This activity has been defined as Child Language Brokering (hereinafter CLB) and it has become a common practice, especially in minority communities, as immigrants’ children often learn the host language much more quickly than their parents, thus contributing to their family life by acting as language and cultural mediators. This presentation aims to analyse the data collected in a study carried out during the school year 2014-2015 in the province of Ravenna, in the Northern Italian region of Emilia-Romagna, among 126 immigrant students attending junior high schools. The purpose of the study was to analyse, by means of a structured questionnaire, whether multilingualism was associated with language brokering experiences and to examine the perspectives of those students who reported having acted as translators using their linguistic knowledge to help people understand each other. The questionnaire consisted of 34 items roughly divided into 2 sections. The first section required multilingual students to provide personal details like their date and place of birth, as well as details about their families (number of siblings, parents’ jobs).
In the second section, they were asked about the languages spoken in their families as well as their language brokering experience. The in-depth questionnaire sought to investigate a wide variety of brokering issues such as frequency and purpose of the activity, where, when and which documents young language brokers translate and how they feel about this practice. The results have demonstrated that CLB is a very common practice among immigrants’ children living in Ravenna and almost all students reported positive feelings when asked about their brokering experience with their families and also at school. In line with previous studies, responses to the questionnaire item regarding the people they brokered for revealed that the category ranking first is parents. Similarly, language-brokering activities tend to occur most often at home and the documents they translate the most (either orally or in writing) are notes from teachers. Such positive feelings towards this activity together with the evidence that it occurs very often in schools have laid the foundation for further projects on how this common practice may be valued and used to strengthen the linguistic skills of these multilingual immigrant students and thus their school performances.

Keywords: immigration, language brokering, multilingualism, students' points of view

Procedia PDF Downloads 158
172 Children and Communities Benefit from Mother-Tongue Based Multi-Lingual Education

Authors: Binay Pattanayak

Abstract:

Jharkhand, a multilingual state, is home to more than 19 tribal and regional languages. These are used by more than 33 communities in the state. The state has declared 12 of these languages as official languages of the state. However, schools in the state do not recognize any of these community languages, even in the early grades. Children, who speak their mother tongues at home, in the local market and on the playground, find it very difficult to understand their teacher and textbooks in school. They fail to acquire basic literacy and numeracy skills in the early grades. Out of frustration due to lack of comprehension, the majority of children leave school. Jharkhand sees the highest early-grade dropout in India. To address this, the state, under the guidance of the author, designed a mother-tongue-based pre-school education programme named Bhasha Puliya and bilingual picture dictionaries in 9 tribal and regional mother tongues of children. This contributed significantly to children’s school readiness. Following this, the state designed a mother-tongue-based multilingual education programme (MTB-MLE) for the multilingual context. The author guided textbook development in 5 tribal (Santhali, Mundari, Ho, Kurukh and Kharia) and two regional (Odia and Bangla) languages. Teachers and community members were trained for MTB-MLE in around 1,000 schools of the concerned language pockets. Community resource groups were constituted along with their academic calendars in each school to promote story-telling, singing, painting, dancing, riddles, etc. with community support. This, on the one hand, created rich learning environments for children. On the other hand, the communities have discovered great potential in the process of developing a wide variety of learning materials for children in their own mother tongue using their local stories, songs, riddles, paintings, idioms, skits, etc. as a process of their literary, cultural and technical enrichment.
The majority of children are acquiring strong early grade reading skills (basic literacy and numeracy) in grades I-II, thereby getting well prepared for higher studies. In a phased manner, they are learning Hindi and English after 4-5 years of MTB-MLE using the foundational language learning skills. Community members have started designing new books and audio-visual learning materials in their mother tongues, seeing great potential for their cultural and technological rejuvenation.

Keywords: community resource groups, MTB-MLE, multilingual, socio-linguistic survey, learning

Procedia PDF Downloads 168
171 Testing the Simplification Hypothesis in Constrained Language Use: An Entropy-Based Approach

Authors: Jiaxin Chen

Abstract:

Translations have been labeled as more simplified than non-translations, featuring less diversified and more frequent lexical items and simpler syntactic structures. Such simplified linguistic features have been identified in other bilingualism-influenced language varieties, including non-native and learner language use. Therefore, it has been proposed that translation could be studied within a broader framework of constrained language, and that simplification is one of the universal features shared by constrained language varieties due to similar cognitive-physiological and social-interactive constraints. Yet contradictory findings have also been presented. To address this issue, this study adopts Shannon’s entropy-based measures to quantify complexity in language use. Entropy measures the level of uncertainty or unpredictability in message content, and it has been adapted in linguistic studies to quantify linguistic variance, including morphological diversity and lexical richness. In this study, the complexity of lexical and syntactic choices will be captured by word-form entropy and POS-form entropy, and a comparison will be made between constrained and non-constrained language use to test the simplification hypothesis. The entropy-based method is employed because it captures both the frequency of linguistic choices and their evenness of distribution, which traditional indices do not capture. Another advantage of the entropy-based measure is that it is reasonably stable across languages and thus allows for a reliable comparison among studies on different language pairs. In terms of the data for the present study, one established corpus (CLOB) and two self-compiled corpora will be used to represent native written English and two constrained varieties (L2 written English and translated English), respectively. Each corpus consists of around 200,000 tokens. Genre (press) and text length (around 2,000 words per text) are comparable across corpora.
More specifically, word-form entropy and POS-form entropy will be calculated as indicators of lexical and syntactic complexity, and ANOVA tests will be conducted to explore whether there is any corpus effect. It is hypothesized that both L2 written English and translated English have lower entropy than non-constrained written English. The similarities and divergences between the two constrained varieties may provide indications of the constraints shared by and peculiar to each variety.
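As a rough illustration of the measure described above, the following Python sketch computes Shannon entropy, H = -Σ p(x) log2 p(x), over a list of tokens. It is only a minimal sketch under stated assumptions: the function name `shannon_entropy` and the toy token lists are hypothetical, tokenization and POS tagging are assumed to have been done elsewhere, and nothing here reproduces the study's actual pipeline.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Return Shannon entropy in bits of the distribution of items in `tokens`.

    `tokens` may hold word forms (word-form entropy) or POS tags
    (POS-form entropy); the computation is the same in both cases.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # Sum p * log2(p) over every distinct item, then negate.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy data: same length, different evenness of distribution.
even_tokens = ["a", "b", "c", "d"]    # four types, evenly distributed
skewed_tokens = ["a", "a", "a", "b"]  # two types, one dominant

print(shannon_entropy(even_tokens))
print(shannon_entropy(skewed_tokens))
```

The evenly distributed list yields the higher value, which is the property the abstract relies on: entropy reflects both how many distinct choices occur and how evenly they are used.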

Keywords: constrained language use, entropy-based measures, lexical simplification, syntactical simplification

Procedia PDF Downloads 63
170 Learning Vocabulary with SkELL: Developing a Methodology with University Students in Japan Using Action Research

Authors: Henry R. Troy

Abstract:

Corpora are becoming more prevalent in the language classroom, especially in the development of dictionaries and course materials. Nevertheless, corpora are still perceived by many educators as difficult to use directly in the classroom, a process which is also known as “data-driven learning” (DDL). Action research has been identified as a method by which DDL’s efficiency can be increased, but it is also an approach few studies on DDL have employed. Studies into the effectiveness of DDL in language education in Japan are also rare, and investigations focused more on student and teacher reactions rather than pre and post-test scores are rarer still. This study investigates the student and teacher reactions to the use of SkELL, a free online corpus designed to be user-friendly, for vocabulary learning at a university in Japan. Action research is utilized to refine the teaching methodology, with changes to the method based on student and teacher feedback received via surveys submitted after each of the four implementations of DDL. After some training, the students used tablets to study the target vocabulary autonomously in pairs and groups, with the teacher acting as facilitator. The results show that the students enjoyed using SkELL and felt it was effective for vocabulary learning, while the teaching methodology grew in efficiency throughout the course. These findings suggest that action research can be a successful method for increasing the efficacy of DDL in the language classroom, especially with teachers and students who are new to the practice.

Keywords: action research, corpus linguistics, data-driven learning, vocabulary learning

Procedia PDF Downloads 207
169 The Mirage of Progress? A Longitudinal Study of Japanese Students’ L2 Oral Grammar

Authors: Robert Long, Hiroaki Watanabe

Abstract:

This longitudinal study examines the grammatical errors in Japanese university students’ dialogues with a native speaker over an academic year. The L2 interactions of 15 Japanese speakers were taken from the JUSFC2018 corpus (April/May 2018) and the JUSFC2019 corpus (January/February). The corpora were based on a self-introduction monologue and a three-question dialogue; however, this study examines only the grammatical accuracy found in the dialogues. Research questions focused on a possible significant difference in grammatical accuracy between the first interview session in 2018 and the second one the following year, specifically regarding errors in clauses per 100 words, global errors and local errors, and specific errors related to parts of speech. The investigation also focused on which forms showed the least improvement or had worsened. Descriptive statistics showed that error-free clauses per 100 words decreased slightly, while clauses with errors per 100 words increased by one clause. Global errors showed a significant decline, while local errors increased from 97 to 158. For errors related to parts of speech, a t-test confirmed a significant difference between the two speech corpora, with errors occurring more frequently in the 2019 corpus. These data highlight the difficulty of having students self-edit.
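The "per 100 words" figures mentioned above are length-normalized rates, which let sessions of different lengths be compared. The short Python sketch below shows that normalization only; the function name `per_100_words` and the counts passed to it are hypothetical and are not taken from the JUSFC corpora.

```python
def per_100_words(count, total_words):
    """Normalize a raw count (e.g. clauses with errors) to a rate per 100 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * count / total_words


# Hypothetical example: 12 clauses with errors in a 480-word dialogue.
rate = per_100_words(12, 480)
print(rate)
```

Because the rate divides by the number of words produced, a speaker who talks more in the second session is not penalized simply for producing more clauses.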

Keywords: clause analysis, global vs. local errors, grammatical accuracy, L2 output, longitudinal study

Procedia PDF Downloads 101
168 The Istrian Istrovenetian-Croatian Bilingual Corpus

Authors: Nada Poropat Jeletic, Gordana Hrzica

Abstract:

Bilingual conversational corpora represent a meaningful and the most comprehensive data source for investigating genuine contact phenomena in non-monitored bilingual speech production. They can be particularly useful for bilingual research since some features of bilingual interaction can hardly be accessed with more traditional methodologies (e.g., elicitation tasks). The method of language sampling provides the resources for describing language interaction in a bilingual community and/or in bilingual situations (e.g. code-switching, amount of languages used, number of languages used, etc.). To capture these phenomena in genuine communication situations, such sampling should be as close as possible to spontaneous communication. Bilingual spoken corpus design is methodologically demanding. Therefore, this paper aims to describe the methodological challenges that apply to the design of the conversational Istrian Istrovenetian-Croatian Bilingual Corpus. Croatian is the first official language of the officially bilingual Croatian-Italian Istria County, while Istrovenetian is a diatopic subvariety of Venetian, a long-lasting lingua franca in the Istrian peninsula, the mother tongue of the members of the Italian National Community in Istria and the primary code of informal everyday communication among the Istrian Italophone population. Within the CLARIN infrastructure, TalkBank is being used, as it provides relevant procedures for designing and analyzing bilingual corpora. Furthermore, public availability allows for easy replication of studies and cumulative progress as a research community builds up around the corpus, while the tools developed within the field of corpus linguistics enable easy retrieval and analysis of information. The method of language sampling employed is kept at the level of spontaneous communication, in order to maximise the naturalness of the collected conversational data.
All speakers have provided written informed consent in which they agree to be recorded at a random point within the period of one month after signing the consent. Participants are administered a background questionnaire providing information about their socioeconomic status and about exposure and language usage in the participants’ social networks. Recording data are being transcribed, phonologically adapted within a standard-sized orthographic form, coded and segmented (speech streams are being segmented into communication units based on syntactic criteria) and marked following the CHAT transcription system and its associated CLAN suite of programmes within the TalkBank toolkit. The corpus consists of transcribed sound recordings of 36 bilingual speakers, while the target is to publish the whole corpus by the end of 2020, by sampling spontaneous conversations among approximately 100 speakers from all the bilingual areas of Istria to ensure representativeness (the participants are being recruited across three generations of native bilingual speakers in all the bilingual areas of the peninsula). Conversational corpora are still rare in TalkBank, so the Corpus will contribute to BilingBank as a highly relevant and scientifically reliable resource for an internationally established and active research community. Research on communities with societal bilingualism will contribute to the growing body of research on bilingualism and multilingualism, especially regarding topics of language dominance, language attrition and loss, interference and code-switching, etc.

Keywords: conversational corpora, bilingual corpora, code-switching, language sampling, corpus design methodology

Procedia PDF Downloads 109
167 Translanguaging as a Decolonial Move in South African Bilingual Classrooms

Authors: Malephole Philomena Sefotho

Abstract:

Nowadays, the majority of people worldwide are bilingual rather than monolingual due to the surge of globalisation and mobility. Consequently, bilingual education is a topical issue of discussion among researchers. Several studies that have focussed on it have highlighted the importance of, and need for, incorporating learners’ linguistic repertoires in multilingual classrooms and moving away from the colonial approach, which has a monolingual bias: one language at a time. Researchers have pointed out that a systematic approach that involves the concurrent use of languages, and not a separation of languages, must be implemented in bilingual classroom settings. Translanguaging emerged as a systematic approach that assists learners in making meaning of their world, and it involves allowing learners to utilize all their linguistic resources in their classrooms. The South African language policy also makes room for the use of diverse languages in bi/multilingual classrooms. This study, therefore, sought to explore how teachers apply translanguaging in bilingual classrooms to incorporate learners’ linguistic repertoires. It further establishes teachers’ perspectives on the use of more than one language in teaching and learning. The participants in this study were language teachers who teach at bilingual primary schools in Johannesburg, South Africa. Semi-structured interviews were conducted to establish their perceptions of the concurrent use of languages. A qualitative research design was followed in analysing the data. The findings showed that teachers were reluctant to allow translanguaging to take place in their classrooms even though they realise its importance. Not allowing bilingual learners to use their linguistic repertoires has resulted in learners’ negative attitudes towards their languages and contributed to learners’ loss of identity.
This article thus recommends a drastic change to decolonised approaches in teaching and learning in multilingual settings, with translanguaging as a decolonial move where learners are allowed to translanguage freely in their classroom settings for better comprehension and making meaning of concepts and/or related ideas. It further proposes that continuous conversations be encouraged to bring imminent cultural and linguistic genocide to a halt.

Keywords: bilingualism, decolonisation, linguistic repertoires, translanguaging

Procedia PDF Downloads 144
166 Investigating University Language Teacher’s Perception of Their Identities in the Algerian Multilingual Context

Authors: Yousra Drissi

Abstract:

This research explores language teacher identity in a multilingual context where both teachers and students come from different linguistic backgrounds. It seeks to understand how teachers perceive themselves as language teachers in this context in relation to different influencing factors, both internal and external. The study is motivated by the importance of language teacher identity (LTI) in the university context, which is neglected in the present literature, and is an attempt to address that gap. The broader aim of this study is to bring attention to language teacher identity along with the different influencing elements which can either promote or hinder its development. The research draws on sociocultural theory and post-structural theory and uses a mixed methods approach to collect and analyse relevant data. A structured survey was distributed to language teachers from different universities around Algeria, followed by in-depth interviews. Results are expected to show the points of self-perception that these teachers share or differ in. They will also help us identify the different internal and external factors that can be of influence. The results of this research can be used by institutions as well as decision-makers to better understand university teachers and help them improve their teaching practices by empowering their language teacher identity, from teacher education programs to continuous teacher development programs.

Keywords: identity, language teacher identity, multilingualism, university teacher

Procedia PDF Downloads 49
165 Dialogism in Research Article Introductions Written by Iranian Non-Native and English Native Speaking Writers

Authors: Moharram Sharifi

Abstract:

Despite a growing interest in the study of the introduction section of Research Articles (RA), few studies have investigated how academic writers engage with other voices and alternative positions in this academic genre. Therefore, the purpose of this study was to show how Native Speaker (NS) and Non-Native Speaker (NNS) writers take positions and stances in research article introductions. For this purpose, Engagement resources based on the appraisal framework were investigated in sixty articles written by English NS and Iranian NNS writers and published in applied linguistics journals. It was found that the mean occurrences of heteroglossic items in both corpora were larger than those of monoglossic items. Comparing the means of monoglossic engagements between the two corpora, however, revealed that the NS writers’ corpus had larger mean occurrences of monoglossic engagements than the NNS writers’ corpus, implying the native writers’ stronger authorial stance in the texts. The results also revealed that there was no significant difference in the use of contractive and expansive engagements by NS writers (t(29) = -0.995, p > 0.05), indicating a balanced use of the two options. However, the higher mean occurrences of expansive options compared with contractive options in the NNS corpus may suggest that NNS writers open up more dialogic room for alternative positions in the RA introductions. The findings of this study may help writers to better perceive the creation of a strong authorial position using appropriate engagement resources in RA introductions.
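The t(29) statistic reported in the abstract above is consistent with a paired comparison over 30 articles (29 degrees of freedom), although the abstract does not state which test was used. As a minimal sketch under that assumption, the function below computes a paired t statistic from two equal-length lists of per-article counts; the function name and the sample counts are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for a paired-samples t-test.

    `first` and `second` hold one value per article, e.g. counts of
    contractive and expansive engagement items (hypothetical data).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Work on the per-article differences between the two measures.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical counts for four articles, for illustration only.
contractive = [1, 2, 3, 4]
expansive = [0, 2, 1, 5]
t_value, dof = paired_t_statistic(contractive, expansive)
print(f"t({dof}) = {t_value:.3f}")
```

With 30 articles per group the same function would return 29 degrees of freedom; the p-value would then be looked up against the t distribution, which is omitted here to keep the sketch free of third-party libraries.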

Keywords: engagement, heteroglossic, monoglossic, introduction

Procedia PDF Downloads 16
164 A BERT-Based Model for Financial Social Media Sentiment Analysis

Authors: Josiel Delgadillo, Johnson Kinyua, Charles Mutigwe

Abstract:

The purpose of sentiment analysis is to determine the sentiment strength (e.g., positive, negative, neutral) from a textual source for good decision-making. Natural language processing in domains such as financial markets requires knowledge of domain ontology, and pre-trained language models, such as BERT, have made significant breakthroughs in various NLP tasks by training on large-scale unlabeled generic corpora such as Wikipedia. However, sentiment analysis is a strongly domain-dependent task. The rapid growth of social media has given users a platform to share their experiences and views about products, services, and processes, including financial markets. StockTwits and Twitter are social networks that allow the public to express their sentiments in real time. Hence, leveraging the success of unsupervised pre-training and the large amount of financial text available on social media platforms could potentially benefit a wide range of financial applications. This work is focused on sentiment analysis using social media text on platforms such as StockTwits and Twitter. To meet this need, SkyBERT, a domain-specific language model pre-trained and fine-tuned on financial corpora, has been developed. The results show that SkyBERT outperforms current state-of-the-art models in financial sentiment analysis. Extensive experimental results demonstrate the effectiveness and robustness of SkyBERT.

Keywords: BERT, financial markets, Twitter, sentiment analysis

Procedia PDF Downloads 123
163 The Dilemma of Translanguaging Pedagogy in a Multilingual University in South Africa

Authors: Zakhile Somlata

Abstract:

In the context of international linguistic and cultural diversity, all languages can be used for all purposes. Africa in general, and South Africa in particular, is no exception as a multilingual and multicultural society. The multilingual and multicultural nature of South African society has a direct bearing on the heterogeneity of South African universities in general. Universities, as the centers of research, innovation, and transformation of the entire society, should be at the forefront in leading multilingualism. The universities in South Africa used English, and to a certain extent Afrikaans, as the only academic languages during colonialism and the apartheid regime. The democratic breakthrough of 1994 brought linguistic relief in South Africa. The Constitution of the Republic of South Africa recognizes 11 official languages that should enjoy parity of esteem for the realization of multilingualism. The elevation of the nine previously marginalized indigenous African languages as academic languages in higher education is central to multilingualism. It is high time that an Afrocentric rather than a Eurocentric model underpinned the education system in South Africa at all levels. Almost all South African universities have language policies that seek to promote access and success of students through multilingualism, but the main dilemma is the implementation of these policies. This study responds to two objectives: (i) to evaluate how selected institutions use language policies for the accessibility and success of students, and (ii) to study how selected universities integrate African languages for both academic and administrative purposes. This paper reflects the language policy practices in one selected University of Technology (UoT) in South Africa. The UoT has its own language policy, which depicts the linguistic diversity of the institution and its commitment to promoting multilingualism.
Translanguaging pedagogy, which accommodates the use of minority languages in the teaching and learning process, plays a pivotal role in promoting multilingualism. This research paper employs a mixed methods (quantitative and qualitative) approach. Qualitative data were collected from key informants (insiders and experts), while quantitative data were collected from a cohort of third-year students. A mixed methods approach with a convergent parallel design allows the data to be collected and analysed separately and the results then compared. Language development initiatives are discussed within the framework of language policy and policy implementation strategies. Theoretically, this paper is rooted in language as a problem, language as a right and language as a resource. The findings demonstrate that, despite the institution being multilingual, the marginalization of African languages as academic languages is perpetuated. Findings further display the hegemony of English. The promotion of the status quo compromises the promotion of multilingualism, the Africanization of Higher Education and the intellectualization of indigenous African languages in South Africa under a democratic dispensation.

Keywords: afro-centric model, hegemony of English, language as a resource, translanguaging pedagogy

Procedia PDF Downloads 163
162 A Sociolinguistic Investigation of Code-Switching Practices of ESL Students Outside EFL Classrooms

Authors: Shehroz Mukhtar, Maqsood Ahmed, Abdullah Mukhtar, Choudhry Shahid, Waqar Javaid

Abstract:

Code switching is a common phenomenon, generally observed in multilingual communities across the globe. A critical look at the code switching literature reveals that code switching has mostly been studied in the classroom, in learning and teaching contexts, while code switching outside the classroom in settings such as cafés, hostels and so on has been among the least explored areas. The current research investigated the reasons for code switching in the interactive practices of students and their perceptions regarding the same outside classroom settings. This paper is a study of the common practice that prevails in the universities of Sialkot, namely that bilinguals mix two languages when they speak in different classroom situations. In Pakistani classrooms, where multilinguals are in abundance, i.e. speakers of two or more languages, code switching or language combination is very common. The teachers of Sialkot switch from one language to another, consciously or unconsciously, while teaching English in the classroom. This phenomenon has not been explored in Sialkot’s teaching context. In Sialkot, private educational institutes do not encourage code switching, whereas public or government institutes use it frequently. The crux of this research is to investigate and identify the importance of code switching by taking its users into consideration. A survey research method and survey questionnaire will be used to get exact data from teachers and students. We will try to highlight the functions and importance of code switching in foreign language classrooms of Sialkot and will explore why this trend is emerging in Sialkot.

Keywords: code switching, bilingual context, L1, L2

Procedia PDF Downloads 23
161 Primary School Teacher's Perception of the Efficacy of Mother Tongue-Based Multilingual Education (MTB-MLE) in Saint Louis University, Laboratory Elementary School

Authors: Villiam Ambong, Kevin Banawag, Wynne Shane Bugatan, Mark Alvin Jay Carpio, Hwan Hee Choi, Moises Kevin Chungalao

Abstract:

This survey research investigated the perception of primary school teachers on the efficacy of MTB-MLE in SLU-LES, Baguio City. SLU-LES has a total of 21 primary school teachers who served as respondents of this study in an attempt to answer the major questions regarding the efficacy of MTB-MLE among primary school teachers. A questionnaire was used in collecting the data, which were analyzed using weighted mean and ANOVA. The questionnaire was validated by a statistician and was administered to a school which does not differ from the intended respondents for further validation of the items. Findings from the intended respondents revealed that they perceive MTB-MLE as effective; however, they do not prefer the use of Mother Tongue as a medium of instruction. A study on the same topic was conducted in Ibadan, Nigeria by Dr. David O. Fakeye, and although his respondents were students, the results showed that the respondents do perceive MTB-MLE to be efficacious. The results of this study also showed that years of teaching experience and the number of languages spoken by the teachers have no bearing on the preference of the respondents between MT medium and English medium, given that the respondents are in a melting pot community. Comparative studies between rural and urban schools are encouraged. Future researchers should include questions that elicit the respondents’ reasons regarding the efficacy of mother tongue as well as their preference between mother tongue medium and English.

Keywords: mother tongue, primary teachers, perception, multilingual education

Procedia PDF Downloads 247
160 Internationalization and Management of Linguistic Diversity In Multilingual Higher Education Institutions: Lecturers’ Experience From Three Universities in Europe

Authors: Argyro Maria Skourmalla

Abstract:

Internationalization and management of linguistic diversity in Higher Education (HE) have gained much attention in research in the last few years. Internationalization policies in HE aim at promoting the dual role of Higher Education Institutions (HEIs), civilization and competitiveness. In the context of the European Union, the European Education Area initiative aims at “inclusive national education and training systems” through networking and exchange between HEIs. However, the use of English as a ‘lingua academica’ in place of the official, national, and regional/minority languages raises questions regarding linguistic diversity and linguistic rights, as well as concerns about the scientific weakening of these languages. In fact, the European Civil Society Platform for Multilingualism, in the Declaration for Multilingualism in Higher Education, draws attention to the use of English at the expense of other regional/national languages and the impact of English-only language policy on an epistemological level. The above issues were brought up during semi-structured interviews with lecturing staff from three multilingual universities in Europe. Lecturers shared their experiences and the practices they use to manage linguistic diversity in these three universities. Findings show that even though different languages are used in teaching across disciplines, English (or ‘Globish’, as mentioned during an interview) is widely used in research. Despite English being accepted as the ‘lingua academica’, issues regarding loss of identity come up.

Keywords: higher education, internationalization, linguistic diversity, teaching, research, English

Procedia PDF Downloads 55
159 Primary School Teachers’ Perception on the Efficacy of Mother Tongue-Based Multilingual Education (MTB-MLE) in Saint Louis University, Laboratory Elementary School

Authors: Villiam C. Ambong, Kevin G. Banawag, Wynne Shane B. Bugatan, Mark Alvin Jay R. Carpio, Hwan Hee Choi, Moses Kevin L. Chungalao

Abstract:

This survey research investigated the perception of primary school teachers on the efficacy of MTB-MLE in SLU-LES, Baguio City. SLU-LES has a total of 21 primary school teachers who served as the respondents of this study in an attempt to answer three major questions regarding the efficacy of MTB-MLE among primary school teachers. A questionnaire was used in collecting the data, which were analyzed using weighted mean and ANOVA. The questionnaire was validated by a statistician and was administered to a school which does not differ from the intended respondents for further validation of the items. Findings from the intended respondents revealed that they perceive MTB-MLE as effective; however, they do not prefer the use of Mother Tongue as a medium of instruction. A study on the same topic was conducted in Ibadan, Nigeria by Dr. David O. Fakeye, and although his respondents were students, the results showed that the respondents do perceive MTB-MLE to be efficacious. The results of this study also showed that years of teaching experience and number of languages spoken by the teachers have no bearing on the preference of the respondents between MT medium and English medium, given that the respondents are in a melting pot community. Comparative studies between rural schools and urban schools are encouraged. Future research should include questions that elicit the respondents’ reasons regarding the efficacy of mother tongue as well as their preference between mother tongue medium and English.

Keywords: mother tongue, primary teachers, perception, multilingual education

Procedia PDF Downloads 424
158 A Sociolinguistic Investigation of Code-Switching Practices of ESL Students Outside EFL Classrooms

Authors: Shehroz Mukhtar, Maqsood Ahmed, Abdullah Mukhtar, Choudhry Shahid, Waqar Javaid

Abstract:

Code switching is a common phenomenon, generally observed in multilingual communities across the globe. A critical look at the code-switching literature reveals that code-switching has mostly been studied in classrooms, in learning and teaching contexts, while code-switching outside the classroom in settings such as cafés, hostels and so on has been among the least explored areas. The current research investigated the reasons for code-switching in the interactive practices of students and their perceptions regarding the same outside classroom settings. This paper is a study of the common practice that prevails in the universities of Sialkot, namely that bilinguals mix two languages when they speak in different classroom situations. In Pakistani classrooms, where multilinguals are in abundance, i.e. speakers of two or more languages, code-switching or language combination is very common. The teachers of Sialkot switch from one language to another, consciously or unconsciously, while teaching English in the classroom. This phenomenon has not been explored in Sialkot’s teaching context. In Sialkot, private educational institutes do not encourage code-switching, whereas public or government institutes use it frequently. The crux of this research is to investigate and identify the importance of code-switching by taking its users into consideration. The survey research method and a survey questionnaire will be used to get exact data from teachers and students. We will try to highlight the functions and importance of code-switching in foreign language classrooms of Sialkot and will explore why this trend is emerging in Sialkot.

Keywords: code switching, foreign language classrooms, bilingual context, use of L1, importance of L2

Procedia PDF Downloads 23
157 Lexical Collocations in Medical Articles of Non-Native vs Native English-Speaking Researchers

Authors: Waleed Mandour

Abstract:

This study presents a multidimensional scrutiny of Benson et al.’s seven-category taxonomy of lexical collocations as used by Egyptian medical authors and their native-English-speaking peers. It investigates 212 medical papers, all published during a span of 6 years (from 2013 to 2018). These are compared with medical research articles submitted by native speakers of English (25,238 articles in total, with over 103 million words) as derived from the Directory of Open Access Journals (a 2.7 billion-word corpus). The compiled non-native speakers’ corpus was annotated and marked up manually by the researcher according to the standards of Weisser. For the statistical comparisons, conventional frequency-based analysis was deployed alongside association measures (AMs), of which LogDice was used, as per the recommendation of Kilgarriff et al. for comparing large corpora. Despite the terminological convergence in the subject corpora, the comparison results confirm the previous literature in that the non-native speakers’ compositions reveal limited ranges of lexical collocations in terms of their distribution. However, there is a ubiquitous tendency to overuse the NS high-frequency multi-word units in all lexical categories investigated. Furthermore, Egyptian authors, unlike their English-speaking peers, tend to embrace more collocations denoting quantitative rather than qualitative analyses in their papers. This empirical work contributes to English for Academic Purposes (EAP) and English as a Lingua Franca in Academic settings (ELFA). In addition, there are pedagogical implications that would promote a better quality of medical research papers published in Egyptian universities.
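The abstract above names LogDice as its association measure without giving the formula. As a minimal sketch, assuming the standard definition logDice = 14 + log2(2·f(x,y) / (f(x) + f(y))), the function below scores a candidate collocation from raw frequencies; the function name and the example counts are hypothetical and do not come from the study.

```python
import math


def log_dice(pair_freq: int, node_freq: int, collocate_freq: int) -> float:
    """logDice score for a word pair.

    pair_freq      -- co-occurrence frequency of the two words
    node_freq      -- corpus frequency of the first word
    collocate_freq -- corpus frequency of the second word
    """
    if min(pair_freq, node_freq, collocate_freq) <= 0:
        raise ValueError("all frequencies must be positive")
    # The Dice coefficient lies in (0, 1]; its log2 is shifted by 14 so that
    # scores have a theoretical maximum of 14.
    dice = 2 * pair_freq / (node_freq + collocate_freq)
    return 14 + math.log2(dice)


# Hypothetical counts: a pair seen 50 times, each word seen 100 times overall.
print(log_dice(50, 100, 100))
```

Because the score depends only on the three frequencies and not on total corpus size, it can be compared across corpora of very different sizes, which suits a comparison such as the one described between a small learner corpus and a much larger reference corpus.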

Keywords: corpus linguistics, EAP, ELFA, lexical collocations, medical discourse

Procedia PDF Downloads 103
156 Circadian Disruption in Polycystic Ovary Syndrome Model Rats

Authors: Fangfang Wang, Fan Qu

Abstract:

Polycystic ovary syndrome (PCOS), the most common endocrinopathy among women of reproductive age, is characterized by ovarian dysfunction, hyperandrogenism and reduced fecundity. The aim of this study is to investigate whether the circadian disruption is involved in pathogenesis of PCOS in androgen-induced animal model. We established a rat model of PCOS using single subcutaneous injection with testosterone propionate on the ninth day after birth, and confirmed their PCOS-like phenotypes with vaginal smears, ovarian hematoxylin and eosin (HE) staining and serum androgen measurement. The control group rats received the vehicle only. Gene expression was detected by real-time quantitative PCR. (1) Compared with control group, PCOS model rats of 10-week group showed persistently keratinized vaginal cells, while all the control rats showed at least two consecutive estrous cycles. (2) Ovarian HE staining and histological examination showed that PCOS model rats of 10-week group presented many cystic follicles with decreased numbers of granulosa cells and corpora lutea in their ovaries, while the control rats had follicles with normal layers of granulosa cells at various stages of development and several generations of corpora lutea. (3) In the 10-week group, serum free androgen index was notably higher in PCOS model rats than controls. (4) Disturbed mRNA expression patterns of core clock genes were found in ovaries of PCOS model rats of 10-week group. Abnormal expression of key genes associated with circadian rhythm in ovary may be one of the mechanisms for ovarian dysfunction in PCOS model rats induced by androgen.

Keywords: polycystic ovary syndrome, androgen, animal model, circadian disruption

Procedia PDF Downloads 199
155 Divergences in Interpreters’ Oral Interpretation among Pentecostal Churches: Sermonic Reflections

Authors: Rufus Olufemi Adebayo, Sylvia Phiwani Zulu

Abstract:

Interpreting for linguistically diverse and multicultural congregants is often understood as integrating the content of the message. Preaching, like any communication, takes people’s multiple contexts seriously. Traditionally speaking, the one who provides the best insight into understanding “the other” in a multilingual context could be an interpreter. Nonetheless, there are reflections on the loss of spiritual communication, translation and interpretive dialogue. No matter how eloquent the preacher is, an interpreter can make or mar the sermon (speech). The sermon that the preacher preaches is not always the one the congregation hears from the interpreter. In other instances, interpreting can lead not only to distorted messages but also to dissatisfied audiences and to the preacher being overshadowed by the pranks of the interpreter. Using qualitative methodology, this paper explores the challenges of, and the conventional assumptions about, preachers’ interpreters as influenced by spirituality, culture, and language, from empirical and theoretical perspectives. Biased translation, and the basis of reality that suppresses or devalues spiritual communication, are examined. The result indicates that interpretation of the declaration of guilt, history of the congregation, spirituality, attitudes, morals, customs, specific practices of a preacher, education, and the environment form an entangled misinterpretation. The article concludes by re-examining these qualities and rearticulating them into a preliminary theory for practice, as distinguished from theory, which could possibly enhance the development of more sustainable multilingual interpretation in South African Pentecostal churches.

Keywords: congregants, divergences, interpreting/translation, language & communication, sermon/preaching

Procedia PDF Downloads 127
154 The Omani Learner of English Corpus: Source and Tools

Authors: Anood Al-Shibli

Abstract:

Designing a learner corpus is not an easy task because learners’ language involves many variables which might affect the results of any study based on learners’ language production (spoken and written). It is also essential to design a learner corpus systematically, especially when it is intended as a reference for language research. Therefore, the design of the Omani Learner of English Corpus (OLEC) has followed many explicit and systematic considerations. These criteria can be regarded as the foundation for designing any learner corpus to be exploited effectively in language use and language learning studies. In addition, OLEC is a manually error-annotated corpus. Error annotation in learner corpora is essential; however, it is time-consuming and prone to errors. Consequently, a navigating tool was designed to help annotators insert error codes in order to make the error-annotation process more efficient and consistent. To assure accuracy, an error-annotation procedure was followed to annotate OLEC, and some preliminary findings are noted. One of the main results of this procedure is an error-annotation system based on Omani learners’ English language production. Because OLEC is still in its first stages, the primary findings relate to only one level of proficiency and one error type, namely verb-related errors. It was found that Omani learners in OLEC tend to make more errors in forming the verb, followed by problems in verb agreement. Comparing the results to other error-based studies indicates that Omani learners tend to make basic verb errors of the kind found at lower levels of proficiency. To this end, it is essential to acknowledge that examining learners’ errors can give insights into language acquisition and language learning, and that most errors do not happen randomly but occur systematically among language learners.

Keywords: error-annotation system, error-annotation manual, learner corpora, verbs related errors

Procedia PDF Downloads 114
153 Partial Triphallia: The First Case Report of External and Internal Penile Triplication in a Cadaver

Authors: Madeleine Gadd, Rose How, Edward Mathews, John Buchanan, Vicky Cottrell, Andre Coetzee, Karuna Katti

Abstract:

Introduction: Triphallia, a congenital anomaly describing the presence of three distinct penile shafts, has been reported only once in the literature. This case report describes the serendipitous discovery of the first reported human case of partial orthotopic triphallia during cadaveric dissection. Case Summary: Despite the normal appearance of external genitalia on examination, the dissection of a 78-year-old male revealed a remarkable anatomical variation: two small supernumerary penises situated in a transverse orientation postero inferiorly to the primary penis. The main and the larger supernumerary penile shafts displayed their own corpora cavernosa and glans penis, sharing a single urethra, which coursed through the secondary penis prior to its passage through the primary penis. The smallest of the supernumerary penises was similar in dimension to the secondary penis, at 3.7cm long and 1.2cm wide (compared to the secondary penis at 3.8cm long and 1.3cm wide). However, it lacked a urethra and a typical arrangement of the corpora cavernosa and spongiosum, making this a case of partial triphallia rather than true triphallia. Conclusion: This case report provides a comprehensive anatomical description of partial triphallia in a cadaver, shedding light on the morphology, embryology, and clinical implications of this anomaly. This case report underscores the importance of meticulous anatomical dissections, particularly since, without dissection, this anatomical variation would have remained undiscovered. Although we can only speculate the functional implications of this condition, understanding such anatomical variations contributes to both knowledge of human anatomy and clinical management, should the condition be encountered in living individuals.

Keywords: triphallia, diphallia, congenital abnormalities, genitourinary abnormalities, urology

Procedia PDF Downloads 39
152 Online Multilingual Dictionary Using Hamburg Notation for Avatar-Based Indian Sign Language Generation System

Authors: Sugandhi, Parteek Kumar, Sanmeet Kaur

Abstract:

Sign Language (SL) is used by deaf people and by others who can hear but cannot speak, or who have a problem with spoken languages due to some disability. It is a visual gesture language that makes use of one or both hands, the arms, the face and the body to convey meanings and thoughts. An SL automation system is an effective way to provide an interface for communicating with hearing people using a computer. In this paper, an avatar-based dictionary is proposed for a text to Indian Sign Language (ISL) generation system. This research work also presents a literature review of the SL corpora made available for various SLs over the years. An ISL generation system requires a written form of SL, and there are certain techniques available for writing SL. The system uses the Hamburg sign language Notation System (HamNoSys) and Signing Gesture Mark-up Language (SiGML) for ISL generation. It is developed in PHP using Web Graphics Library (WebGL) technology for 3D avatar animation. A multilingual ISL dictionary is developed using HamNoSys for both English and Hindi. This dictionary will be used as a database to associate signs with words or phrases of a spoken language. It provides an admin panel interface to manage the dictionary, i.e., modification, addition, or deletion of a word. Through this interface, HamNoSys notations can be developed and stored in a database, and these notations can be converted manually into their corresponding SiGML files. The system takes a natural language input sentence in English or Hindi and generates a 3D sign animation using an avatar. SL generation systems have potential applications in many domains such as the healthcare sector, media, educational institutes, commercial sectors, transportation services, etc. This research work will help researchers to understand the various techniques used for writing SL and for building Sign Language generation systems.

Keywords: avatar, dictionary, HamNoSys, hearing impaired, Indian sign language (ISL), sign language

Procedia PDF Downloads 197
151 Differential Approach to Technology Aided English Language Teaching: A Case Study in a Multilingual Setting

Authors: Sweta Sinha

Abstract:

Rapid evolution of technology has changed language pedagogy as well as perspectives on language use, leading to strategic changes in discourse studies. We are now firmly embedded in a time when digital technologies have become an integral part of our daily lives. This has led to generalized approaches to English Language Teaching (ELT), which have raised two-pronged concerns in linguistically diverse settings: a) the diverse linguistic background of the learner might interfere or intervene with the learning process, and b) the differential level of already acquired knowledge of the target language might make the classroom practices too easy or too difficult for the target group of learners. ELT needs a more systematic and differential pedagogical approach for greater efficiency and accuracy. The present research analyses the need to identify learner groups based on different levels of target language proficiency, drawing on a longitudinal study of 150 undergraduate students. The learners were divided into five groups based on their performance on a twenty-point scale in Listening, Speaking, Reading and Writing (LSRW). The groups were then subjected to varying durations of technology-aided language learning sessions, and their performance was recorded again on the same scale. Identifying groups and introducing differential teaching and learning strategies led to better results compared to generalized teaching strategies. Language teaching includes different aspects: the organizational, the technological, the sociological, the psychological, the pedagogical and the linguistic. A facilitator must account for all these aspects in a carefully devised differential approach that meets the challenge of learner diversity. Apart from justifying the formation of differential groups, the paper attempts to devise a framework that accounts for all these aspects in order to make ELT in multilingual settings much more effective.

Keywords: differential groups, English language teaching, language pedagogy, multilingualism, technology aided language learning

Procedia PDF Downloads 372
150 Evaluation of Modern Natural Language Processing Techniques via Measuring a Company's Public Perception

Authors: Burak Oksuzoglu, Savas Yildirim, Ferhat Kutlu

Abstract:

Opinion mining (OM) is a natural language processing (NLP) problem that aims to determine the polarity of opinions, usually represented on a positive-neutral-negative axis. The data for OM is usually collected from various social media platforms. In an era where social media has considerable control over companies’ futures, it’s worth understanding social media and taking actions accordingly. OM comes to the fore here as the scale of the discussion about companies increases and it becomes infeasible to gauge opinion at the individual level. Thus, companies opt to automate this process by applying machine learning (ML) approaches to their data. For the last two decades, OM or sentiment analysis (SA) has mainly been performed by applying ML classification algorithms such as support vector machines (SVM) and Naïve Bayes to bag-of-n-grams representations of textual data. With the advent of deep learning and its apparent success in NLP, traditional methods have become obsolete. The transfer learning paradigm, which has been commonly used in computer vision (CV) problems, has lately started to shape NLP approaches and language models (LM). This gave rise to the widespread use of pretrained language models (PTM), which contain language representations obtained by training on large datasets with self-supervised learning objectives. PTMs are further fine-tuned on a specialized downstream task dataset to produce efficient models for various NLP tasks such as OM, named-entity recognition (NER), question answering (QA), and so forth. In this study, traditional and modern NLP approaches have been evaluated for OM using a sizable corpus that belongs to a large private company and contains about 76,000 comments in Turkish: SVM with a bag of n-grams, and two chosen pre-trained models, multilingual universal sentence encoder (MUSE) and bidirectional encoder representations from transformers (BERT).
The MUSE model is a multilingual model that supports 16 languages, including Turkish, and it is based on convolutional neural networks. The BERT model, monolingual in our case, is based on transformer neural networks. It uses masked language modeling and next sentence prediction tasks, which allow bidirectional training of the transformers. During the training phase, pre-processing operations such as morphological parsing, stemming, and spelling correction were not used, since the experiments showed their contribution to model performance to be insignificant, even though Turkish is a highly agglutinative and inflective language. The results show that deep learning methods with pre-trained models and fine-tuning achieve an improvement of up to about 11 percentage points over SVM for OM. The BERT model achieved around 94% prediction accuracy, while the MUSE model achieved around 88% and SVM around 83%. The MUSE multilingual model shows better results than SVM, but it still performs worse than the monolingual BERT model.
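
The bag-of-n-grams representation used for the SVM baseline can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the whitespace tokenizer, the unigram-plus-bigram range, and the function names `word_ngrams`, `bag_of_ngrams` and `vectorize` are all assumptions made for illustration. The abstract specifies neither the n-gram range nor the tokenization.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max.

    Assumption: tokens are obtained by lowercasing and splitting on
    whitespace; the study's real tokenization is not described.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bag_of_ngrams(text, n_min=1, n_max=2):
    """Count each n-gram; order beyond the n-gram window is discarded."""
    return Counter(word_ngrams(text, n_min, n_max))


def vectorize(texts, n_min=1, n_max=2):
    """Map each text to a count vector over a shared, sorted vocabulary.

    Returns (vocabulary, vectors); vectors[i][j] is how often
    vocabulary[j] occurs in texts[i].
    """
    bags = [bag_of_ngrams(t, n_min, n_max) for t in texts]
    vocabulary = sorted(set().union(*bags))
    vectors = [[bag[gram] for gram in vocabulary] for bag in bags]
    return vocabulary, vectors
```

Count vectors of this kind are what a linear classifier such as an SVM would be trained on. Calling `vectorize` on a list of comments returns one fixed-length vector per comment, so two comments that share many n-grams end up with similar vectors regardless of word order outside the n-gram window.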

Keywords: BERT, MUSE, opinion mining, pretrained language model, SVM, Turkish

Procedia PDF Downloads 112