Search results for: spoken corpus
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 592


532 Neologisms and Word-Formation Processes in Board Game Rulebook Corpus: Preliminary Results

Authors: Athanasios Karasimos, Vasiliki Makri

Abstract:

This research focuses on the design and development of the first text corpus based on board game rulebooks, the Board Game Rulebook Corpus (BGRC), with direct application to the morphological analysis of neologisms and tendencies in word-formation processes. Corpus linguistics is a dynamic field that examines language through the lens of vast collections of texts. These corpora consist of diverse written and spoken materials, ranging from literature and newspapers to transcripts of everyday conversations. By morphologically analyzing such datasets, morphologists can gain valuable insights into how language functions and evolves, since the data reflect the products of inflection, derivation, blending, clipping, compounding, and neology. This entails scrutinizing how words are created, modified, and combined to convey meaning in a corpus of challenging, creative, and straightforward texts that include rules, examples, tutorials, and tips. Board games teach players how to strategize, consider alternatives, and think flexibly, which are critical elements in language learning. Their rulebooks reflect not only a game's weight (complexity) but also the linguistic properties of each genre and subgenre. Board games are a captivating realm where strategy, competition, and creativity converge. Beyond the excitement of gameplay, board games also spark the art of word creation. Word games such as Scrabble, Codenames, Bananagrams, Wordcraft, Alice in the Wordland, and Once Upon a Time challenge players to construct words from a pool of letters, thus encouraging linguistic ingenuity and vocabulary expansion. These games foster a love of language, motivating players to unearth obscure words and devise clever combinations.
Game designers and creators, for their part, write rulebooks that convey their joy in discovering the hidden potential of language, igniting the imagination and playing with the beauty of words, which makes these games a delightful fusion of linguistic exploration and leisurely amusement. In this research, more than 150 English-language rulebooks from all types of modern board games, both language-independent and language-dependent, are used to create the BGRC. A representative sample of each genre (family, party, worker placement, deckbuilding, dice and chance games, strategy, eurogames, thematic, and role-playing games, among others) was selected based on the BoardGameGeek score, the length of the texts, and the complexity (weight) of the game. A morphological model will be presented that incorporates morphological networks, multi-word expressions, and word-creation mechanics in relation to the complexity of the textual structure, difficulty, and board game category. By enabling the identification of patterns, trends, and variations in word formation and other morphological processes, this research aims to make use of this creative yet strict text genre to (a) provide insight into the morphological creativity and innovation that (re)shape the English lexicon and (b) test morphological theories. Overall, the study shows that corpus linguistics enables us to explore the intricate tapestry of language, and of morphology in particular, revealing its richness, flexibility, and adaptability in the ever-evolving landscape of human expression.
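One simple way to surface neologism candidates in a corpus like the BGRC is to list the word forms that a reference wordlist does not contain, then sort them by formation process by hand. The sketch below assumes plain-text rulebooks and a hypothetical reference wordlist; the tokenizer, the `candidate_neologisms` helper, and the hyphen heuristic for compounds are illustrative assumptions, not the authors' pipeline.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; internal hyphens and apostrophes are kept,
    # so "worker-placement" stays one token (a possible compound).
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())

def candidate_neologisms(text: str, reference_lexicon: set[str], min_freq: int = 1) -> Counter:
    """Return tokens absent from the reference lexicon, with their frequencies."""
    counts = Counter(tokenize(text))
    return Counter({w: c for w, c in counts.items()
                    if w not in reference_lexicon and c >= min_freq})

def hyphenated(candidates: Counter) -> list[str]:
    # Hyphenated candidates are only a crude first pointer to compounding.
    return sorted(w for w in candidates if "-" in w)

# Hypothetical rulebook snippet and reference wordlist
rulebook = "Place meeples on worker-placement spaces. Meeples score at game end."
reference = {"place", "on", "spaces", "score", "at", "game", "end"}
found = candidate_neologisms(rulebook, reference)
print(hyphenated(found))  # ['worker-placement']
```

In practice the reference list would be a large general-English dictionary, and every candidate would still need manual classification (blend, clipping, compound, and so on).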

Keywords: board game rulebooks, corpus design, morphological innovations, neologisms, word-formation processes

531 Verb Bias in Mandarin: The Corpus-Based Study of Children

Authors: Jou-An Chung

Abstract:

The purpose of this study is to investigate the verb bias of Mandarin verbs in children’s reading materials and to provide criteria for categorization. Verb bias varies cross-linguistically. As Mandarin and English are typologically different, this study aims to shed light on Mandarin verb bias using a corpus and to provide thorough, detailed criteria for analysis. The study focuses on children’s reading materials because verb bias is a significant issue in understanding children’s sentence processing; investigating the verb bias of Mandarin verbs in these materials can therefore provide further insight into how children process sentences. A small corpus was built for this study, consisting of school textbooks and the Mandarin Daily News for children. The files were segmented and POS-tagged with JiebaR (Chinese segmentation with R). For ease of analysis, single-character verbs and intransitive verbs were excluded beforehand. A total of 20 high-frequency verbs were hand-coded, and their occurrences were categorized into one of three types: DO (direct object), SC (sentential complement), and Other. If the proportion of Other exceeded a threshold of 25%, the verb was excluded from the study. The results show that 10 verbs are direct-object-bias verbs and six are sentential-complement-bias verbs. A paired t-test confirmed statistical significance (p = 0.0001062 for DO-bias verbs; p = 0.001149 for SC-bias verbs). In children’s reading materials, DO-bias verbs are used more than SC-bias verbs, since the simplest sentence structures are easier for children to comprehend and process. In sum, this study not only discussed verb bias in children’s reading materials but also provided basic coding criteria for verb bias analysis in Mandarin and underscored the role of context.
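The coding procedure can be expressed as a small decision rule per verb. In this sketch the DO, SC, and Other counts are hypothetical inputs; the 25% Other threshold comes from the abstract, while treating the more frequent of DO and SC as the verb's bias (and labelling ties "equi-bias") is an assumption, since the abstract does not state the exact criterion.

```python
from dataclasses import dataclass

@dataclass
class VerbCounts:
    verb: str
    do: int     # tokens hand-coded as taking a direct object (DO)
    sc: int     # tokens hand-coded as taking a sentential complement (SC)
    other: int  # tokens in any other frame

def classify_bias(c: VerbCounts, other_threshold: float = 0.25) -> str:
    total = c.do + c.sc + c.other
    # Exclusion rule from the abstract: drop verbs whose Other share exceeds 25%.
    if total == 0 or c.other / total > other_threshold:
        return "excluded"
    # Assumed decision rule: the more frequent of DO and SC determines the bias.
    if c.do > c.sc:
        return "DO-bias"
    if c.sc > c.do:
        return "SC-bias"
    return "equi-bias"

# Hypothetical counts for two verbs
print(classify_bias(VerbCounts("V1", do=30, sc=5, other=5)))
print(classify_bias(VerbCounts("V2", do=10, sc=10, other=10)))
```

Exactly 25% Other is kept under this reading, because the abstract says a verb is excluded only when Other exceeds the threshold.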

Keywords: corpus linguistics, verb bias, child language, psycholinguistics

530 Exploring the Effectiveness and Challenges of Implementing Self-Regulated Learning to Improve Spoken English

Authors: Md. Shaiful Islam, Mahani Bt. Stapa

Abstract:

Self-regulated learning (SRL) strategies seem promising for helping learners overcome their struggles in developing proficiency in spoken English. Students at private universities in Bangladesh are expected to communicate with teachers, peers, and staff members in English, but most of them are held back by inadequate oral communicative competence. To address this problem, the researchers adopted a qualitative approach to answer the research questions. They used the learner diary method to collect data from first-semester undergraduate students at a reputed private university in Bangladesh, who wrote weekly diaries about their use of SRL strategies to improve their speaking in a course on English speaking. The learners were given prompts for writing the diaries. Thematic analysis was applied to the diary entries to identify themes. Seven strategies related to the effectiveness of SRL for improving spoken English were identified: goal-setting, strategic planning, identifying sources of self-motivation, help-seeking, environmental restructuring, self-monitoring, and self-evaluation. However, the students reported in their diaries that they faced challenges that impeded their use of SRL strategies. Five challenges were identified: the complex nature of SRL, lack of literacy about SRL, teachers’ preference for controlling the class, learners’ past learning habits, and students’ addiction to gadgets. The study's implications include revising the syllabus and curriculum, providing SRL training for students and teachers, and integrating SRL into lessons.

Keywords: private university in Bangladesh, proficiency, self-regulated learning, spoken English

529 The Effects of Incompetence in the Use of Mother Tongue on the Spoken English of Selected Primary School Pupils in Abeokuta South Local Government Ogun State, Nigeria

Authors: K. G. Adeosun, K. Osunaiye, E. C. Chinaguh, M. A. Aliyu, C. A. Onifade

Abstract:

This study examined the effects of incompetence in the use of the mother tongue on the spoken English of selected primary school pupils in Abeokuta South Local Government, Ogun State, Nigeria. A structured questionnaire and an interview guide were used as data collection instruments. The target population was 110 respondents, and the sample was obtained using simple random and stratified sampling techniques. The study samples were pupils from government primary schools in Abeokuta South Local Government. The results revealed that the majority of pupils exhibited mother-tongue interference at the oral production stage, and that the local indigenous languages interfered with the pronunciation of English words to a large extent, such that pupils pronounced ‘people’ as ‘fitful.’ The findings also revealed no significant difference between inadequate teaching materials, shortage of funds for the promotion of the mother tongue (Yoruba), and the spoken English of primary school pupils in the study area. The study recommended, among other things, that the government provide schools with the necessary support in the form of teaching and learning materials, funds, and other related materials that can enhance the effective use of the mother tongue towards spoken English by primary school pupils. The government should also ensure that oral English is taught to pupils and that the examination at the end of primary school education is made compulsory for all pupils. Moreover, the government should provide language laboratories and other equipment to facilitate good teaching and learning of oral English.

Keywords: education, effective, government, learning, teaching

528 Designing a Corpus Database to Enhance the Learning of Old English Language

Authors: Raquel Mateo Mendaza, Carmen Novo Urraca

Abstract:

This paper presents the development of a corpus database that aligns two corpora in order to simplify the search for information for both researchers and students of Old English. The database comprises the information contained in two main reference corpora: the Dictionary of Old English Corpus (DOEC), compiled at the University of Toronto, and the York-Toronto-Helsinki Parsed Corpus of Old English (YCOE). The former provides information on all surviving texts written in Old English; the latter offers syntactic and morphological annotation of several texts included in the DOEC. Although the two corpora are closely related, as the YCOE includes the DOE source text identifier, the main problem identified is that no alignment of texts exists that would allow whole fragments to be searched and further analysed in terms of morphology and syntax. The database proposed in this paper gathers all this information and presents it in a simple, more accessible, visual, and educational way. The alignment of fragments was carried out automatically. However, some problems emerged during the creation process, particularly related to the lack of correspondence in the division of fragments. For this reason, it was necessary to revise all entries manually to obtain a reliable, high-quality product and to carefully indicate the gaps encountered in these corpora. In all, the database contains more than 60,000 entries corresponding to the DOE fragments annotated by the YCOE. The main strength of the resulting product lies in its research and teaching implications for the study of Old English. The database will help researchers and students study different aspects of the language, such as inflectional morphology, the syntactic behaviour of given words, or translation studies, among others.
When a word or fragment is searched, the annotated information on morphology and syntax is displayed automatically, which automates and speeds up data retrieval.
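Because the YCOE carries the DOE source text identifier, a first automatic alignment pass can be sketched as a join on that identifier, with unannotated fragments and unmatched parses set aside for the manual revision described above. The identifiers, the `align` and `gaps` helpers, and the flat dictionary layout are hypothetical simplifications; the real corpora have their own file formats and fragment divisions.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    doe_id: str                  # hypothetical fragment identifier shared by both corpora
    text: str                    # fragment text as given in the DOEC
    parses: list[str] = field(default_factory=list)  # YCOE annotations linked to it

def align(doec: dict[str, str], ycoe: list[tuple[str, str]]) -> tuple[dict[str, Entry], list[str]]:
    """Attach each YCOE parse to the DOEC fragment with the same identifier.

    Returns the aligned entries and the identifiers of parses with no DOEC
    match; both would still need manual checking.
    """
    entries = {doe_id: Entry(doe_id, text) for doe_id, text in doec.items()}
    unmatched = []
    for doe_id, parse in ycoe:
        if doe_id in entries:
            entries[doe_id].parses.append(parse)
        else:
            unmatched.append(doe_id)
    return entries, unmatched

def gaps(entries: dict[str, Entry]) -> list[str]:
    # DOEC fragments that received no YCOE annotation
    return sorted(i for i, e in entries.items() if not e.parses)

# Placeholder data, not real corpus content
doec = {"T1.001": "[fragment one]", "T1.002": "[fragment two]"}
ycoe = [("T1.001", "(IP-MAT ...)"), ("T9.004", "(IP-MAT ...)")]
entries, unmatched = align(doec, ycoe)
```

A join on identifiers cannot detect the mismatched fragment divisions the authors mention; it only narrows down which entries need human review.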

Keywords: alignment, corpus database, morphosyntactic analysis, Old English

527 Exploring Reading into Writing: A Corpus-Based Analysis of Postgraduate Students’ Literature Review Essays

Authors: Tanzeela Anbreen, Ammara Maqsood

Abstract:

Reading into writing is one of the academic skills most required of university students. The current study explored the writing quality of postgraduate university students using a corpus-based approach. Twelve postgraduate students’ literature review essays were chosen for the analysis because these essays required students to incorporate multiple reading sources, which was a new writing exercise for them. Students received feedback at least twice, in the form of written comments from the tutor highlighting areas for improvement and edits made with the ‘track changes’ function. This exercise was repeated twice, and students submitted two drafts; only their final submissions were included in this investigation. A corpus-based approach was adopted because it promotes autonomous discovery and personalised learning. The aim of the analysis was to understand students’ existing level of writing before the start of their postgraduate thesis. Text Inspector was used to analyse the quality of the essays: with this tool, the vocabulary used in the essays was compared with the English Vocabulary Profile (EVP), which describes what learners know and can do at each Common European Framework of Reference (CEFR) level. Writing quality was also measured with the Flesch Reading Ease score, a standard measure of how easy a text is to understand. The results showed that students found writing essays from multiple sources challenging. In most essays, the vocabulary level achieved was between B1 and B2 on the CEFR scale. The study recommends that students receive extensive training in academic writing skills, particularly for literature review assignments, which require citing multiple sources.
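The Flesch Reading Ease score is defined as 206.835 - 1.015 x (words / sentences) - 84.6 x (syllables / words); higher scores indicate easier text. The sketch below implements that formula with a rough vowel-group syllable counter. Text Inspector's own tokenization and syllable counting are not described in the abstract and may differ, so scores from this sketch are approximate.

```python
import re

def count_syllables(word: str) -> int:
    """Rough English syllable count: vowel groups, minus a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but keep the syllable in "-le" endings such as "table"
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    # Sentences split on terminal punctuation; words are alphabetic runs
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

print(round(flesch_reading_ease("The cat sat."), 2))  # 119.19
```

Short sentences of one-syllable words score very high, which is why academic prose with long sentences and polysyllabic vocabulary typically scores much lower.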

Keywords: literature review essays, postgraduate students, corpus-based analysis, vocabulary proficiency

526 The Effect of Speech-Shaped Noise and Speaker’s Voice Quality on First-Grade Children’s Speech Perception and Listening Comprehension

Authors: I. Schiller, D. Morsomme, A. Remacle

Abstract:

Children’s ability to process spoken language continues to develop until the late teenage years. At school, where efficient spoken language processing is key to academic achievement, listening conditions are often unfavorable. High background noise and a teacher's impaired voice are typical sources of interference. These factors can be assumed to affect primary school children in particular, because their language and literacy skills are still limited. While it is generally accepted that background noise and impaired voice impede spoken language processing, there is a growing need to analyze their impact within specific linguistic areas. Against this background, the study aimed to investigate the effect of speech-shaped noise and an imitated dysphonic voice on first-grade primary school children’s speech perception and sentence comprehension. Children aged 5 to 6 years, recruited within the French-speaking community of Belgium, listened via headphones and performed a minimal-pair discrimination task and a sentence-picture matching task. Stimuli were presented randomly under four experimental conditions: (1) normal voice / no noise, (2) normal voice / noise, (3) impaired voice / no noise, and (4) impaired voice / noise. The primary outcome measure was the task score, and the research question was how performance varied across listening conditions. Preliminary results on speech perception and sentence comprehension will be presented and carefully interpreted in light of past findings. This study contributes to our understanding of children’s language processing skills under adverse conditions, and its results are intended to serve as a starting point for probing new measures to optimize children’s learning environments.
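One way to realize the four listening conditions without playing the same item twice to a child is to rotate stimuli through the conditions across participants (Latin-square style) and shuffle the presentation order for each child. The abstract says only that stimuli were presented randomly under four conditions, so the rotation scheme, the `trial_list` helper, and the stimulus names are assumptions for illustration.

```python
import itertools
import random
from typing import Optional

VOICES = ("normal voice", "impaired voice")
NOISE = ("no noise", "noise")
# The four cells in the abstract's order: (1) normal/no noise ... (4) impaired/noise
CONDITIONS = list(itertools.product(VOICES, NOISE))

def trial_list(stimuli: list[str], participant: int,
               seed: Optional[int] = None) -> list[tuple[str, str, str]]:
    """Rotate stimuli through the four conditions, then shuffle the order."""
    # Each child hears every stimulus once; across four consecutive
    # participants, each stimulus appears once in every condition.
    trials = [(stim, *CONDITIONS[(i + participant) % len(CONDITIONS)])
              for i, stim in enumerate(stimuli)]
    random.Random(seed).shuffle(trials)
    return trials

for stim, voice, noise in trial_list(["s1", "s2", "s3", "s4"], participant=0, seed=1):
    print(stim, voice, noise)
```

Rotation keeps item difficulty balanced across conditions, which a purely random assignment per child would not guarantee.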

Keywords: impaired voice, sentence comprehension, speech perception, speech-shaped noise, spoken language processing

525 Conceptual Metaphors of Responsibility in Arabic to English Translation of Political Speeches: A Corpus-Based Study

Authors: Amr Anany

Abstract:

This study offers a corpus-based analysis of the conceptual metaphors of RESPONSIBILITY in the Arabic political speeches of King Abdulla II and their English translations by the translators of the Royal Hashemite Court ("RHC translators"). Drawing on Conceptual Metaphor Theory (CMT), the study aims to uncover the extent to which the dominant ideology of King Abdulla II's source Arabic speeches is conveyed in the target English translations. The study explores a bilingual corpus of eleven authentic Arabic speeches delivered by King Abdulla II and their English translations. It finds that Arabic and English share several metaphorical expressions of RESPONSIBILITY based on bodily experience, such as RESPONSIBILITY IS UP, RESPONSIBILITY IS AN OBJECT, and RESPONSIBILITY IS AN HONOR. The study concludes that the RHC translators succeed in conveying the dominant ideology of the source Arabic speeches to the English ones by means of specific translation strategies.

Keywords: cognitive linguistics, CDA, conceptual metaphor theory, ideology, responsibility

524 Information and Communication Technology (ICT) and Yoruba Language Teaching

Authors: Ayoola Idowu Olasebikan

Abstract:

The global community has become increasingly dependent on various kinds of technology, of which Information and Communication Technologies (ICTs) appear to be the most prominent. ICTs have become multipurpose tools that have had a revolutionary impact on how we see the world and how we live in it. Yoruba is the most widely spoken African language outside Africa, but it remains one of the most poorly spoken languages in the world as a result of the outdated methods used to teach it in African schools, which have prevented its standard version from being spoken and written. This paper conducts a critical review of the traditional methods of teaching the Yoruba language. It then examines the possibility of leveraging ICTs to improve the methods of teaching Yoruba so that the language achieves global standards and spread. It identifies key ICT platforms that can be deployed for teaching Yoruba and the constraints facing each of them. The paper concludes that ICTs appear to provide a veritable opportunity for a paradigm shift in the methods of teaching Yoruba. It also argues that Yoruba has the potential to transform the economic fortunes of Africa for sustainable development, provided its teaching is taken beyond the brick-and-mortar classroom into the virtual classroom, the global information superhighway of the internet, or any other ICT medium. It recommends that students and teachers of Yoruba be encouraged to acquire basic computer and internet skills in order to enhance their ability to develop and retrieve electronic Yoruba teaching materials.

Keywords: Africa, ICT, teaching method, Yoruba language

523 Investigating (Im)Politeness Strategies in Email Communication: The Case of Algerian PhD Supervisees and Irish Supervisors

Authors: Zehor Ktitni

Abstract:

In pragmatics, politeness is regarded as a feature of paramount importance to successful interpersonal relationships. Meanwhile, email has recently become an indispensable means of communication in educational settings. This research places email communication at the core of the study and analyses it from a politeness perspective. More specifically, it looks closely at how the concept of (im)politeness is reflected in students’ emails. To this end, a corpus of email threads that Algerian supervisees exchanged with their Irish supervisors was compiled. Leech’s model of politeness (2014) was selected as the main theoretical framework of the study, with reference also made to Brown and Levinson’s model (1987), one of the most influential models in pragmatic politeness research. Follow-up interviews are also to be conducted with Algerian students to reinforce the results derived from the corpus. Initial findings suggest that Algerian Ph.D. students’ emails tend to include more politeness markers than impoliteness markers, that the students make heavy use of academic titles (Dr. or Prof.) when addressing their supervisors, and that they rely on hedging devices in order to sound polite.
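The study's analysis follows Leech's (2014) framework and is presumably coded by hand. As a hedged illustration of the corpus side only, surface markers such as the academic titles and hedges mentioned in the findings could be pre-counted per email with regular expressions. The `MARKERS` lexicon below is hypothetical and far from exhaustive, and it cannot capture politeness that depends on context.

```python
import re
from collections import Counter

# Hypothetical, non-exhaustive marker lexicon; not the study's coding scheme
MARKERS = {
    "academic_title": [r"\bdr\b\.?", r"\bprof\b\.?", r"\bprofessor\b"],
    "hedge": [r"\bperhaps\b", r"\bmaybe\b", r"\bi was wondering\b",
              r"\bcould you\b", r"\bwould it be possible\b"],
    "thanking": [r"\bthank(?:s| you)\b"],
}

def count_markers(email: str) -> Counter:
    """Count regex matches per marker category in one email (case-insensitive)."""
    text = email.lower()
    counts = Counter()
    for category, patterns in MARKERS.items():
        counts[category] = sum(len(re.findall(p, text)) for p in patterns)
    return counts

# Invented example email
print(count_markers("Dear Dr. Byrne, I was wondering if you could perhaps "
                    "read my draft. Thank you."))
```

Counts like these would only flag candidate passages; deciding whether a form is actually polite or impolite in its context remains a manual pragmatic judgement.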

Keywords: politeness, email communication, corpus pragmatics, Algerian PhD supervisees, Irish supervisors

522 Introducing Data-Driven Learning into Chinese Higher Education English for Academic Purposes Writing Instructional Settings

Authors: Jingwen Ou

Abstract:

Writing for academic purposes in a second or foreign language is one of the most important and most demanding skills for non-native speakers to master. Traditionally, English for Academic Purposes (EAP) writing instruction at the tertiary level encompasses the teaching of academic genre knowledge, more specifically disciplinary writing conventions, rhetorical functions, and specific linguistic features. However, one of the main sources of difficulty in English academic writing for L2 students at the tertiary level remains proficiency in academic discourse, especially vocabulary, academic register, and organization. Data-Driven Learning (DDL) is defined as “a pedagogical approach featuring direct learner engagement with corpus data”. Over the past two decades, the application of DDL in EAP writing teaching has grown noticeably in popularity. This combination has not only transformed traditional pedagogy, aided by published DDL guidebooks for classroom use, but also triggered worldwide research on corpus use in EAP classrooms. This study conducts a systematic literature review of research at the intersection of DDL and EAP writing instruction, covering both indirect and direct DDL practice in EAP writing instructional settings in China. The review also synthesizes significant findings of prior research concerning Chinese university students’ perceptions of DDL and its impact on their academic writing performance after corpus-based training. Research papers published in English and Chinese over the last ten years were selected through keyword searches of Scopus-indexed journals and of core journals in two main Chinese academic databases (CNKI and Wanfang).
The results indicated a lack of empirical DDL research, despite a noticeable upward trend in corpus research on discourse analysis and in indirect corpus applications for material design by language teachers. Research on the direct use of corpora and corpus tools in DDL, particularly in combination with genre-based EAP teaching, remains a relatively small fraction of the body of research in Chinese higher education settings. This scarcity is closely related to the prevailing absence of systematic training in English academic writing registers in most Chinese universities' EAP syllabi, owing to the Chinese English Medium Instruction policy, under which only English-major students are required to submit English dissertations. The findings also revealed that, despite improvements in their final academic writing performance, Chinese learners still held mixed attitudes towards corpus tools, influenced by learner differences, limited access to language corpora, and insufficient pre-training on corpus theoretical concepts.

Keywords: corpus linguistics, data-driven learning, EAP, tertiary education in China

521 Historical Development of Negative Emotive Intensifiers in Hungarian

Authors: Martina Katalin Szabó, Bernadett Lipóczi, Csenge Guba, István Uveges

Abstract:

In this study, an exhaustive analysis of the historical development of negative emotive intensifiers in Hungarian was carried out using NLP methods. Intensifiers are linguistic elements that modify or reinforce a variable property of the lexical unit they apply to. Intensifiers therefore co-occur with other lexical items such as adverbs, adjectives, and verbs, and infrequently with nouns. Owing to the complexity of this phenomenon (a combination of sociolinguistic, semantic, and historical aspects), many lexical items can operate as intensifiers. Intensifiers are admittedly among the most rapidly changing elements of a language. From a linguistic point of view, a special group of intensifiers is particularly interesting: the so-called negative emotive intensifiers, which on their own, without context, have semantic content associated with negative emotion but in particular cases may function as intensifiers (e.g. borzasztóan jó ‘awfully good’, which means ‘excellent’). Despite their special semantic features, negative emotive intensifiers have scarcely been examined in the literature on the basis of large historical corpora using NLP methods. To become better acquainted with trends over time concerning these intensifiers, the authors exhaustively analysed a specific historical corpus, the Magyar Történeti Szövegtár (Hungarian Historical Corpus). This corpus (containing 3 million text words) is a collection of texts of various genres and styles produced between 1772 and 2010. Since the corpus consists of raw texts and contains no additional information about the linguistic features of the data (such as stemming or morphological analysis), a large amount of manual work was required to process the data. Thus, based on a lexicon of negative emotive intensifiers compiled in a previous phase of the research, every occurrence of each intensifier was queried, and the results were stored in a separate data frame.
Basic linguistic processing (POS tagging, lemmatization, etc.) was then carried out automatically with the ‘magyarlanc’ NLP toolkit. Finally, the frequency and collocational features of all the negative emotive words in the corpus were analyzed automatically. The results revealed in detail how these words have undergone grammaticalization over time, i.e., they change from lexical elements into grammatical ones and slowly go through a delexicalization process (their negative content diminishes over time). The study also identified which negative emotive intensifiers are at the same stage of this process in the same time period. A closer look at the different domains of the analysed corpus also showed that the importance of the pragmatic role increases during this process: the newer use expresses, to some degree, the speaker's subjective, evaluative opinion.
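The query-then-count workflow can be sketched as follows: match each form from an intensifier lexicon, bin occurrences by period, normalize by period size, and tally the following word as a collocate. This sketch assumes `(year, text)` pairs and matches surface forms only (the study lemmatized with magyarlanc, which is not used here); the 50-year bins and example sentences are hypothetical.

```python
import re
from collections import Counter, defaultdict

def intensifier_profile(docs, lexicon, period_len=50):
    """docs: iterable of (year, text); lexicon: set of lowercase intensifier forms.

    Returns (per-million frequency by period and intensifier,
             counts of the word immediately following each intensifier).
    """
    hits = defaultdict(Counter)        # period -> intensifier -> raw count
    totals = Counter()                 # period -> number of tokens
    collocates = defaultdict(Counter)  # intensifier -> next word -> count
    for year, text in docs:
        period = year - year % period_len          # e.g. 1772 -> 1750
        tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware (ó, ő, ...)
        totals[period] += len(tokens)
        for i, tok in enumerate(tokens):
            if tok in lexicon:
                hits[period][tok] += 1
                if i + 1 < len(tokens):
                    collocates[tok][tokens[i + 1]] += 1
    # Normalize so periods of different corpus size are comparable
    per_million = {p: {w: c * 1_000_000 / totals[p] for w, c in cnt.items()}
                   for p, cnt in hits.items()}
    return per_million, collocates

docs = [(1790, "Borzasztóan jó idő van."),
        (1995, "Borzasztóan jó és borzasztóan szép.")]
freq, coll = intensifier_profile(docs, {"borzasztóan"})
print(freq)
print(coll["borzasztóan"].most_common())
```

Tracking how the set of right-hand collocates shifts from negative to positive or neutral words over periods is one rough quantitative signal of the delexicalization the abstract describes.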

Keywords: historical corpus analysis, historical linguistics, negative emotive intensifiers, semantic changes over time

520 Towards a Large Scale Deep Semantically Analyzed Corpus for Arabic: Annotation and Evaluation

Authors: S. Alansary, M. Nagi

Abstract:

This paper presents an approach to the semantic annotation of an Arabic corpus using the Universal Networking Language (UNL) framework. UNL is intended as a promising strategy for providing a large collection of texts annotated with formal, deep semantics rather than shallow semantics. The result would be an editable semantic resource (semantic graphs) that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles, and rhetorical relations, into a single semantic formalism for knowledge representation. The paper also presents the Interactive Analysis tool for automatic semantic annotation (IAN), as well as the cornerstone of the proposed methodology: the disambiguation and transformation rules. Semantic annotation using UNL has been applied to a corpus of 20,000 Arabic sentences representing the most frequent structures in the Arabic Wikipedia. The representation is illustrated at different linguistic levels, starting from the morphological level and passing through the syntactic level until the semantic representation is reached. The output was evaluated using the F-measure, which reached 90%. This demonstrates the power of the formal environment, as it enables intelligent text processing and search.
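For reference, the F-measure combines precision P and recall R as F = (1 + b^2)PR / (b^2 P + R), which reduces to 2PR / (P + R) for b = 1. The sketch below computes it over sets of predicted and gold-standard relations; representing UNL annotations as (relation, head, dependent) triples is an assumption, since the abstract does not state the unit of evaluation.

```python
def precision_recall_f(predicted: set, gold: set, beta: float = 1.0) -> tuple[float, float, float]:
    """Precision, recall, and F-beta over two sets of annotation items."""
    tp = len(predicted & gold)  # items present in both system output and gold standard
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# Hypothetical UNL-style triples: (relation, head, dependent)
gold = {("agt", "eat", "boy"), ("obj", "eat", "apple"), ("tim", "eat", "yesterday")}
pred = {("agt", "eat", "boy"), ("obj", "eat", "apple"), ("plc", "eat", "home")}
p, r, f = precision_recall_f(pred, gold)
print(round(p, 3), round(r, 3), round(f, 3))
```

Because the F-measure balances precision and recall, it is not the same as accuracy; a 90% F-measure can hide an imbalance between the two.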

Keywords: semantic analysis, semantic annotation, Arabic, universal networking language

519 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation-labeled database for Hindi. Although no single standard for prosody labeling exists for Hindi, past researchers have employed perceptual and statistical methods to draw inferences about the behavior of prosodic patterns in Hindi. Building on such research and on broadly agreed-upon intonational theories of Hindi, this study develops a manually annotated prosodic corpus of Hindi speech data that can be used to train speech models for natural-sounding speech in the future. One hundred sentences (500 words) each of the declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

518 A Corpus-Based Study of Subtitling Religious Words into Arabic

Authors: Yousef Sahari, Eisa Asiri

Abstract:

Hollywood films are produced in an open and liberal context, and when they are subtitled for a more conservative and closed society, such as an Arab society, religious words can pose a thorny challenge for subtitlers. Using a corpus of 90 Hollywood films released between 2000 and 2018 and applying insights from Descriptive Translation Studies (Toury, 1995, 2012) and the dichotomy of domestication and foreignization, this paper investigates three main research questions: (1) What are the dominant religious terms and functions in the English subtitles? (2) What are the dominant strategies used in translating religious words? (3) Do these strategies tend to be SL-oriented or TL-oriented (foreignizing or domesticating)? To answer these questions, a quantitative and qualitative analysis is conducted on a self-designed, parallel, aligned corpus of the ninety films and their Arabic subtitles. The quantitative analysis compares the frequencies and distribution of religious words, their functions, and the subtitling strategies employed across the ninety films, with the aim of identifying similarities and differences as well as the impact of the functions of religious terms on the choice of subtitling strategies. Building on the quantitative analysis, a qualitative analysis identifies translational patterns in the Arabic translations of religious words and the possible reasons for the subtitlers’ choices. The results show that the function of religious words strongly influences the choice of subtitling strategies. Foreignization strategies were also found to be applied in about two-thirds of all occurrences of religious words.
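The quantitative step, relating the function of each religious word to the orientation of the strategy used to subtitle it, can be sketched as a cross-tabulation. The strategy labels, their assignment to foreignizing or domesticating in `ORIENTATION`, and the function labels are hypothetical; the abstract does not list the study's strategy inventory.

```python
from collections import Counter, defaultdict

# Hypothetical strategy inventory and orientation labels, for illustration only
ORIENTATION = {
    "literal": "foreignizing",
    "borrowing": "foreignizing",
    "substitution": "domesticating",
    "omission": "domesticating",
}

def foreignizing_shares(occurrences):
    """occurrences: iterable of (function, strategy) pairs, one per religious word.

    Returns ({function: share foreignizing}, overall share foreignizing).
    """
    by_function = defaultdict(Counter)
    overall = Counter()
    for function, strategy in occurrences:
        orientation = ORIENTATION[strategy]
        by_function[function][orientation] += 1
        overall[orientation] += 1

    def share(counts):
        total = sum(counts.values())
        return counts["foreignizing"] / total if total else 0.0

    return {f: share(c) for f, c in by_function.items()}, share(overall)

# Invented annotated occurrences
occ = [("exclamation", "omission"), ("exclamation", "substitution"),
       ("exclamation", "literal"), ("blessing", "literal"),
       ("blessing", "literal"), ("blessing", "borrowing")]
by_function, overall = foreignizing_shares(occ)
print(by_function, overall)
```

Comparing the per-function shares against the overall share is a simple way to see whether function is associated with strategy choice; a significance test would be needed to support the stronger claim.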

Keywords: religious terms, subtitling, audiovisual translation, Modern Standard Arabic, subtitling strategies, English-Arabic subtitling

Procedia PDF Downloads 135
517 The Acoustic Features of Ulu Terengganu Malay Monophthongs

Authors: Siti Nadiah Nuwawi, Roshidah Hassan

Abstract:

A dialect is a language variant that emerges due to certain factors. One of the distinctive dialects of Malaysia is spoken by people who reside in the inland area of East Peninsular Malaysia, Hulu Terengganu, and is known as the Ulu Terengganu Malay dialect. This dialect is unique since its phonology retains ancient elements, which makes it hard to understand for people from other states. There is a dearth of acoustic studies of the dialect, a gap this paper aims to address by instrumentally describing the quality of its monophthongs based on their first and second formant values. The hertz values are observed and recorded from the waveforms and spectrograms displayed in Praat version 6.0.43. The findings show that Ulu Terengganu Malay speakers produced ten monophthongs, namely /ɛ/, /e/, /a/, /ɐ/, /ɞ/, /ɔ/, /i/, /o/, /ɵ/ and /ɘ/, which confirms the monophthongs suggested by past researchers on the basis of auditory impression, namely /ɛ/, /e/, /a/, /ɔ/ and /i/. The study also identifies five previously unreported monophthongs of the dialect: /ɐ/, /ɞ/, /o/, /ɵ/ and /ɘ/.

Keywords: acoustic analysis, dialect, formant values, monophthongs, Ulu Terengganu Malay

Procedia PDF Downloads 151
516 A Comparison of the First Language Vocabulary Used by Indonesian Year 4 Students and the Vocabulary Taught to Them in English Language Textbooks

Authors: Fitria Ningsih

Abstract:

This study concerns the compilation of a corpus from the free writing of Indonesian Year 4 students and its comparison with the vocabulary taught in English language textbooks. Sample writings by 369 students from 19 public elementary schools in Malang, East Java, Indonesia, and 5 selected English textbooks were analyzed with corpus linguistics methods using the AdTAT (Adelaide Text Analysis Tool) program. The analysis produced wordlists of the top 100 words most frequently used by the students and the top 100 words presented in the English textbooks. There was a 45% match between the two lists. Furthermore, classifying the top 100 most frequent words of the two corpora by part of speech showed that the Indonesian and English lists made similar use of nouns, verbs, adjectives and prepositions. Moreover, to examine how well the vocabulary of the learning materials is contextualized to the students’ needs, an in-depth analysis of the content and cultural views conveyed by the textbook vocabulary was carried out using criteria developed from a checklist. Lastly, the study suggests that language teachers should understand the students’ background, for example by recognizing the basic words students have acquired, before teaching them new vocabulary, in order to achieve successful learning of the target language.
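The top-100 comparison described above can be illustrated with a short Python sketch. It is not AdTAT and not the author's procedure: it assumes simple lowercase regex tokenization, and because the two corpora are in different languages it assumes a hypothetical Indonesian-to-English gloss dictionary (`gloss`) to decide when two words "match". The function names `top_n_words` and `match_percentage` are likewise illustrative.

```python
import re
from collections import Counter


def top_n_words(text: str, n: int = 100) -> list[str]:
    """Return the n most frequent word types in `text`.

    Tokenization is a simple lowercase regex (an assumption); ties keep
    first-seen order, as Counter.most_common does.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return [word for word, _ in Counter(tokens).most_common(n)]


def match_percentage(l1_top: list[str], en_top: list[str],
                     gloss: dict[str, str]) -> float:
    """Percentage of L1 top words whose English gloss is in the English top list.

    `gloss` is a hypothetical Indonesian-to-English dictionary; words
    without an entry count as non-matches.
    """
    if not l1_top:
        return 0.0
    en_set = set(en_top)
    matched = sum(1 for word in l1_top if gloss.get(word) in en_set)
    return 100 * matched / len(l1_top)


if __name__ == "__main__":
    student_top = top_n_words("saya suka makan nasi saya suka bermain", n=2)
    textbook_top = top_n_words("I like to read. I like my school.", n=2)
    print(match_percentage(student_top, textbook_top, {"saya": "i", "suka": "like"}))
```

Here a match is counted when the gloss of an Indonesian word appears in the English list; other definitions (for example, counting shared concepts in both directions) would give different percentages.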

Keywords: corpus, frequency, English, Indonesian, linguistics, textbooks, vocabulary, wordlists, writing

Procedia PDF Downloads 163
515 Using Corpora in Semantic Studies of English Adjectives

Authors: Oxana Lukoshus

Abstract:

The methods of corpus linguistics, a well-established field of research, are being increasingly applied in cognitive linguistics. Corpus data are especially useful for quantitative studies of grammatical and other aspects of language. The main objective of this paper is to demonstrate how present-day corpora can be applied in semantic studies in general and in semantic studies of adjectives in particular. Polysemantic adjectives have been the subject of numerous studies, but most of these studies have been based on dictionaries. Dictionaries are undoubtedly one of the basic data sources, but only at the initial stage of research: the researcher usually starts by analyzing lexicographic data and then formulates a hypothesis. In this research, three polysemantic synonyms, true, loyal and faithful, have been analyzed in terms of the differences and similarities in their semantic structure. The corpus-based approach to these adjectives proceeded as follows. After the dictionary data had been analyzed, two corpora were consulted to study the distributional patterns of the words under study: the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). COCA is regularly updated, and both corpora contain thousands of examples of the words under study, which makes them a useful and convenient data source. For the purpose of this study there were no special requirements regarding the genre, mode or time of the texts included in the corpora. Of the range of possibilities offered by corpus-analysis software (e.g. word lists, word frequency statistics, etc.), the most useful tool for the semantic analysis was the extraction of a list of co-occurrences for the given search words. Searching by lemma (e.g. true, true to) and grouping the results by lemma proved to be the most efficient corpus features for the adjectives under study.
Following the search process, the corpora provided a list of co-occurrences, which were then analyzed and classified. Not every co-occurrence was relevant for the analysis. For example, sentences such as "An enormous sense of responsibility to protect the minds and hearts of the faithful from incursions by the state was perceived to be the basic duty of the church leaders" or "‘True,’ said Phoebe, ‘but I'd probably get to be a Union Official immediately’" were left out, because in the first example the faithful is a substantivized adjective and in the second true is used on its own, with no other part of speech. The subsequent analysis of the corpus data provided the grounds for grouping the adjectives under study by distribution, and these groups were then investigated by means of a semantic experiment. In sum, the corpus-based approach has proved to be a powerful, reliable and convenient tool for obtaining data for further semantic study.
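Co-occurrence lists of the kind retrieved from the BNC and COCA interfaces can be approximated with a simple window-based count. The Python sketch below assumes pre-tokenized, lowercased text, a symmetric window of a fixed number of tokens, and no lemmatization or filtering; the function `collocates` is hypothetical and does not reproduce either corpus interface.

```python
from collections import Counter


def collocates(tokens: list[str], node: str, window: int = 2) -> Counter:
    """Count tokens within `window` positions left or right of each `node` occurrence."""
    counts: Counter = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    text = "she remained loyal to her friends and loyal to the cause"
    for word, freq in collocates(text.split(), "loyal", window=2).most_common(3):
        print(word, freq)
```

Cases like the substantivized the faithful would still need to be filtered out by hand or with part-of-speech information, as in the study.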

Keywords: corpora, corpus-based approach, polysemantic adjectives, semantic studies

Procedia PDF Downloads 296
514 The Study of Formal and Semantic Errors of Lexis by Persian EFL Learners

Authors: Mohammad J. Rezai, Fereshteh Davarpanah

Abstract:

Producing a text in a language other than one’s mother tongue can be a demanding task for language learners. Examining the lexical errors committed by EFL learners is a challenging area of investigation that can shed light on the process of second language acquisition. Despite the considerable number of investigations into grammatical errors, few studies have tackled the formal and semantic errors of lexis committed by EFL learners. The current study examined Persian learners’ formal and semantic errors of lexis in English. To this end, 60 students at three different proficiency levels were asked to write on 10 different topics in 10 separate sessions. In total, 600 essays written by Persian EFL learners were collected as the corpus of the study. An error taxonomy comprising formal and semantic errors was selected to analyze the corpus. The formal category covered misselection and misformation errors, while the semantic errors were classified into lexical, collocational and lexicogrammatical categories. Each category was further classified into subcategories depending on the identified errors. The results showed 2583 errors in the corpus of 9600 words, of which 2030 were formal errors and 553 semantic errors. Formal errors were the most frequent (78.6%) and were most prevalent at the advanced level (42.4%), whereas semantic errors (21.4%) were most frequent at the low-intermediate level (40.5%). Among formal errors of lexis, misformation errors were by far the most numerous (98%), while misselection errors constituted 2%. Additionally, no significant differences were observed among the three semantic error subcategories, namely collocational, lexical choice and lexicogrammatical. The results of the study illuminate the challenges faced by EFL learners in the second language acquisition process.

Keywords: collocational errors, lexical errors, Persian EFL learners, semantic errors

Procedia PDF Downloads 119
513 Automatic Tagging and Accuracy in Assamese Text Data

Authors: Chayanika Hazarika Bordoloi

Abstract:

This paper works on Assamese, a highly inflectional language. Assamese is also one of the national languages of India, yet very little computational research has been done on it. Building a language processing tool for a natural language is not straightforward, as standards and language representations vary at various levels. This paper presents the inflectional suffixes of Assamese verbs and shows how statistical tools, combined with linguistic features, can improve tagging accuracy. A conditional random fields (CRF) tool was used to train on and automatically tag the text data, and accuracy improved after linguistic features were fed into the training data. Because Assamese is so highly inflectional, standardizing its morphology is challenging. Inflectional suffixes are used as a feature of the text data: to analyze the inflections of Assamese word forms, a list was prepared comprising all the possible suffixes that the various word categories can take. Assamese words can be classified into inflected classes (noun, pronoun, adjective and verb) and uninflected classes (adverb and particle). The corpus used for this morphological analysis contains a large number of tokens; it is a mixed corpus and has yielded satisfactory accuracy. The accuracy of the tagger improved gradually as the training data were modified.
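The use of inflectional suffixes as tagger features can be sketched in Python as per-token feature dictionaries, the input shape that CRF toolkits such as sklearn-crfsuite accept (one list of dictionaries per sentence). This is an illustration under stated assumptions, not the paper's feature set: the suffix list passed in is hypothetical, the longest matching suffix is preferred, and a suffix must leave a non-empty stem. English placeholders stand in for Assamese words so the example stays readable.

```python
def longest_suffix(word: str, suffixes: list[str]) -> str:
    """Return the longest listed suffix that `word` ends with, or "" if none.

    A suffix must leave a non-empty stem, so a bare suffix is not treated
    as an inflected form.
    """
    matches = [s for s in suffixes if s and word.endswith(s) and len(s) < len(word)]
    return max(matches, key=len, default="")


def token_features(sentence: list[str], i: int, suffixes: list[str]) -> dict[str, object]:
    """Build a feature dictionary for token i, combining suffix and context cues."""
    word = sentence[i]
    features: dict[str, object] = {
        "word": word,
        "suffix": longest_suffix(word, suffixes),  # feature from the (hypothetical) suffix list
        "last2": word[-2:],                        # character backoff features
        "last3": word[-3:],
        "is_first": i == 0,
        "is_last": i == len(sentence) - 1,
    }
    if i > 0:
        features["prev_suffix"] = longest_suffix(sentence[i - 1], suffixes)
    if i < len(sentence) - 1:
        features["next_suffix"] = longest_suffix(sentence[i + 1], suffixes)
    return features


def sentence_features(sentence: list[str], suffixes: list[str]) -> list[dict[str, object]]:
    """One feature dictionary per token, the shape CRF toolkits typically expect."""
    return [token_features(sentence, i, suffixes) for i in range(len(sentence))]


if __name__ == "__main__":
    # English placeholders stand in for Assamese words and suffixes.
    for feats in sentence_features(["dogs", "running"], ["s", "ing"]):
        print(feats)
```

Preferring the longest match is one simple way to handle suffixes that overlap; a real Assamese suffix list might need ordered stacking of several suffixes instead.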

Keywords: CRF, morphology, tagging, tagset

Procedia PDF Downloads 175
512 Dialect as a Means of Identification among Hausa Speakers

Authors: Hassan Sabo

Abstract:

Language is a system of conventional spoken, manual and written symbols by means of which human beings, as members of a social group and participants in its culture, express themselves. Communication, the expression of identity and imaginative expression are among the functions of language. A dialect is a form of a language, or a regional variety of a language, that is spoken in a particular geographical setting by a particular group of people. Hausa is one of the major languages of Africa in terms of the large number of people who speak it as a first language. Hausa belongs to the West Chadic group of languages, and Chadic constitutes one of the five or six branches of the Afro-Asiatic family. Most Hausa speakers live in Nigeria, in different geographical locations, which has given rise to a variety of dialects within the language. Apart from Standard Hausa, the Hausa language has a variety of dialects that are distinguished from one another by features of phonology, grammar and vocabulary. This study examines the features that serve as a means of identification among Hausa speakers who are set off from others, geographically or socially.

Keywords: dialect, features, geographical location, Hausa language

Procedia PDF Downloads 171
511 The Advancements of Transformer Models in Part-of-Speech Tagging System for Low-Resource Tigrinya Language

Authors: Shamm Kidane, Ibrahim Abdella, Fitsum Gaim, Simon Mulugeta, Sirak Asmerom, Natnael Ambasager, Yoel Ghebrihiwot

Abstract:

The need for natural language processing (NLP) systems for low-resource languages has become more apparent than ever in the past few years, although building such systems remains arduous. This paper presents an improved version of the Nagaoka Tigrinya Corpus for a Parts-of-Speech (POS) classification system in the Tigrinya language. The initial Nagaoka dataset was expanded into a new tagged corpus totaling 118K tokens, annotated with the same 12 basic POS tags used previously. The additional content was annotated manually and stringently, following rules similar to those of the former dataset, and was formatted in CoNLL format. The system adopts an approach that is novel for this task: the use of the monolingually pre-trained TiELECTRA, TiBERT and TiRoBERTa transformer models. The highest score achieved is a weighted F1-score of 94.2%, which surpasses previous systems by a significant margin. The system should prove useful for the progress of NLP tasks in Tigrinya and similar low-resource languages, with room for cross-referencing higher-resource languages.
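Assuming the usual definition, a weighted F1-score averages the per-tag F1-scores, weighting each tag by its number of gold-standard tokens (its support). A minimal from-scratch Python version, assuming flat lists of gold and predicted tags of equal length, is sketched below; it follows the definition used by scikit-learn's `f1_score(..., average="weighted")`, but the function name is illustrative and this is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(gold: list[str], pred: list[str]) -> float:
    """Support-weighted mean of per-tag F1-scores (tags weighted by gold counts)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    support = Counter(gold)    # gold tokens per tag
    predicted = Counter(pred)  # predicted tokens per tag
    correct = Counter(g for g, p in zip(gold, pred) if g == p)  # true positives per tag
    total = len(gold)
    score = 0.0
    for tag, n_gold in support.items():
        tp = correct[tag]
        precision = tp / predicted[tag] if predicted[tag] else 0.0
        recall = tp / n_gold
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n_gold / total) * f1
    return score


if __name__ == "__main__":
    gold = ["N", "N", "V", "ADJ"]
    pred = ["N", "V", "V", "ADJ"]
    print(round(weighted_f1(gold, pred), 3))
```

Tags that appear only in the predictions have zero support and therefore contribute nothing to the weighted average.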

Keywords: Tigrinya POS corpus, TiBERT, TiRoBERTa, conditional random fields

Procedia PDF Downloads 66
510 Metaphors Investigation between President Xi Jinping of China and Trump of the US on the Corpus-Based Approach

Authors: Jie Zheng, Ruifeng Luo

Abstract:

The United States is the world’s most developed economy with the strongest military power. China is the fastest-growing country, with growing comprehensive strength, and its economy is second only to that of the US. However, the conflict between them has become more serious in recent years. A president’s address represents a nation’s ideology. This paper builds a small corpus of President Xi Jinping’s and President Trump’s speeches at Davos to investigate the types of metaphors each uses and to calculate the percentage of each type. The results show that President Xi Jinping employs more metaphors than Trump. Xi’s metaphors include the “building” metaphor, “plant” metaphor, “journey” metaphor, “ship” metaphor, “traffic” metaphor, “nation is a person” metaphor and “show” metaphor, among others, while Trump’s comprise the “war” metaphor, “building” metaphor, “journey” metaphor, “traffic” metaphor, “tax” metaphor and “book” metaphor, among others. After investigating these differences in metaphor use, the paper analyzes the underlying ideologies of the two nations. China is willing to strengthen ties with countries all over the world and has built a development platform for them and for itself to reach the goal of social well-being, while the US focuses mainly on itself, emphasizing its leading position, though it is also willing to help its allies develop. The paper’s comparison of the ideological differences between the two countries is intended to help them understand each other better and to reduce the conflict to some extent.

Keywords: metaphor, corpus, ideology, conflict

Procedia PDF Downloads 129
509 Mouthing Patterns in Indian Sign Language

Authors: Neha Kulshreshtha

Abstract:

This paper examines the patterns of 'mouthing', a non-manual marker, and its distribution in Indian Sign Language (ISL). Linguistic research on Indian Sign Language is an emerging field in which much remains to be done. The little research that exists focuses on the structure of ISL in terms of physical or manual markers; a study of mouthing patterns would therefore give insight into the distribution of this particular non-manual marker. Data were collected from native ISL users through various techniques that capture natural signing, for example storytelling and informal conversations. The study aims to find out the various situations in which mouthing is used. Sometimes the mouthing is not actually the articulation of the word as spoken in the local languages. The paper investigates whether mouthing patterns in ISL are influenced by a local spoken language, are independent of such influence, or both. Mouthing patterns have been studied in many sign languages, and an investigation into ISL will reveal whether it patterns with the other sign languages.

Keywords: Indian sign language, mouthing, non-manual marker, spoken language influence

Procedia PDF Downloads 230
508 On the Semantics and Pragmatics of 'Be Able To': Modality and Actualisation

Authors: Benoît Leclercq, Ilse Depraetere

Abstract:

The goal of this presentation is to shed new light on the semantics and pragmatics of be able to. It presents the results of a corpus analysis based on data from the BNC (British National Corpus) and discusses these results in light of a specific stance on the semantics-pragmatics interface that takes recent developments into account. Be able to is often discussed in relation to can and could, all of which can be used to express ability. Such an onomasiological approach often results in the identification of usage constraints for each expression. In the case of be able to, it is the formal properties of the modal expression (unlike can and could, be able to has non-finite forms) that are in the foreground, and it is described as the verb that conveys future ability. Be able to is also argued to express actualised ability in the past (I was able to/could open the door). This presentation aims to provide a more accurate pragmatic-semantic profile of be able to, one that is based on extensive data analysis and embedded in an explicit view of the semantics-pragmatics interface. A random sample of 3000 examples (1000 for each modal verb) extracted from the BNC was analysed to address the following issues. First, the challenge is to identify the exact semantic range of be able to. The results show that, contrary to general assumption, be able to does not only express ability but shares most of the root meanings usually associated with the possibility modals can and could. The data reveal that what is called opportunity is, in fact, the most frequent meaning of be able to. Second, attention will be given to the notion of actualisation. It is commonly argued that be able to is the preferred form when the residue actualises: (1) The only reason he was able to do that was because of the restriction (BNC, spoken) (2) It is only through my imaginative shuffling of the aces that we are able to stay ahead of the pack (BNC, written).
Although this notion has been studied in detail within formal semantic approaches, empirical data are crucially lacking, and it is unclear whether actualisation constitutes a conventional (and distinguishing) property of be able to. The empirical analysis provides solid evidence that actualisation is indeed a conventional feature of the modal. Furthermore, the dataset reveals that be able to expresses actualised 'opportunities' and not actualised 'abilities'. In the final part of this paper, attention will be given to the theoretical implications of the empirical findings, and in particular to the following paradox: how can the same expression encode both modal meaning (non-factual) and actualisation (factual)? It will be argued that this largely depends on one's conception of the semantics-pragmatics interface, and that this need not be an issue when actualisation (unlike modality) is analysed as a generalised conversational implicature and is thus considered part of the conventional pragmatic layer of be able to.

Keywords: Actualisation, Modality, Pragmatics, Semantics

Procedia PDF Downloads 105
507 Translating the Gendered Discourse: A Corpus-Based Study of the Chinese Science Fiction The Three Body Problem

Authors: Yi Gu

Abstract:

The Three-Body Problem by Cixin Liu has been a bestselling Chinese sci-fi novel since 2008. The book was translated into English by Ken Liu in 2014 and won the prestigious 2015 Hugo Award for science fiction and fantasy writing, drawing greater attention from wider international communities. The story exposes the horrors of the Chinese Cultural Revolution of the 1960s in a narrative that intrigues readers at home and abroad. However, without access to the source text, western readers may not be aware that the original Chinese version of the book is rich in gender bias. Some Chinese scholars have previously applied feminist translation theories to their analyses of this book, but based on isolated, cherry-picked examples. This paper therefore aims to obtain a more thorough picture of how translators can cope with gender discrimination and reshape the gendered discourse of the source text, by systematically investigating the lexical and syntactic patterns in the translation of Liu’s entire 400-page book. The source text and the translation were downloaded as digital files, automatically aligned at paragraph level and then manually post-edited. They were then compiled into a parallel corpus of 114,629 English words and 204,145 Chinese characters using Sketch Engine. Gender-discrimination markers, such as the overuse of ‘girl’ to describe an adult woman, were searched for in the source text, and the alignment made it possible to identify the strategies adopted by the translator to mitigate gender discrimination. The results provide a framework for translators to address gender bias. The study also shows how corpus methods can be used to further research in feminist translation and critical discourse analysis.

Keywords: corpus, discourse analysis, feminist translation, science fiction translation

Procedia PDF Downloads 234
506 American Slang: Perception and Connotations – Issues of Translation

Authors: Lison Carlier

Abstract:

The English language that is taught in school or used in the media nowadays is defined as 'standard English,' although unstandardized Englishes, or 'parallel' Englishes, are practiced throughout the world. These 'parallel' Englishes have challenged standardization by imposing their own specific vocabulary or grammar. These non-standard languages tend to be regarded as inferior and therefore pose a problem for translation. In the USA, 'slanguage', or slang, is a good example of a 'parallel' language. It consists of a particular set of vocabulary, used mostly in speech and rarely in writing. Often labeled vulgar and reduced to an urban language spoken by young people from the lower classes, slanguage – or the language that is often first spoken between youths – is still the most common language used in the English-speaking world. Moreover, it appears that the prime meaning of 'informal' (as in an informal language) – a language that is spoken with persons the speaker knows – has been put aside and replaced in the general mind by the idea of vulgarity and inappropriateness, when in fact informality is a sign of intimacy, not of vulgarity. When it comes to translating American slang, the main problem a translator encounters is the image and the cultural background usually associated with this 'parallel' language. Indeed, one will, unwittingly, tend to categorize a speaker of a 'parallel' language as part of a particular group of people. How a speaker using it is perceived is paramount and needs to be transposed into the target language. This paper analyzes American slang (its use, its perception and the image it gives of its speakers) and its translation into French, using the book Is Everyone Hanging Out Without Me? (and other concerns) by way of example.
In this book of autobiographical personal essays, comedy writer, actress and author Mindy Kaling writes in a very familiar English, including slang, which contributes to the construction of her own voice and style and enables a deeper connection with her readers.

Keywords: translation, English, slang, French

Procedia PDF Downloads 303
505 Detonalization of Punjabi: Towards a Loss of Linguistic Indigeneity

Authors: Sukhvinder Singh

Abstract:

Punjabi belongs to the New Indo-Aryan group of languages, which in turn belongs to the Indo-European language family. Punjabi is spoken widely in the western part of the Punjab region (in Pakistan), in the eastern part (the Indian state of Punjab, Haryana, Delhi, Himachal and J&K) and abroad (particularly in Canada, the USA, the UK and the Arab Emirates). Outside India and Pakistan, it is the fourth most spoken language in Canada, after English, French and Chinese, and it is taught as a second language in most community schools of British Columbia. The total number of Punjabi speakers, including India, Pakistan and abroad, is more than one hundred million. Punjabi has a long linguistic tradition, and a large number of scholars have studied it at different linguistic levels. Various studies are devoted to its special phonological characteristics, especially tone, which has now started disappearing in favour of aspiration: a rare example of a language change in progress in the reverse direction. This paper deals with this process of language change in reversal, a change towards a loss of linguistic indigeneity. Tone, a distinctive linguistic feature of Punjabi, is being lost due to the increasing influence of Hindi and English, particularly in the speech of urban Punjabis and of Punjabis settled abroad. The paper discusses the sociolinguistics of the Punjabi language and the sociology of Punjab in order to trace the initiation and progression of this change towards a loss of linguistic indigeneity.

Keywords: language change in reversal, reaspiration, detonalization, new Indo-Aryan group

Procedia PDF Downloads 156
504 Prostitution in Colonial Bengal: Autobiographical Articulations and Fictional Representations

Authors: Aparna Bandyopadhyay

Abstract:

The proposed paper will examine how prostitution produced a vast corpus of literature in colonial Bengal. This corpus included autobiographical accounts by prostitutes themselves. While the authenticity of some of these has, at times, been doubted by contemporary observers, the sheer magnitude of such narrative prose demands critical attention. Many of these autobiographical narratives focused on the prostitute’s early life within respectable society and then proceeded to delineate the transgressions and the inescapable chain of circumstances that eventually rendered her a prostitute. Significantly, these serve to corroborate the findings of official investigations regarding the circumstances that led upper-caste Hindu women in Bengal to embrace prostitution in this period. The literary corpus that dwelt on prostitution also included a vast volume of fiction penned by celebrated writers. These foregrounded a prostitute as the central protagonist, telling the life-stories of prostitutes and the circumstances that made them what they were. Novels and short stories often represented the prostitute as an affective being – an individual capable of deep emotions despite her profession. She was seldom a person who had voluntarily embraced prostitution. She was always a figure of helplessness and suffering, a woman whose desire to love and be loved transcended the carnality of her livelihood. She was an outcast, but she experienced the entire repertoire of emotions experienced by her respectable counterparts. The proposed paper will examine the trends and characteristics of the available repertoire of prostitute-oriented literature in late colonial Bengal. It will begin by focusing on the existing perspectives on the origins of prostitution in late colonial Bengal. 
It will proceed to discuss the literary corpus supposedly penned by prostitutes themselves and then focus on the manner in which some of the stalwarts of high literature represented the prostitute in their literary creations.

Keywords: emotions, literature, prostitution, transgression

Procedia PDF Downloads 93
503 Number Variation of the Personal Pronoun we Used by Chinese English Learners

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usages of a language community, which might become the developmental trend of that language. However, language textbooks cannot keep up with these emergent usages. Most Chinese English learners nowadays are still exposed to the traditional grammar prescribed in textbooks, so some variant usages cannot be acquired. The personal pronoun we is prescribed as a plural pronoun in textbook grammar, but its number value is more flexible in actual use. Based on the Chinese Learner English Corpus (CLEC), with a self-built Friends corpus as a reference, the present research explores the number value of the first-person pronoun we used by Chinese English learners. Taking into account the subjectivity of we, this paper annotated the number value of every we in “we + PCU (perception-cognition-utterance) verb” collocations. The results show that, although learners are exposed to traditional textbooks that prescribe plural reference for we, some unconventional usages (singular or vague in reference) still occur in their writing, though less frequently than in native speech. Corpus data and the results of manual semantic annotation suggest that this could be due to the influence of formulaic sequences on the learners and to positive transfer from their native language. An improved SLA model of native language, target language and interlanguage is put forward to recognize the existence of variation in second language acquisition, which should be given more attention in teaching.
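Before manual annotation, candidate “we + PCU verb” collocations could be retrieved automatically. The Python sketch below assumes a small, hypothetical list of base-form PCU verbs and plain regular-expression matching of we immediately followed by one of them; it does no lemmatization, so inflected forms such as thought would be missed, and it leaves the number value (singular, plural or vague) to the human annotator, as in the study. The names `PCU_VERBS` and `we_pcu_hits` are illustrative.

```python
import re

# Hypothetical, illustrative subset of base-form perception-cognition-utterance verbs.
PCU_VERBS = frozenset({"see", "hear", "feel", "think", "know", "believe", "say", "tell"})

# "we" as a whole word, followed by whitespace and a single word.
WE_VERB = re.compile(r"\bwe\s+([a-z]+)\b", flags=re.IGNORECASE)


def we_pcu_hits(text: str, verbs: frozenset[str] = PCU_VERBS) -> list[tuple[int, str]]:
    """Return (character offset, matched phrase) for each 'we' directly followed by a listed verb."""
    hits = []
    for match in WE_VERB.finditer(text):
        if match.group(1).lower() in verbs:
            hits.append((match.start(), match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "We think so. Then we went home, and we know it."
    for offset, phrase in we_pcu_hits(sample):
        print(offset, phrase)
```

Each hit would then be read in context by an annotator to decide whether we refers to one person, a group, or is vague.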

Keywords: Chinese English learners, number, PCU verbs, Personal pronoun we

Procedia PDF Downloads 333