Search results for: conversational corpora
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 167

47 Translanguaging and Cross-languages Analyses in Writing and Oral Production with Multilinguals: a Systematic Review

Authors: Maryvone Cunha de Morais, Lilian Cristine Hübner

Abstract:

Based on a translanguaging theoretical approach, which considers language not as separate entities but as an entire repertoire available to bilingual individuals, this systematic review aimed to analyze the methods (aims, samples investigated, type of stimuli, and analyses) adopted by studies on translanguaging practices associated with written and oral tasks (separately or integrated) in bilingual education. The PRISMA criteria for systematic reviews were adopted, with the descriptors "translanguaging", "bilingual education" and/or "written and oral tasks" used to search the PubMed/MEDLINE, LILACS, ERIC, Scopus, PsycINFO, and Web of Science databases for articles published between 2017 and 2021. A total of 280 records were found, and after the inclusion/exclusion criteria were applied, 24 articles were retained for this analysis. The results showed that four studies focused on written production, ten on oral production, and ten on both written and oral production. Most studies followed a qualitative approach, while five attempted to study translanguaging with quantitative statistical measures. A variety of methods, approaches, and tools were used to investigate translanguaging practices in written and oral production, indicating that the methods are still in development. Moreover, the findings showed that students’ interactions have received significant attention, and that studies have been conducted not only in language classes in bilingual education but also in diverse educational and theoretical contexts such as Content and Language Integrated Learning, task repetition, science classes, collaborative writing, storytelling, peer feedback, Speech Act theory and collective thinking, language ideologies, conversation analysis, and discourse analysis.
Whether focused on written tasks, oral tasks, or both, the studies have significant research and pedagogical implications grounded in the view of integrated languages in bi- and multilinguals.

Keywords: bilingual education, oral production, translanguaging, written production

46 Low-Income African-American Fathers' Gendered Relationships with Their Children: A Study Examining the Impact of Child Gender on Father-Child Interactions

Authors: M. Lim Haslip

Abstract:

This quantitative study explores the correlation between child gender and father-child interactions. The author analyzes data from videotaped interactions between African-American fathers and their toddler sons or daughters to explain how African-American fathers and toddlers interact with each other and whether these interactions differ by child gender. The study investigates the research question: 'How, if at all, do fathers’ speech and gestures differ when interacting with their two-year-old sons versus daughters during free play?' Its objectives are to describe how child gender affects African-American fathers’ verbal communication, to examine how fathers gesture and speak to their toddlers by gender, and to guide interventions in early language development for low-income African-American families and their children. The sample consists of 41 low-income African-American fathers and their 24-month-old toddlers. The videotaped data are used to observe 10-minute father-child interactions during free play. This study uses already transcribed and coded data provided by Dr. Meredith Rowe, whose study examined the impact of African-American fathers’ verbal input on their children’s language development. The Child Language Data Exchange System (CHILDES), created to study conversational interactions, was used to transcribe and code the videotaped data. The findings focus on the quantity, diversity, and complexity of speech and the quantity of gesture, as reflected in the number of words spoken, vocabulary use, utterance length, and number of object pointings observed during father-toddler interactions in a free-play setting. This study will help intervention and prevention scientists understand early language development in the African-American population and will contribute to knowledge of the role of African-American fathers’ interactions in their children’s language development.
It will guide interventions for the early language development of African-American children.
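
The speech measures named above can be approximated from transcribed utterances. The sketch below is a minimal illustration only, assuming plain-text utterances split on whitespace and approximating complexity as mean length of utterance in words; the function name `speech_measures` and the sample utterances are hypothetical, and the CHILDES CLAN programs provide their own versions of such measures with their own conventions.

```python
def speech_measures(utterances: list[str]) -> dict[str, float]:
    """Quantity (word tokens), diversity (word types), and complexity (MLU in words).

    Assumption: words are whitespace-separated and case is ignored.
    """
    tokenized = [u.lower().split() for u in utterances if u.strip()]
    tokens = [word for utt in tokenized for word in utt]
    return {
        "tokens": len(tokens),      # quantity of speech
        "types": len(set(tokens)),  # diversity of speech
        # complexity: mean number of words per non-empty utterance
        "mlu_words": len(tokens) / len(tokenized) if tokenized else 0.0,
    }


# Hypothetical father utterances from a free-play session (not study data)
father = ["Look at the ball", "Where is the ball", "Good job"]
print(speech_measures(father))
```

Counting types over raw word forms treats "ball" and "balls" as different words; a lemmatized or morpheme-coded transcript would change the diversity and complexity figures.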

Keywords: parental engagement, early language development, African-American families, quantity of speech, diversity of speech, complexity of speech, quantity of gesture

45 A Novel Machine Learning Approach to Aid Agrammatism in Non-fluent Aphasia

Authors: Rohan Bhasin

Abstract:

Agrammatism in non-fluent aphasia can be defined as a language disorder in which a patient can use only content words (nouns, verbs, and adjectives) to communicate; their speech lacks function words such as conjunctions and articles, resulting in speech with extremely rudimentary grammar. Past approaches involve some form of speech therapy, with conversation analysis used to analyse pre-therapy speech patterns and qualitative changes in conversational behaviour after therapy. We describe a novel method that generates function words (prepositions, articles) around content words (nouns, verbs, and adjectives) using a combination of Natural Language Processing and Deep Learning algorithms. The approach the paper investigates is an LSTM or sequence-to-sequence (seq2seq) model, which takes in an input sequence and outputs a sequence. This approach needs a large amount of training data to produce coherent output, with each training example being a (content words, complete sentence) pair. We generate such data by starting with complete sentences from a text source and removing the function words to obtain only the content words. The approach assumes that the content words received as input by both models are preserved, i.e., they do not change after the function words are slotted in. This is a potential limitation in cases of severe agrammatism, where the order of the content words may itself be incorrect. The approach can be applied to assist communication in mild agrammatism in non-fluent aphasia. Thus, by generating function words around the content words, we can offer the patient meaningful sentence options for articulate conversation.
Our project thus turns the task of generating sentences from content-specific words into an assistive technology for patients with non-fluent aphasia.
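
The data-generation step described above (stripping function words from complete sentences to create training pairs) can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the `FUNCTION_WORDS` set is a small hypothetical list, and a real system would more likely identify closed-class words with a part-of-speech tagger.

```python
# Hypothetical, deliberately small closed-class word list for illustration only.
FUNCTION_WORDS = {
    "a", "an", "the",                      # articles
    "and", "or", "but",                    # conjunctions
    "in", "on", "at", "to", "of", "with",  # prepositions
    "is", "are", "was", "were",            # auxiliaries
}


def make_pair(sentence: str) -> tuple[str, str]:
    """Return (content words, complete sentence), keeping content words in order.

    Punctuation is ignored when checking membership but kept on surviving words.
    """
    words = sentence.split()
    content = [w for w in words if w.lower().strip(".,!?") not in FUNCTION_WORDS]
    return " ".join(content), sentence


corpus = ["The boy is walking to the park.", "She drinks a cup of tea."]
for source, target in (make_pair(s) for s in corpus):
    print(f"{source!r} -> {target!r}")
```

Because the content words keep their original order, the pairs encode exactly the assumption stated above: the model only has to slot function words around an already-ordered input.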

Keywords: aphasia, expressive aphasia, assistive algorithms, neurology, machine learning, natural language processing, language disorder, behaviour disorder, sequence to sequence, LSTM

44 Content and Language Integrated Instruction: An Investigation of Oral Corrective Feedback in the Chinese Immersion Classroom

Authors: Qin Yao

Abstract:

Content and language integrated instruction provides second language learners with instruction in both subject matter and language, and it is highly valued, particularly in language immersion classrooms, where a language other than the students’ first language is the vehicle for teaching the school curriculum. Corrective feedback is an essential instructional technique that allows teachers to integrate a focus on language into their content instruction. This study aims to fill a gap in the immersion literature, namely the lack of studies examining corrective feedback in Chinese immersion classrooms, by studying the learning opportunities created by oral corrective feedback in a Chinese immersion classroom. Specifically, it examines the distribution of different types of teacher corrective feedback, how students respond to each feedback type, and how the focus of teacher-student interactional exchanges affects the effectiveness of feedback. Two Chinese immersion teachers and their immersion classes were involved, and data were collected through classroom observations and interviews. The observations documented teachers’ provision of oral corrective feedback and students’ responses following the feedback in class, and the interviews collected teachers’ reflective thoughts about their teaching. A primary quantitative and qualitative analysis of the data revealed that, among the different types of corrective feedback, recasts occurred most frequently, while metalinguistic clues and repetition occurred least frequently. Clarification requests led to the highest percentage of learner uptake, manifested in learners’ oral production immediately following the feedback, followed by explicit correction and then recasts.
In addition, the results showed that the interactional context played a role in the effectiveness of the feedback: teachers were most likely to give feedback in conversational exchanges that focused on explicit language and content, while students were most likely to use feedback in exchanges that focused on explicit language. In conclusion, the results of this study indicate that recasts are preferred by Chinese immersion teachers, confirming the results of previous studies on corrective feedback in non-Chinese immersion classrooms. Clarification requests and explicit language instruction, however, elicit more target language production from students and facilitate their target language development; they therefore should not be overlooked in immersion and other content and language integrated classrooms.
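
Uptake percentages like those reported above are simple proportions over coded feedback episodes. The sketch below assumes each episode is coded as a (feedback type, uptake yes/no) pair; the function `uptake_rates` and the episode list are hypothetical illustrations, not the study's data or coding scheme.

```python
from collections import Counter


def uptake_rates(episodes: list[tuple[str, bool]]) -> dict[str, float]:
    """Return {feedback_type: share of episodes followed by learner uptake}."""
    given = Counter(ftype for ftype, _ in episodes)                  # frequency of each type
    taken = Counter(ftype for ftype, uptake in episodes if uptake)   # episodes with uptake
    return {ftype: taken[ftype] / n for ftype, n in given.items()}


# Hypothetical coded episodes, invented only to show the calculation
episodes = [
    ("recast", True), ("recast", False), ("recast", False), ("recast", False),
    ("clarification request", True), ("clarification request", True),
    ("explicit correction", True), ("explicit correction", False),
]
for ftype, rate in sorted(uptake_rates(episodes).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ftype}: {rate:.0%}")
```

Keeping the raw counts in `given` alongside the rates makes it easy to report both how often each feedback type occurred and how often it led to uptake, which are the two distributions the abstract compares.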

Keywords: Chinese immersion, content and language integrated instruction, corrective feedback, interaction

43 A Case Study Comparing the Effect of Computer Assisted Task-Based Language Teaching and Computer-Assisted Form Focused Language Instruction on Language Production of Students Learning Arabic as a Foreign Language

Authors: Hanan K. Hassanein

Abstract:

Task-based language teaching (TBLT) and focus on form instruction (FFI) have been shown to improve the quality and quantity of immediate language production. However, few studies have compared the effectiveness of language production under TBLT versus FFI, and their results are inconsistent. Moreover, teaching Arabic using TBLT is a new field, with little research investigating its application in classrooms. Furthermore, to the best of the researcher’s knowledge, no prior studies have compared teaching Arabic as a foreign language in a classroom setting using computer-assisted task-based language teaching (CATBLT) with computer-assisted form-focused language instruction (CAFFI). Accordingly, this presentation showcases CATBLT and CAFFI tools for teaching Arabic as a foreign language and describes an experimental study that aims to determine whether CATBLT is the more effective instruction method. Effectiveness will be determined by comparing CATBLT and CAFFI in terms of the accuracy, lexical complexity, and fluency of the language produced by students. The participants are 20 students enrolled in two intermediate-level Arabic as a foreign language classes. The experiment will take place over 7 days. Based on a study conducted by Abdurrahman Arslanyilmaz on teaching Turkish as a second language, one in-house computer-assisted tool for TBLT and another for FFI will be designed for the experiment. The experimental group will be taught using the in-house CATBLT tool, and the control group using the in-house CAFFI tool. The data to be analyzed are the dialogues produced by students in both groups when completing a task or communicating in conversational activities.
The dialogues of both groups will be analyzed to understand the effect of the type of instruction (CATBLT or CAFFI) on accuracy, lexical complexity, and fluency. Thus, the study aims to determine whether one instruction method has a greater positive effect than the other on the language produced by students learning Arabic as a foreign language.

Keywords: computer assisted language teaching, foreign language teaching, form-focused instruction, task based language teaching

42 Distant Speech Recognition Using Laser Doppler Vibrometer

Authors: Yunbin Deng

Abstract:

Most existing applications of automatic speech recognition rely on cooperative subjects at a short distance from a microphone. Standoff speech recognition using microphone arrays can extend the subject-to-sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed standoff speech recognition applications are limited to indoor use at short range. Moreover, these applications require a clear air path between the subject and the sensor to achieve a reasonable signal-to-noise ratio. This study reports long-range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. It shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond that of microphone arrays, to hundreds of feet. In addition, an LDV enables 'listening' through windows to uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security, and counterterrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, are collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall. These are common materials that the application could encounter in daily life. These data were compared with their microphone counterparts to show the impact of the various materials on the spectrum of the LDV speech signal. State-of-the-art deep neural network modeling approaches are used to conduct continuous, speaker-independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using a time-delay neural network, bidirectional long short-term memory, and model fusion show great promise for using LDV in long-range speech recognition.
To the author’s best knowledge, this is the first report of an LDV used for long-distance speech recognition.

Keywords: covert speech acquisition, distant speech recognition, DSR, laser Doppler vibrometer, LDV, speech intelligence surveillance and reconnaissance, ISR

41 Computational Linguistic Implications of Gender Bias: Machines Reflect Misogyny in Society

Authors: Irene Yi

Abstract:

Machine learning, natural language processing, and neural network models of language are becoming more and more prevalent in technology and linguistics today. Training data for machines are, at best, large corpora of human literature and, at worst, a reflection of the ugliness in society. Computational linguistics is a growing field dealing with such issues of data collection for technological development. Machines have been trained on millions of books, only to reveal that, throughout human history, derogatory and sexist adjectives have been used significantly more frequently to describe females than males in history and literature. This is extremely problematic, both in the training data and in the outcome of natural language processing. As machines take on more responsibilities, it is crucial to ensure that they do not carry historical sexist and misogynistic notions with them. This paper gathers data and algorithms from neural network models of language dealing with syntax, semantics, sociolinguistics, and text classification. Computational analysis of such linguistic data is used to find patterns of misogyny. The results are significant in showing the intentional and unintentional misogynistic notions used to train machines, and in developing better technologies that take into account the semantics and syntax of text so as to be more mindful and reflect gender equality. Further, this paper addresses non-binary gender pronouns and how machines can process these pronouns correctly given their semantic and syntactic context. This paper also examines the implications of gendered grammar and its effect, cross-linguistically, on natural language processing. Languages such as French or Spanish not only have rigid grammatical gender rules but are also spoken in historically patriarchal societies.
The progress of a society goes hand in hand not only with its language but also with how machines process that language. These ideas are vital to the development of natural language models in technology, and they must be taken into account immediately.

Keywords: computational analysis, gendered grammar, misogynistic language, neural networks

40 Combining Corpus Linguistics and Critical Discourse Analysis to Study Power Relations in Hindi Newspapers

Authors: Vandana Mishra, Niladri Sekhar Dash, Jayshree Charkraborty

Abstract:

This paper focuses on the application of corpus linguistics techniques to critical discourse analysis (CDA) of Hindi newspapers. While corpus linguistics is the study of language as expressed in corpora (samples) of 'real world' text, CDA is an interdisciplinary approach to the study of discourse that views language as a form of social practice. CDA has mainly been studied from a qualitative perspective. However, recent studies have begun combining corpus linguistics with CDA to analyze large volumes of text for the study of existing power relations in society. Our corpus is also sizable (1 million words of Hindi newspaper text), and its analysis requires an alternative analytical procedure. We have therefore combined a quantitative approach, i.e., corpus techniques, with CDA’s traditional qualitative analysis. In this context, we focused on keyword analysis, sorting concordance lines of the selected keywords, and calculating collocates of the keywords. We used WordSmith Tools for all these analyses. The analysis begins by identifying the keywords in the political news corpus relative to the main news corpus. The keywords are extracted based on their keyness, calculated with statistical tests such as the chi-squared and log-likelihood tests on the frequent words of the corpus. Some of the top-occurring keywords are मोदी (Modi), भाजपा (BJP), कांग्रेस (Congress), सरकार (Government), and पार्टी (Political party). This is followed by concordance analysis of these keywords, which generates thousands of lines, from which we select a few to examine according to our objective. We have also calculated the collocates of the keywords based on their Mutual Information (MI) score. Both concordance and collocation help to identify lexical patterns in the political texts.
Finally, all these quantitative results derived from the corpus techniques will be interpreted subjectively in accordance with CDA theory to examine the ways in which political news discourse produces social and political inequality, power abuse, or domination.
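
The keyness and collocation statistics mentioned above can be computed from frequency counts alone. The sketch below is a minimal, hedged illustration using the standard log-likelihood (G2) and Pearson chi-squared formulas for a 2×2 word-versus-corpus table, and the basic Church-and-Hanks form of Mutual Information without a span adjustment; WordSmith Tools may apply its own settings, and all counts in the usage example are hypothetical.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) keyness.

    a, b: frequency of the word in the study and reference corpus;
    c, d: total tokens in the study and reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:                   # 0 * log(0) is treated as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def chi_squared(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared (no Yates correction) on the 2x2 table
    [[word in study, word in reference], [other words in study, other words in reference]]."""
    n = c + d
    observed = [[a, b], [c - a, d - b]]
    row_totals = [a + b, n - (a + b)]
    col_totals = [c, d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def mutual_information(f_pair: int, f_node: int, f_collocate: int, n: int) -> float:
    """MI = log2(f(node, collocate) * N / (f(node) * f(collocate)))."""
    return math.log2(f_pair * n / (f_node * f_collocate))


# Hypothetical counts: a keyword occurring 500 times in a 100,000-token political
# news corpus and 600 times in a 900,000-token main news corpus.
print(round(log_likelihood(500, 600, 100_000, 900_000), 2))
print(round(chi_squared(500, 600, 100_000, 900_000), 2))
# Hypothetical collocation: node and collocate co-occur 40 times in 1,000,000 tokens.
print(round(mutual_information(40, 500, 2_000, 1_000_000), 2))
```

Both keyness scores are zero when a word's relative frequency is identical in the two corpora and grow as the frequencies diverge. MI, by contrast, rewards rare but exclusive pairings, which is why it is often read alongside raw co-occurrence counts.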

Keywords: critical discourse analysis, corpus linguistics, Hindi newspapers, power relations

39 Corpus-Based Neural Machine Translation: Empirical Study Multilingual Corpus for Machine Translation of Opaque Idioms - Cloud AutoML Platform

Authors: Khadija Refouh

Abstract:

Culture-bound expressions have been a bottleneck for Natural Language Processing (NLP) and comprehension, especially in machine translation (MT). In the last decade, the field of machine translation has advanced greatly. Neural machine translation (NMT) has recently achieved considerable improvements in translation quality, outperforming previous traditional translation systems in many language pairs. NMT applies artificial intelligence (AI) and deep neural networks to language processing. Despite this progress, NMT still faces serious challenges when translating culture-bound expressions, especially for low-resource language pairs such as Arabic-English and Arabic-French, unlike well-established language pairs such as English-French. Machine translation of opaque idioms from English into French is likely to be more accurate than from English into Arabic. For example, the Google Translate application translated the sentence “What a bad weather! It rains cats and dogs.” into Arabic as “يا له من طقس سيء! تمطر القطط والكلاب” (literally, “What bad weather! It is raining cats and dogs”), an inaccurate literal translation. The translation of the same sentence into French was “Quel mauvais temps! Il pleut des cordes.” (“What bad weather! It is pouring”, literally “it is raining ropes”), where Google Translate used the accurate corresponding French idiom. This paper aims to perform NMT experiments toward better translation of opaque idioms using a high-quality, clean multilingual corpus. This corpus will be compiled analytically from human-generated idiom translations. AutoML Translation, a Google neural machine translation platform, is used to build a custom translation model to improve the translation of opaque idioms. The custom model will be evaluated automatically and compared with Google NMT using the Bilingual Evaluation Understudy (BLEU) score.
BLEU is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another. Human evaluation is integrated to test the reliability of the BLEU score. The researcher will examine syntactic, lexical, and semantic features using Halliday's functional theory.
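
The BLEU score referred to above combines clipped n-gram precisions with a brevity penalty. The following is a minimal, unsmoothed sentence-level sketch with a single reference and whitespace tokenization, both assumptions made for illustration; production evaluations typically compute corpus-level BLEU with standardized tokenization, so scores from this sketch are not directly comparable to those reported by a translation platform.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference (whitespace tokens)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clipped matches: each candidate n-gram counts at most as often as in the reference.
        matches = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        if matches == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


# Scoring a literal rendering against an idiomatic reference (tokens pre-split on spaces)
reference = "Quel mauvais temps ! Il pleut des cordes ."
literal = "Quel mauvais temps ! Il pleut des chats et des chiens ."
print(sentence_bleu(literal, reference))
```

Because BLEU only counts surface n-gram overlap, a fluent paraphrase of an idiom can score poorly. That limitation is one reason the abstract pairs BLEU with human evaluation.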

Keywords: multilingual corpora, natural language processing (NLP), neural machine translation (NMT), opaque idioms

38 A Study on Information Structure in the Vajrachedika-Prajna-paramita Sutra and Translation Aspect

Authors: Yoon-Cheol Park

Abstract:

This research examines the information structures in the old Chinese character-Korean translation of the Vajrachedika-prajna-paramita sutra. Its background is that no previous research has examined the information structures in the target text of the Vajrachedika-prajna-paramita sutra. Existing research on Buddhist scripture translation has mainly emphasized message conveyance through literal and semantic translation methods. However, conveying a message from one language to another also requires an equivalent information structure. Thus, this research investigates the flow of old and new information in the target text of the Buddhist scripture compared with the source text. Unlike other Buddhist scriptures, the Vajrachedika-prajna-paramita sutra is composed of conversations between the Buddha and his disciple, Suboli. This implies that the information flow can be changed by the utterance context and certain propositions. This research therefore analyzes the flow of old and new information within the source and target texts. The analysis reveals the following. First, owing to language features, the information flow in message conveyance differs between old Chinese characters and Korean: old Chinese characters develop an old-new information flow, whereas Korean shows a new-old information flow because of its word order. Second, the source text of the Vajrachedika-prajna-paramita sutra includes abstruse terminology, jargon, and abstract words. These influence the target text and change the information flow, but repeated use of these words supplies old information in the target text. Lastly, the Vajrachedika-prajna-paramita sutra has an expository structure built from the conversations between the Buddha and Suboli.
This means that the information flow develops by explaining specific subjects and paraphrasing unfamiliar phrases and expressions. From these results, this research verifies that the information structures in the target text of the Vajrachedika-prajna-paramita sutra are changed by specific subjects and terminology, develop a new-old information flow through repeated expressions or word order, and reveal information structures familiar to the target culture. It also implies that translating the Vajrachedika-prajna-paramita sutra, as a religious text, requires conveying its message with attention to the information structures of both languages.

Keywords: abstruse terminologies, the information structure, new and old information, old Chinese character-Korean translation

37 A Method for Clinical Concept Extraction from Medical Text

Authors: Moshe Wasserblat, Jonathan Mamou, Oren Pereg

Abstract:

Natural Language Processing (NLP) has made a major leap in recent years in practical integration into medical solutions, for example, extracting clinical concepts such as medical conditions, medications, treatments, and symptoms from medical texts. However, training and deploying these models in real environments still demands a large amount of annotated data and NLP/Machine Learning (ML) expertise, which makes the process costly and time-consuming. We present a practical and efficient method for clinical concept extraction that requires neither costly labeled data nor ML expertise. The method includes three steps. Step 1: the user provides a large in-domain text corpus (e.g., PubMed). The system then builds a contextual model containing vector representations of concepts in the corpus in an unsupervised manner (e.g., Phrase2Vec). Step 2: the user provides a seed set of terms representing a specific medical concept (e.g., for the concept of symptoms, the user may provide ‘dry mouth,’ ‘itchy skin,’ and ‘blurred vision’). The system then matches the seed set against the contextual model and extracts the most semantically similar terms (e.g., additional symptoms). The result is a complete set of terms related to the medical concept. Step 3: in production, medical concepts must be extracted from unseen medical text. The system extracts key phrases from the new text and matches them against the complete set of terms from Step 2, and the most semantically similar ones are annotated with the same medical concept category. For example, the seed symptom concepts would result in the following annotation: “The patient complains of fatigue [symptom], dry skin [symptom], and weight loss [symptom], which can be an early sign of diabetes.” Our evaluations show promising results for extracting concepts from medical corpora.
The method allows medical analysts to easily and efficiently build taxonomies (in step 2) representing their domain-specific concepts, and automatically annotate a large number of texts (in step 3) for classification/summarization of medical reports.
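
Steps 2 and 3 above can be sketched with cosine similarity over phrase vectors. The illustration below is hypothetical throughout: the three-dimensional `EMBEDDINGS` table stands in for a Phrase2Vec-style model built in Step 1, and the centroid-plus-threshold expansion rule and the 0.95 threshold are assumptions, not the authors' matching procedure.

```python
import math

# Hypothetical toy vectors standing in for a Phrase2Vec-style contextual model (Step 1).
EMBEDDINGS = {
    "dry mouth":      [0.90, 0.10, 0.00],
    "itchy skin":     [0.80, 0.20, 0.10],
    "blurred vision": [0.85, 0.15, 0.05],
    "fatigue":        [0.88, 0.12, 0.02],
    "dry skin":       [0.82, 0.22, 0.08],
    "metformin":      [0.10, 0.90, 0.20],
    "insulin":        [0.05, 0.95, 0.10],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def expand_seeds(seeds: list[str], embeddings: dict, threshold: float = 0.95) -> set[str]:
    """Step 2 (assumed rule): keep the seeds plus every term close to the seed centroid."""
    dim = len(embeddings[seeds[0]])
    centroid = [sum(embeddings[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    return {term for term, vec in embeddings.items()
            if term in seeds or cosine(vec, centroid) >= threshold}


def annotate(key_phrases: list[str], concept_terms: set[str], embeddings: dict,
             label: str, threshold: float = 0.95) -> list[str]:
    """Step 3 (assumed rule): tag a key phrase if it is close to any term in the concept set."""
    tagged = []
    for phrase in key_phrases:
        vec = embeddings.get(phrase)
        if vec is None:
            best = 0.0  # phrase unknown to the model, so it is left untagged
        else:
            best = max((cosine(vec, embeddings[t]) for t in concept_terms), default=0.0)
        tagged.append(f"{phrase} [{label}]" if best >= threshold else phrase)
    return tagged


symptoms = expand_seeds(["dry mouth", "itchy skin", "blurred vision"], EMBEDDINGS)
print(sorted(symptoms))
print(annotate(["fatigue", "insulin"], symptoms, EMBEDDINGS, "symptom"))
```

The single threshold is the main tuning knob in this sketch: lowering it grows the taxonomy in Step 2 but also lets more unrelated phrases through in Step 3.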

Keywords: clinical concepts, concept expansion, medical records annotation, medical records summarization

36 Conversational Assistive Technology of Visually Impaired Person for Social Interaction

Authors: Komal Ghafoor, Tauqir Ahmad, Murtaza Hanif, Hira Zaheer

Abstract:

Assistive technology has been developed to support visually impaired people in their social interactions. Conversational assistive technology is designed to enhance communication skills, facilitate social interaction, and improve the quality of life of visually impaired individuals. This technology includes speech recognition, text-to-speech features, and other communication devices that enable users to communicate with others in real time. It uses natural language processing and machine learning algorithms to analyze spoken language and provide appropriate responses. It also includes features such as voice commands and audio feedback to give users a more immersive experience. These technologies have been shown to increase the confidence and independence of visually impaired individuals in social situations and have the potential to improve their social skills and relationships with others. Overall, conversational assistive technology is a promising tool for empowering visually impaired people and improving their social interactions. One of its key benefits is that it allows visually impaired individuals to overcome communication barriers that they may face in social situations. It can help them communicate more effectively with friends, family, and colleagues, as well as with strangers in public spaces. By providing a more seamless and natural way to communicate, this technology can help reduce feelings of isolation and improve overall quality of life. The main objective of this research is to give blind users the ability to move around in unfamiliar environments through a user-friendly device with a face, object, and activity recognition system. The study evaluates the accuracy of activity recognition. The device captures the blind user’s front view, detects objects, recognizes activities, and answers the user’s queries. It is implemented using a front-view camera.
A local dataset was collected that includes different first-person human activities. The results show the identification of the activities on which the VGG-16 model was trained, such as hugging, shaking hands, talking, walking, and waving.

Keywords: dataset, visually impaired person, natural language processing, human activity recognition

35 Cross-Tier Collaboration between Preservice and Inservice Language Teachers in Designing Online Video-Based Pragmatic Assessment

Authors: Mei-Hui Liu

Abstract:

This paper reports the progression of language teachers’ learning to assess students’ speech act performance via online videos in a cross-tier professional growth community. This yearlong research project collected multiple data sources from several stakeholders, including 12 preservice and 4 inservice English as a foreign language (EFL) teachers, 4 English professionals, and 82 high school students. Data sources included surveys, (focus group) interviews, online reflection journals, online video-based assessment items/scores, and artifacts related to teacher professional learning. The major findings depicted the effectiveness of this proposed learning module on language teacher development in pragmatic assessment as well as its impact on student learning experience. All these teachers appreciated this professional learning experience which enhanced their knowledge in assessing students’ pragmalinguistic and sociopragmatic performance in an English speech act (i.e., making refusals). They learned how to design online video-based assessment items by attending to specific linguistic structures, semantic formula, and sociocultural issues. They further became aware of how to sharpen pragmatic instructional skills in the near future after putting theories into online assessment and related classroom practices. Additionally, data analysis revealed students’ achievement in and satisfaction with the designed online assessment. Yet, during the professional learning process most participating teachers encountered challenges in reaching a consensus on selecting appropriate video clips from available sources to present the sociocultural values in English-speaking refusal contexts. Also included was to construct test items which could testify the influence of interlanguage transfer on students’ pragmatic performance in various conversational scenarios. 
With pedagogical implications and research suggestions, this study adds to the growing body of research on integrating preservice and inservice EFL teacher education in pragmatic assessment and related instruction. Acknowledgment: This research project is sponsored by the Ministry of Science and Technology of the Republic of China under grant number MOST 106-2410-H-029-038.

Keywords: cross-tier professional development, inservice EFL teachers, pragmatic assessment, preservice EFL teachers, student learning experience

Procedia PDF Downloads 231
34 Equivalences and Contrasts in the Morphological Formation of Echo Words in Two Indo-Aryan Languages: Bengali and Odia

Authors: Subhanan Mandal, Bidisha Hore

Abstract:

The linguistic process whereby all or part of the base word is repeated, with or without internal change, before or after the base itself is regarded as reduplication. The reduplicated morphological construction carries with it a new grammatical category and meaning. Reduplication is a very frequent and abundant phenomenon in the eastern Indian languages of the states of West Bengal and Odisha, i.e. Bengali and Odia respectively. Bengali, an Indo-Aryan language belonging to the Indo-European family, is one of the most widely spoken languages in India and is the national language of Bangladesh. Despite this classification, Bengali shows certain influences in vocabulary and grammar due to its geographical proximity to Tibeto-Burman and Austro-Asiatic speaking communities. Bengali and Odia once belonged to a single linguistic branch, but over time, with gradual linguistic changes due to various factors, Odia was the first to break away and develop as a separate, distinct language. Nevertheless, setting aside the script, the two languages still show more similarities than contrasts. This paper deals with the procedure of echo word formation in Bengali and Odia. Morphological research on reduplication in the two languages reveals several linguistic processes. The findings are based on information elicited from native speakers and on the analysis of echo words found in discourse and conversational patterns. For the analysis of partial reduplication, prefixed-class and suffixed-class word formations, which show specific rule-based changes, are taken into consideration. For example, in the suffixed class, both consonant and vowel alterations are found, following the rules: i) CVX → tVX, ii) CVCV → CVCi. 
Further classifications were also found in sentential studies of both languages, which revealed complete reduplication complexities in echo word formation, where the head word loses its original meaning. Complexities based on onomatopoetic/phonetic imitation of natural phenomena, rather than on rule-based occurrences, were also found. Taking these aspects, which are very prevalent in both languages, into consideration, the study draws inferences that bring out many similarities between the two languages in this area, despite their having branched away from each other long ago.
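The two suffixed-class rules can be made concrete as simple string rewrites. The sketch below is a minimal illustration, not the authors' procedure: it assumes plain-ASCII romanization, treats only a, e, i, o, u as vowels, and reads rule (i) as replacing exactly one initial consonant with t. The function names and the sample strings are hypothetical and were chosen only because they fit the patterns; they are not presented as attested Bengali or Odia echo words.

```python
# Minimal sketch of the two suffixed-class echo-word rules.
# Assumptions (not from the study): ASCII romanization, only "aeiou" count
# as vowels, and rule (i) replaces exactly one initial consonant with "t".

VOWELS = set("aeiou")


def is_vowel(ch: str) -> bool:
    return ch in VOWELS


def consonant_alteration(word: str) -> str:
    """Rule (i) CVX -> tVX: swap the single initial consonant for 't'."""
    if len(word) < 2 or is_vowel(word[0]) or not is_vowel(word[1]):
        raise ValueError(f"{word!r} does not match the CVX pattern")
    return "t" + word[1:]


def vowel_alteration(word: str) -> str:
    """Rule (ii) CVCV -> CVCi: swap the final vowel of a CVCV word for 'i'."""
    shape = "".join("V" if is_vowel(ch) else "C" for ch in word)
    if shape != "CVCV":
        raise ValueError(f"{word!r} does not match the CVCV pattern")
    return word[:3] + "i"


def echo_pair(word: str, rule) -> str:
    """Base word followed by its echo form, joined by a hyphen."""
    return f"{word}-{rule(word)}"


if __name__ == "__main__":
    # Placeholder strings that merely fit the patterns.
    print(echo_pair("gari", consonant_alteration))  # gari-tari
    print(echo_pair("kala", vowel_alteration))      # kala-kali
```

Raising an error on non-matching shapes keeps the rules strictly tied to the CVX and CVCV templates; the onomatopoetic and complete-reduplication cases the study describes are, by its own account, not rule-based and are deliberately out of scope here.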

Keywords: consonant alteration, onomatopoetic, partial reduplication and complete reduplication, reduplication, vowel alteration

Procedia PDF Downloads 219
33 Children’s Perception of Conversational Agents and Their Attention When Learning from Dialogic TV

Authors: Katherine Karayianis

Abstract:

Children with Attention Deficit Hyperactivity Disorder (ADHD) have trouble learning in traditional classrooms. These children miss out on important developmental opportunities in school, which leads to challenges starting in early childhood, and these problems persist throughout their adult lives. Despite receiving supplemental support in school, children with ADHD still perform below their non-ADHD peers. Thus, there is a great need to find better ways of facilitating learning in children with ADHD. Evidence has shown that children with ADHD learn best through interactive engagement, but this is not always possible in schools, given classroom constraints and large student-to-teacher ratios. Redesigning classrooms may not be feasible, so informal learning opportunities provide a possible alternative. One popular informal learning opportunity is educational TV shows like Sesame Street. These types of educational shows can teach children foundational skills taught in pre-K and early elementary school. One downside to these shows is the lack of interactive dialogue between the TV characters and the child viewers. Pseudo-interaction is often deployed, but its benefits are limited if the characters can neither understand nor contingently respond to the child. AI technology has become highly advanced and is now common in many electronic devices that both children and adults have access to. AI has been successfully used to create interactive dialogue in children’s educational TV shows, and results show that this enhances children’s learning and engagement, especially when children perceive the character as a reliable teacher. It is likely that children with ADHD, whose minds may otherwise wander, may especially benefit from this type of interactive technology, possibly to a greater extent depending on their perception of the animated dialogic agent. 
To investigate this issue, I have begun examining the moderating role of inattention in children’s learning from an educational TV show with different types of dialogic interaction. Preliminary results have shown that when character interactions are neither immediate nor accurate, children who are more easily distracted have greater difficulty learning from the show, but contingent interactions with a TV character seem to buffer these negative effects of distractibility by keeping the child engaged. To extend this line of work, the moderating role of the child’s perception of the dialogic agent as a reliable teacher will be examined in the association between children’s attention and the type of dialogic interaction in the TV show. As such, the current study will investigate this moderated moderation.
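In conventional regression terms, a "moderated moderation" is a three-way interaction. The sketch below assumes that coding and fits it by ordinary least squares with NumPy. The variable roles are hypothetical (x for contingent vs. non-contingent interaction, w for inattention, z for perceived reliability of the agent as a teacher, y for learning), and this is not presented as the author's analysis plan. The simulated data exist only to show that the model recovers known coefficients.

```python
import numpy as np

# Hypothetical variable roles (not taken from the study's materials):
#   x: dialogic interaction type (0 = non-contingent, 1 = contingent)
#   w: inattention score (first moderator)
#   z: perceived reliability of the agent as a teacher (second moderator)
#   y: learning outcome


def design_matrix(x, w, z):
    """Columns for y = b0 + b1*x + b2*w + b3*z + b4*x*w + b5*x*z + b6*w*z + b7*x*w*z."""
    x, w, z = (np.asarray(a, dtype=float) for a in (x, w, z))
    return np.column_stack(
        [np.ones_like(x), x, w, z, x * w, x * z, w * z, x * w * z]
    )


def fit_ols(x, w, z, y):
    """Ordinary least-squares estimates of b0..b7."""
    X = design_matrix(x, w, z)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def conditional_effect(coef, w, z):
    """Effect of x on y at given moderator values: b1 + b4*w + b5*z + b7*w*z."""
    return coef[1] + coef[4] * w + coef[5] * z + coef[7] * w * z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    x = rng.integers(0, 2, n)
    w = rng.normal(size=n)
    z = rng.normal(size=n)
    # Arbitrary "true" coefficients, used only for this simulation.
    true = np.array([1.0, 0.5, -0.4, 0.3, 0.2, 0.1, 0.0, 0.25])
    y = design_matrix(x, w, z) @ true + rng.normal(scale=0.1, size=n)
    coef = fit_ols(x, w, z, y)
    print(np.round(coef, 2))
    print(conditional_effect(coef, w=1.0, z=1.0))
```

The coefficient b7 carries the moderated moderation. If it is non-zero, then how strongly inattention changes the benefit of contingent interaction itself depends on how the child perceives the agent.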

Keywords: attention, dialogic TV, informal learning, educational TV, perception of teacher

Procedia PDF Downloads 47
32 On the Semantics and Pragmatics of 'Be Able To': Modality and Actualisation

Authors: Benoît Leclercq, Ilse Depraetere

Abstract:

The goal of this presentation is to shed new light on the semantics and pragmatics of be able to. It presents the results of a corpus analysis based on data from the BNC (British National Corpus) and discusses these results in light of a specific stance on the semantics-pragmatics interface that takes recent developments into account. Be able to is often discussed in relation to can and could, all of which can be used to express ability. Such an onomasiological approach often results in the identification of usage constraints for each expression. In the case of be able to, it is the formal properties of the modal expression (unlike can and could, be able to has non-finite forms) that are in the foreground, and the modal expression is described as the verb that conveys future ability. Be able to is also argued to express actualised ability in the past (I was able to / could open the door). This presentation aims to provide a more accurate pragmatic-semantic profile of be able to, one that is based on extensive data analysis and embedded in a very explicit view of the semantics-pragmatics interface. A random sample of 3000 examples (1000 for each modal verb) extracted from the BNC was analysed to address the following issues. First, the challenge is to identify the exact semantic range of be able to. The results show that, contrary to general assumption, be able to does not only express ability but shares most of the root meanings usually associated with the possibility modals can and could. The data reveal that what is called opportunity is, in fact, the most frequent meaning of be able to. Second, attention will be given to the notion of actualisation. It is commonly argued that be able to is the preferred form when the residue actualises: (1) The only reason he was able to do that was because of the restriction (BNC, spoken) (2) It is only through my imaginative shuffling of the aces that we are able to stay ahead of the pack. (BNC, written)
Although this notion has been studied in detail within formal semantic approaches, empirical data are crucially lacking, and it is unclear whether actualisation constitutes a conventional (and distinguishing) property of be able to. The empirical analysis provides solid evidence that actualisation is indeed a conventional feature of the modal. Furthermore, the dataset reveals that be able to expresses actualised 'opportunities' and not actualised 'abilities'. In the final part of this paper, attention will be given to the theoretical implications of the empirical findings, and in particular to the following paradox: how can the same expression encode both modal meaning (non-factual) and actualisation (factual)? It will be argued that this largely depends on one's conception of the semantics-pragmatics interface, and that this need not be an issue when actualisation (unlike modality) is analysed as a generalised conversational implicature and is thus considered part of the conventional pragmatic layer of be able to.
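Drawing a fixed number of random examples per expression, as described for the 3000 BNC examples, is a stratified random sample. The minimal sketch below assumes the concordance hits have already been exported as lists of line identifiers. The input format and function names are hypothetical. Meanings such as ability or opportunity would still be coded manually, and the second function only tallies those manual labels.

```python
import random
from collections import Counter


def stratified_sample(hits_by_expression, per_expression, seed=0):
    """Draw the same number of random hits for each expression.

    hits_by_expression: dict mapping an expression (e.g. "can") to a list of
    concordance-line identifiers (hypothetical input format).
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = {}
    for expression, hits in hits_by_expression.items():
        if len(hits) < per_expression:
            raise ValueError(f"only {len(hits)} hits for {expression!r}")
        # random.sample draws without replacement
        sample[expression] = rng.sample(hits, per_expression)
    return sample


def meaning_distribution(labels):
    """Relative frequency of each manually assigned meaning label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

With real concordance output, the call would use per_expression=1000 for be able to, can and could. Fixing the seed lets another annotator redraw exactly the same sample.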

Keywords: Actualisation, Modality, Pragmatics, Semantics

Procedia PDF Downloads 101
31 Treating Voxels as Words: Word-to-Vector Methods for fMRI Meta-Analyses

Authors: Matthew Baucum

Abstract:

With the increasing popularity of fMRI as an experimental method, psychology and neuroscience can greatly benefit from advanced techniques for summarizing and synthesizing large amounts of data from brain imaging studies. One promising avenue is automated meta-analysis, in which natural language processing methods are used to identify the brain regions consistently associated with certain semantic concepts (e.g., “social”, “reward”) across large corpora of studies. This study builds on this approach by demonstrating how, in fMRI meta-analyses, individual voxels can be treated as vectors in a semantic space and evaluated for their “proximity” to terms of interest. In this technique, a low-dimensional semantic space is built from brain imaging study texts, allowing words in each text to be represented as vectors (where words that frequently appear together are near each other in the semantic space). Consequently, each voxel in a brain mask can be represented as the normalized vector sum of all the words in the studies that reported activation in that voxel. The entire brain mask can then be visualized in terms of each voxel’s proximity to a given term of interest (e.g., “vision”, “decision making”) or collection of terms (e.g., “theory of mind”, “social”, “agent”), as measured by the cosine similarity between the voxel’s vector and the term vector (or the average of multiple term vectors). Analysis can also proceed in the opposite direction, producing word cloud visualizations of the nearest semantic neighbors for a given brain region. This approach allows for continuous, fine-grained metrics of voxel-term associations and relies on state-of-the-art “open vocabulary” methods that go beyond mere word counts. 
An analysis of over 11,000 neuroimaging studies from an existing meta-analytic fMRI database demonstrates that this technique can be used to recover known neural bases for multiple psychological functions, suggesting this method’s utility for efficient, high-level meta-analyses of localized brain function. While automated text analytic methods are no replacement for deliberate, manual meta-analyses, they seem to show promise for the efficient aggregation of large bodies of scientific knowledge, at least on a relatively general level.
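The voxel-to-term scoring described above comes down to vector sums and cosine similarity. The sketch below assumes that word vectors have already been trained (for example with a word2vec-style model; training is out of scope here). It also assumes a mapping from each voxel to the studies that report activation there. All names, and the toy two-dimensional vectors, are hypothetical.

```python
import numpy as np


def unit(v):
    """Scale a vector to length 1 (zero vectors are returned unchanged)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def voxel_vector(study_ids, study_words, word_vecs):
    """Normalized sum of the vectors of all words in the studies activating a voxel."""
    dim = len(next(iter(word_vecs.values())))
    total = np.zeros(dim)
    for sid in study_ids:
        for word in study_words[sid]:
            if word in word_vecs:  # skip out-of-vocabulary words
                total += word_vecs[word]
    return unit(total)


def term_vector(terms, word_vecs):
    """Average of one or more term vectors, e.g. for a 'social' concept."""
    return unit(np.mean([word_vecs[t] for t in terms], axis=0))


def cosine(a, b):
    return float(np.dot(unit(a), unit(b)))


def nearest_terms(vec, word_vecs, k=5):
    """Reverse direction: the k words closest to a voxel or region vector."""
    return sorted(word_vecs, key=lambda w: cosine(vec, word_vecs[w]), reverse=True)[:k]


if __name__ == "__main__":
    word_vecs = {
        "vision": np.array([1.0, 0.0]),
        "visual": np.array([0.9, 0.1]),
        "reward": np.array([0.0, 1.0]),
    }
    study_words = {"s1": ["vision", "visual"], "s2": ["reward"]}
    v = voxel_vector(["s1"], study_words, word_vecs)  # voxel active only in s1
    print(cosine(v, term_vector(["vision"], word_vecs)))
    print(nearest_terms(v, word_vecs, k=2))
```

Scoring every voxel in a mask this way produces a continuous similarity map for a term. Running nearest_terms on a region's vector supplies the words for the word-cloud view.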

Keywords: fMRI, machine learning, meta-analysis, text analysis

Procedia PDF Downloads 424
30 Chinese Students’ Use of Corpus Tools in an English for Academic Purposes Writing Course: Influence on Learning Behaviour, Performance Outcomes and Perceptions

Authors: Jingwen Ou

Abstract:

Writing for academic purposes in a second or foreign language poses a significant challenge for non-native speakers, particularly at the tertiary level, where L2 students’ English academic writing is often hindered by difficulties with academic discourse, including vocabulary, academic register, and organization. The past two decades have witnessed a rise in the popularity of the data-driven learning (DDL) approach in English for academic purposes (EAP) writing instruction. In light of this trend, this study aims to enhance the integration of DDL into EAP writing classrooms by investigating Chinese college students’ perceptions of the use of corpus tools for improving EAP writing. Additionally, the research explores their corpus consultation behaviors during training to provide insights into corpus-assisted EAP instruction for DDL practitioners. Specifically, this research investigates Chinese university students’ use of corpus tools with three main foci: 1) the influence of corpus tools on learning behaviours, 2) the influence of corpus tools on students’ academic writing performance outcomes, and 3) students’ perceptions, and potential changes in those perceptions, regarding the use of such tools. Three corpus tools, CQPWeb, Sketch Engine, and LancsBox X, are selected for investigation because of the scarcity of empirical research on patterns of learners’ engagement with a combination of multiple corpora. The research adopts a pre-test/post-test design to evaluate students’ academic writing performance before and after the intervention. Twenty participants will be divided into two groups: an intervention group and a non-intervention group. Three corpus training workshops will be delivered at the beginning, middle, and end of a semester. 
An online survey and three separate focus group interviews are designed to investigate students’ perceptions of the use of corpus tools for improving academic writing skills, particularly the rhetorical functions in different essay sections. Insights from students’ consultation sessions indicated difficulties with DDL practice, including insufficient time to complete all tasks, struggles with technical set-up, unfamiliarity with the DDL approach, and difficulty with some advanced corpus functions. The main study aims to provide pedagogical insights and training resources for EAP practitioners and learners.

Keywords: corpus linguistics, data-driven learning, English for academic purposes, tertiary education in China

Procedia PDF Downloads 28
29 The Women-In-Mining Discourse: A Study Combining Corpus Linguistics and Discourse Analysis

Authors: Ylva Fältholm, Cathrine Norberg

Abstract:

One of the major threats identified to successful future mining is that women do not find the industry attractive. Many attempts have been made, for example in Sweden and Australia, to create organizational structures and mining communities attractive to both genders. Despite such initiatives, many mining areas are developing into gender-segregated fly-in/fly-out communities dominated by men, with both social and economic consequences. One of the challenges facing many mining companies is thus to break traditional gender patterns and structures. To do this, increased knowledge about gender in the context of mining is needed. Since language both constitutes and reproduces knowledge, such knowledge can be gained by exploring and describing the mining discourse from a gender perspective. The aim of this study is to explore what conceptual ideas are activated in connection with the physical/geographical mining area and with work within the mining industry. We combine critical discourse analysis, involving close reading of selected texts such as policy documents, interview materials, applications, and research and innovation agendas, with analyses of linguistic patterns found in large language corpora covering millions of words of contemporary language production. The quantitative corpus data serve as a point of departure for the qualitative analysis of the texts, that is, they suggest which patterns to explore further. The study shows that despite technological and organizational development, one of the most persistent discourses about mining is the conception of dangerous and unfriendly areas infused with traditional masculine ideals and manual hard work. 
Although some of the texts analyzed highlight gender issues, and describe gender-equalizing initiatives, such as wage-mapping systems, female networks and recruitment efforts for women executives, and thereby render the discourse less straightforward, it is shown that these texts are not unambiguous examples of a counter-discourse. They rather illustrate that discourses are not stable but include opposing discourses, in dialogue with each other. For example, many texts highlight why and how women are important to mining, at the same time as they suggest that gender and diversity are all about women: why mining is a problem for them, how they should be, and what they should do to fit in. Drawing on a constitutive view of discourse, knowledge about such conflicting perceptions of women is a prerequisite for succeeding in attracting women to the mining industry and thereby contributing to the development of future mining.

Keywords: discourse, corpus linguistics, gender, mining

Procedia PDF Downloads 237
28 The Automatisation of Dictionary-Based Annotation in a Parallel Corpus of Old English

Authors: Ana Elvira Ojanguren Lopez, Javier Martin Arista

Abstract:

The aims of this paper are to present the automatisation procedure adopted in the implementation of a parallel corpus of Old English, and to assess the progress of automatisation with respect to tagging, annotation, and lemmatisation. The corpus consists of an aligned parallel Old English-English text with word-for-word comparison that provides the Old English segment with inflectional form tagging (gloss, lemma, category, and inflection) and lemma annotation (spelling, meaning, inflectional class, paradigm, word-formation and secondary sources). This parallel corpus is intended to fill a gap in the field of Old English, in which no parallel and/or lemmatised corpora are available and the average amount of corpus annotation is low. Against this background, this presentation has two main parts. The first part, which focuses on tagging and annotation, selects the layouts and fields of lexical databases that are relevant for these tasks. Most information used for the annotation of the corpus can be retrieved from the lexical and morphological database Nerthus and the database of secondary sources Freya. These are the sources of linguistic and metalinguistic information that will be used for annotating the lemmas of the corpus, including morphological and semantic aspects as well as references to the secondary sources that deal with the lemmas in question. Although substantially adapted and re-interpreted, the lemmatised part of these databases draws on the standard dictionaries of Old English, including The Student's Dictionary of Anglo-Saxon, An Anglo-Saxon Dictionary, and A Concise Anglo-Saxon Dictionary. The second part of this paper deals with lemmatisation. It presents the lemmatiser Norna, which has been implemented in FileMaker. Norna is based on a concordance and an index to the Dictionary of Old English Corpus, which comprises around three thousand texts and three million words. 
In its present state, the lemmatiser Norna can assign lemmas to around 80% of textual forms automatically, by searching the index and the concordance for prefixes, stems and inflectional endings. The conclusions of this presentation stress the limits of the automatisation of dictionary-based annotation in a parallel corpus. While tagging and annotation are largely automatic even at the present stage, the automatisation of alignment is left for future research. Lemmatisation and morphological tagging are expected to be fully automatic in the near future, once the database of secondary sources Freya and the lemmatiser Norna have been completed.
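Norna's strategy of matching textual forms against an index via stems and inflectional endings can be illustrated with a much simplified suffix-stripping lookup. This is a hypothetical sketch, not Norna's implementation, which runs in FileMaker. The index holds only two illustrative lemmas, the ending list covers a few strong masculine a-stem endings, and prefix handling is omitted.

```python
# Hypothetical index: stems as they appear in unnormalized text -> headword.
INDEX = {"stan": "stān", "cyning": "cyning"}
# A few strong masculine a-stem endings (illustrative, not exhaustive).
ENDINGS = ["es", "e", "as", "a", "um"]


def lemmatise(form, index=INDEX, endings=ENDINGS):
    """Return the lemma for a textual form, or None if no match is found."""
    if form in index:  # uninflected form, e.g. nom./acc. sg.
        return index[form]
    # Longest match first, so longer endings take priority when endings overlap.
    for ending in sorted(endings, key=len, reverse=True):
        if form.endswith(ending):
            stem = form[: -len(ending)]
            if stem in index:
                return index[stem]
    return None  # left for manual lemmatisation


def coverage(forms, index=INDEX, endings=ENDINGS):
    """Share of forms that receive a lemma automatically."""
    if not forms:
        return 0.0
    hits = sum(lemmatise(f, index, endings) is not None for f in forms)
    return hits / len(forms)


if __name__ == "__main__":
    for form in ["stanas", "stane", "cyningum", "wif"]:
        print(form, lemmatise(form))
```

coverage() computes the same kind of proportion as the roughly 80% figure reported above. On this toy index, of course, its values say nothing about the real system.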

Keywords: corpus linguistics, historical linguistics, old English, parallel corpus

Procedia PDF Downloads 175
27 Towards Creative Movie Title Generation Using Deep Neural Models

Authors: Simon Espigolé, Igor Shalyminov, Helen Hastie

Abstract:

Deep machine learning techniques, including deep neural networks (DNNs), have been used to model language and dialogue for conversational agents, both for tasks such as giving technical support and for general chit-chat. They have been shown to be capable of generating long, diverse and coherent sentences in end-to-end dialogue systems and natural language generation. However, these systems tend to imitate the training data and will only generate concepts and language within the scope of what they have been trained on. This work explores how deep neural networks can be used in a task that would normally require human creativity, in which a human reads the movie description and/or watches the movie and comes up with a compelling, interesting movie title. This task differs from simple summarization in that the movie title may not necessarily be derivable from the content or semantics of the movie description. Here, we train a type of DNN called a sequence-to-sequence (seq2seq) model that takes as input a short textual movie description and some additional information, e.g. the genre of the movie, and learns to output a movie title. The idea is that the DNN will learn certain techniques and approaches that a human movie titler may deploy but that may not be immediately obvious to the human eye. To give an example of a generated movie title, for the movie synopsis ‘A hitman concludes his legacy with one more job, only to discover he may be the one getting hit.’, the original, true title is ‘The Driver’ and the one generated by the model is ‘The Masquerade’. A human evaluation was conducted in which the DNN output was compared to the true human-generated title, as well as to a number of baselines, on three 5-point Likert scales: ‘creativity’, ‘naturalness’ and ‘suitability’. Subjects were also asked which of the two systems they preferred. The scores of the DNN model were comparable to those of the human-generated movie title, with means of m=3.11 and m=3.12, respectively. 
There is room for improvement in these models, as their titles were rated significantly less ‘natural’ and ‘suitable’ than the human title. In addition, the human-generated title was preferred 58% of the time overall when pitted against the DNN model. These results are nevertheless encouraging, given the comparison with a carefully considered, well-crafted human-generated movie title. Movie titles go through a rigorous process of assessment by experts and focus groups who have watched the movie. This process is in place because of the large amount of money at stake and the importance of creating an effective title that captures the audience’s attention. Our work shows progress towards automating this process, which in turn may lead to a better understanding of creativity itself.

Keywords: creativity, deep machine learning, natural language generation, movies

Procedia PDF Downloads 303
26 A Self-Built Corpus-Based Study of Four-Word Lexical Bundles in Native English Teachers’ EFL Classroom Discourse in Northeast China: The Significance of Stance

Authors: Fang Tan

Abstract:

This research focuses on the appropriate use of lexical bundles in spoken discourse, particularly in English as a Foreign Language (EFL) classrooms in Northeast China. Previous studies have mainly examined lexical bundles in written discourse, owing to the limited availability of spoken discourse corpora, so their usage in spoken discourse still needs to be investigated. English teachers’ use of lexical bundles is crucial for effective teaching and communication in the EFL classroom. The aim of this study is to investigate the functions of four-word lexical bundles in native English teachers’ EFL oral English classes in Northeast China. Specifically, the research focuses on the usage of stance bundles, which were found to be the most significant type of bundle in the analyzed corpus. By comparing the self-built university spoken English classroom discourse corpus with another self-built university English for General Purposes (EGP) corpus, the study aims to highlight differences in bundle usage between native and non-native teachers in EFL classrooms. The research adopts a corpus-based design. The observed corpus consists of more than 300,000 tokens, collected over the past five years. The reference corpus comprises over 800,000 tokens, collected over 12 years. All primary data collection involved transcribing and annotating spoken English classes taught by native English teachers. The analysis procedures included identifying and categorizing four-word lexical bundles, with specific emphasis on stance bundles. Frequency counts and comparisons with the Chinese English teachers’ corpus were conducted to identify patterns and differences in bundle usage. The research addresses the following questions: 1) What are the functions of four-word lexical bundles in native English teachers’ EFL oral English classes? 
2) How do stance bundles differ in usage between native and non-native English teachers’ classes? 3) What implications can be drawn for English teachers’ professional development based on the findings? In conclusion, this study provides valuable insights into the usage of four-word lexical bundles, particularly stance bundles, in native English teachers’ EFL oral English classes in Northeast China. The research highlights the difference in bundle usage between native and non-native English teachers’ classes and provides implications for English teachers’ professional development. The findings contribute to the understanding of lexical bundle usage in EFL classroom discourse and have theoretical importance for language teaching methodologies. The self-built university English classroom discourse corpus used in this research is a valuable resource for future studies in this field.
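Identifying four-word bundles usually means filtering 4-grams by normalized frequency and by dispersion across texts. The abstract does not state its cut-offs. The per-million threshold and minimum number of texts below are therefore placeholder assumptions, and so is the simple tokenizer. Assigning bundles to functional categories such as stance would still be done by hand.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple lowercase word tokenizer (placeholder assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def four_word_bundles(texts, min_per_million=20.0, min_range=3):
    """Return 4-grams meeting a normalized-frequency and a range threshold.

    Both thresholds are placeholders; studies set their own cut-offs.
    Returns a dict mapping each bundle to its frequency per million words.
    """
    freq = Counter()
    found_in = defaultdict(set)  # which texts each 4-gram occurs in
    total_tokens = 0
    for text_id, text in enumerate(texts):
        tokens = tokenize(text)
        total_tokens += len(tokens)
        for i in range(len(tokens) - 3):
            gram = tuple(tokens[i:i + 4])
            freq[gram] += 1
            found_in[gram].add(text_id)
    if total_tokens == 0:
        return {}
    bundles = {}
    for gram, count in freq.items():
        per_million = count / total_tokens * 1_000_000
        if per_million >= min_per_million and len(found_in[gram]) >= min_range:
            bundles[" ".join(gram)] = per_million
    return bundles
```

Each transcribed lesson would be passed as one element of texts, so the range criterion counts lessons. This prevents one teacher's repeated phrase from qualifying as a bundle.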

Keywords: EFL classroom discourse, four-word lexical bundles, stance, implication

Procedia PDF Downloads 34
25 Accomplishing Mathematical Tasks in Bilingual Primary Classrooms

Authors: Gabriela Steffen

Abstract:

Learning in a bilingual classroom not only implies learning in two languages or in an L2; it also means learning content subjects by means of bilingual or plurilingual resources, which is qualitatively different from ‘monolingual’ learning. These resources form elements of a didactics of plurilingualism, aiming not only at the development of plurilingual competence but also at drawing on plurilingual resources for non-linguistic subject learning. Applying a didactics of plurilingualism makes it possible to take account of the specificities of bilingual content subject learning in bilingual education classrooms. Bilingual education is used here as an umbrella term for different programs, such as bilingual education, immersion, CLIL, and bilingual modules in which one or several non-linguistic subjects are taught partly or completely in an L2. This paper discusses the first results of a study on pupil group work in bilingual classrooms in several Swiss primary schools. In particular, it analyses two bilingual classes in two primary schools in a French-speaking region of Switzerland that follow part of their school program in German in addition to French, the language of instruction in this region. More precisely, it analyses videotaped classroom interaction and in situ classroom practices of pupil group work in mathematics lessons. The ethnographic observation of pupils’ group work and the analysis of their interaction (using the analytical tools of conversation analysis, discourse analysis and plurilingual interaction analysis) complement the description of whole-class interaction carried out in the same (and several other) classes. While the latter are teacher-student interactions, the former are student-student interactions, giving more space to, and insight into, pupils’ talk. This study aims to describe the linguistic and multimodal resources (in German L2 and/or French L1) that pupils mobilize while carrying out a mathematical task. 
The analysis shows that the mathematical task is accomplished in a bilingual mode, regardless of whether whole-class interaction is conducted mainly in a bilingual (German L2-French L1) mode or in a monolingual L2 (German) mode. The pupils make extensive use of German L2 in a setting that lends itself to the use of French L1 (peer groups with French as the dominant language, in the absence of the teacher, and a task with a mathematical aim). They switch from French to German and back ‘naturally’, as is typical of bilingual speakers. Their linguistic resources in German L2 are not sufficient to allow them to (inter-)act well enough to accomplish the task entirely in German L2, despite their efforts to do so. However, this does not stop them from carrying out the mathematical task adequately, which is the main objective, by drawing on the bilingual resources at hand.

Keywords: bilingual content subject learning, bilingual primary education, bilingual pupil group work, bilingual teaching/learning resources, didactics of plurilingualism

Procedia PDF Downloads 137
24 Integration of an Innovative Complementary Approach Inspired by Clinical Hypnosis into Oncology Care: Nurses’ Perception of Comfort Talk

Authors: Danny Hjeij, Karine Bilodeau, Caroline Arbour

Abstract:

Background: Chemotherapy infusions often lead to a cluster of co-occurring and difficult-to-treat symptoms (nausea, tingling, etc.), which may negatively impact the treatment experience at the outpatient clinic. Although several complementary approaches have shown beneficial effects for chemotherapy-induced symptom management, they are not easily implementable during chemotherapy infusion. In response to this limitation, comfort talk (CT), a simple, fast conversational method inspired by the language principles of clinical hypnosis, is known to optimize the management of symptoms related to antineoplastic treatments. However, the perception of nurses who have had to integrate this practice into their care has never been documented. Study design: A qualitative descriptive study with iterative content analysis was conducted among oncology nurses working in a chemotherapy outpatient clinic who had previous experience with CT. Semi-structured interviews were conducted by phone, using a pre-tested interview guide and a sociodemographic survey to document their perception of CT. The conceptual framework. Results: A total of six nurses (4 women, 2 men) took part in the interviews (N=6). The average age of participants was 49 years (36-61 years). Participants had an average of 24 years of experience (10-38 years) as a nurse, including 14.5 years in oncology (5-32 years). Data saturation (i.e., redundancy of words) was observed around the fifth interview. A sixth interview was conducted as confirmation. Six themes emerged: two addressing contextual and organizational obstacles at the chemotherapy outpatient clinic, and three addressing the added value of CT for oncology nursing care. 
Specific themes included: 1) the outpatient oncology clinic, a saturated care setting, 2) the keystones that support the integration of CT into care, 3) added value for patients, 4) a positive and rewarding experience for nurses, 5) collateral benefits, and 6) CT as an approach to consider during the COVID-19 pandemic. Conclusion: For the first time, this study describes nurses' perception of the integration of CT into the care surrounding the administration of chemotherapy at the outpatient oncology clinic. In summary, contextual and organizational difficulties, as well as the lack of training, are among the main obstacles that could hinder the integration of CT in oncology. Still, the experience was reported as mostly positive. Indeed, nurses saw CT as adding value to patient care and as meeting their need for holistic care. CT also appears to be beneficial for patients on several levels (for pain management in particular). Results will be used to inform future knowledge transfer activities related to CT in oncology nursing.

Keywords: cancer, chemotherapy, comfort talk, oncology nursing role

Procedia PDF Downloads 56
23 A Corpus-Based Analysis of Japanese Learners' English Modal Auxiliary Verb Usage in Writing

Authors: S. Nakayama

Abstract:

For non-native English speakers, using English modal auxiliary verbs appropriately can be among the most challenging tasks. This research sought to identify differences in modal verb usage between Japanese non-native English speakers (JNNSs) and native speakers (NSs) from two perspectives: frequency of use and the distribution of the verb phrase structures (VPSs) in which modal verbs occur. The study can contribute to the description of JNNSs' interlanguage with regard to modal verbs; its main aim is to suggest improvements to teaching materials and to help language teachers teach modal verbs in a way that benefits learners. To address the primary question, the usage of nine central modals (‘can’, ‘could’, ‘may’, ‘might’, ‘shall’, ‘should’, ‘will’, ‘would’, and ‘must’) by JNNSs was compared with that by NSs in the International Corpus Network of Asian Learners of English (ICNALE), one of the largest freely available corpora focusing on Asian English learners’ language use. The ICNALE consists of four modules: ‘Spoken Monologue’, ‘Spoken Dialogue’, ‘Written Essays’, and ‘Edited Essays’. This research used only the ‘Written Essays’ module, a set of 200-300-word essays containing approximately 1.3 million words in total. The frequency analysis revealed differences as well as similarities in frequency order. Specifically, both JNNSs and NSs used ‘can’ most frequently, followed by ‘should’ and ‘will’; however, apart from ‘shall’, the remaining modals ranked differently in the two groups. A log-likelihood test revealed JNNSs’ overuse of ‘can’ and ‘must’ and their underuse of ‘will’ and ‘would’. The VPS analysis revealed that JNNSs used modal verbs in a relatively narrow range of VPSs compared to NSs.
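The log-likelihood test mentioned above compares an item's frequency in two corpora of different sizes. The sketch below follows the widely used two-corpus formulation (expected counts derived from the pooled rate, G² = 2 Σ O ln(O/E)). The abstract does not state which variant or significance threshold was used, so the 3.84 cut-off (p < 0.05, one degree of freedom) and the counts in the usage example are assumptions for illustration, not ICNALE figures.

```python
import math

# Chi-square critical value for p < 0.05 with 1 degree of freedom.
# Assumed threshold: the abstract does not say which cut-off was used.
CRITICAL_P05 = 3.84


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Two-corpus log-likelihood (G2) for one item, e.g. a modal verb.

    freq_a/freq_b: occurrences of the item; size_a/size_b: corpus sizes in words.
    """
    pooled_rate = (freq_a + freq_b) / (size_a + size_b)
    expected_a = size_a * pooled_rate
    expected_b = size_b * pooled_rate
    total = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # by convention, 0 * ln(0) contributes nothing
            total += observed * math.log(observed / expected)
    return 2.0 * total


def usage_direction(freq_a: int, size_a: int, freq_b: int, size_b: int) -> str:
    """Label corpus A's use of the item relative to corpus B."""
    if log_likelihood(freq_a, size_a, freq_b, size_b) < CRITICAL_P05:
        return "no significant difference"
    return "overuse" if freq_a / size_a > freq_b / size_b else "underuse"


if __name__ == "__main__":
    # Hypothetical counts, not ICNALE data.
    print(round(log_likelihood(100, 10_000, 50, 10_000), 2))
    print(usage_direction(100, 10_000, 50, 10_000))
```

With these hypothetical counts (100 occurrences per 10,000 words versus 50 per 10,000 words), G² is about 16.99, well above 3.84, so the first corpus is labelled as overusing the item. The direction is read from the normalized rates because G² itself is non-negative.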
Results showed that JNNSs used most of the modals only with bare infinitives or the passive voice, whereas NSs used the modals in a wide range of VPSs, including the progressive construction and the perfect aspect, both structures in which JNNSs rarely used modals. The results of the frequency analysis suggest that language teachers and teaching materials should explain other modality items so that learners avoid relying heavily on certain modals and have a wider range of lexical items with which to express their feelings more accurately. Moreover, the underused modals should receive more emphasis in the classroom because they belong to the epistemic modals, which allow writers not only to interject their views into propositions but also to build a relationship with readers. As for VPSs, teaching materials should present more examples of modals occurring in a wide range of VPSs to help learners express their opinions from a variety of viewpoints.
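The VPS comparison described above can be approximated by inspecting the verb forms that follow each modal. This sketch assumes the essays have already been POS-tagged with Penn Treebank tags (MD, VB, VBN, VBG, RB) by some tagger; the category names and the rule of skipping intervening adverbs such as ‘not’ are illustrative assumptions, not the study's actual coding scheme.

```python
from collections import Counter
from typing import List, Tuple

CENTRAL_MODALS = {"can", "could", "may", "might", "shall",
                  "should", "will", "would", "must"}

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank tag), assumed pre-tagged


def classify_vps(tagged: Tagged, i: int) -> str:
    """Classify the verb phrase structure following the modal at index i."""
    verbs = []
    j = i + 1
    # Collect up to three following verb forms, skipping adverbs ('not', 'also').
    while j < len(tagged) and len(verbs) < 3:
        word, tag = tagged[j]
        j += 1
        if tag.startswith("RB"):
            continue
        if not tag.startswith("VB"):
            break
        verbs.append((word.lower(), tag))
    if not verbs or verbs[0][1] != "VB":
        return "other"
    head, rest = verbs[0][0], verbs[1:]
    if head == "have" and rest:
        if rest[0][0] == "been" and len(rest) > 1:
            if rest[1][1] == "VBG":
                return "perfect progressive"
            if rest[1][1] == "VBN":
                return "perfect passive"
        if rest[0][1] == "VBN":
            return "perfect"
    if head == "be" and rest:
        if rest[0][1] == "VBG":
            return "progressive"
        if rest[0][1] == "VBN":
            return "passive"
    # Also covers copular 'be' + adjective and main-verb 'have' + noun.
    return "bare infinitive"


def vps_profile(tagged: Tagged) -> Counter:
    """Count (modal, VPS) pairs for the nine central modals in a tagged text."""
    counts: Counter = Counter()
    for i, (word, tag) in enumerate(tagged):
        if tag == "MD" and word.lower() in CENTRAL_MODALS:
            counts[(word.lower(), classify_vps(tagged, i))] += 1
    return counts


if __name__ == "__main__":
    sentence = [("It", "PRP"), ("can", "MD"), ("be", "VB"), ("done", "VBN"),
                ("and", "CC"), ("we", "PRP"), ("should", "MD"), ("not", "RB"),
                ("be", "VB"), ("waiting", "VBG")]
    print(vps_profile(sentence))
```

Comparing the resulting (modal, VPS) distributions for the two learner groups would then show whether one group concentrates its modals in fewer structures.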

Keywords: corpus linguistics, Japanese learners of English, modal auxiliary verbs, International Corpus Network of Asian Learners of English

Procedia PDF Downloads 109
22 Historical Development of Negative Emotive Intensifiers in Hungarian

Authors: Martina Katalin Szabó, Bernadett Lipóczi, Csenge Guba, István Uveges

Abstract:

In this study, an exhaustive analysis of the historical development of negative emotive intensifiers in Hungarian was carried out using NLP methods. Intensifiers are linguistic elements that modify or reinforce a variable characteristic of the lexical unit they apply to. Intensifiers therefore appear with other lexical items, such as adverbs, adjectives, verbs, and, less frequently, nouns. Because of the complexity of this phenomenon (which has sociolinguistic, semantic, and historical aspects), many lexical items can operate as intensifiers. Intensifiers are admittedly among the most rapidly changing elements of the language. From a linguistic point of view, a special group of intensifiers is particularly interesting: the so-called negative emotive intensifiers, which on their own, without context, have semantic content associated with negative emotion, but which in particular cases may function as intensifiers (e.g., borzasztóan jó ‘awfully good’, meaning ‘excellent’). Despite their special semantic features, negative emotive intensifiers have scarcely been examined in the literature using large historical corpora and NLP methods. To better understand trends over time concerning these intensifiers, we exhaustively analysed a specific historical corpus, the Magyar Történeti Szövegtár (Hungarian Historical Corpus). This corpus (containing 3 million text words) is a collection of texts of various genres and styles produced between 1772 and 2010. Since the corpus consists of raw texts and contains no additional information about the linguistic features of the data (such as stemming or morphological analysis), a large amount of manual work was required to process the data. Based on a lexicon of negative emotive intensifiers compiled in a previous phase of the research, every occurrence of each intensifier was queried, and the results were stored in a separate data frame.
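The lexicon-based query step can be pictured as a keyword-in-context search over raw text. This is a minimal sketch under stated assumptions: documents are (year, text) pairs, tokens are runs of word characters, and the intensifier forms listed are illustrative (only borzasztóan comes from the abstract). It returns plain dictionaries that could be loaded into a data frame; it does not reproduce the authors' actual tooling.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# In Python 3, \w matches accented letters such as á, ó, ö, ő.
TOKEN = re.compile(r"\w+")


def query_intensifiers(docs: Iterable[Tuple[int, str]], lexicon: Set[str],
                       window: int = 3) -> List[Dict[str, object]]:
    """Return one row per lexicon hit, with the year and left/right context words."""
    rows: List[Dict[str, object]] = []
    for year, text in docs:
        tokens = TOKEN.findall(text.lower())
        for i, token in enumerate(tokens):
            if token in lexicon:  # exact word-form match; raw text is not lemmatized
                rows.append({
                    "year": year,
                    "intensifier": token,
                    "left": " ".join(tokens[max(0, i - window):i]),
                    "right": " ".join(tokens[i + 1:i + 1 + window]),
                })
    return rows


if __name__ == "__main__":
    # Illustrative word forms; the study used its own previously compiled lexicon.
    lexicon = {"borzasztóan", "rettenetesen", "szörnyen"}
    docs = [(1850, "Ez borzasztóan jó volt."), (1990, "Szörnyen fáradt vagyok.")]
    for row in query_intensifiers(docs, lexicon):
        print(row)
```

Because the corpus is unannotated, a lexicon used this way must list each word form to be matched; lemmatization (as the study later did with magyarlanc) would remove that restriction.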
Basic linguistic processing (POS tagging, lemmatization, etc.) was then carried out automatically with the ‘magyarlanc’ NLP toolkit. Finally, the frequency and collocation features of all the negative emotive words in the corpus were analysed automatically. The results revealed in detail how these words have undergone grammaticalization over time, i.e., how they change from lexical elements into grammatical ones and slowly go through a delexicalization process (their negative content diminishes over time). The analysis also showed which negative emotive intensifiers were at the same stage of this process in the same time period. A closer look at the different domains of the analysed corpus further showed that, during this process, the importance of the pragmatic role increases: the newer use expresses the speaker's subjective, evaluative opinion to a certain degree.
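Once occurrences are collected, frequency change and collocation can be summarized per period. The sketch below normalizes counts per million tokens within fixed-width periods and tallies each intensifier's immediate right-hand neighbour. The 50-year period width, the right-neighbour-only window, and the example lexicon are assumptions; unlike the study, it works on raw word forms rather than magyarlanc lemmas and POS tags.

```python
import re
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple

TOKEN = re.compile(r"\w+")


def intensifier_profile(docs: Iterable[Tuple[int, str]], lexicon: Set[str],
                        span: int = 50):
    """Per-period frequency (per million tokens) and right-neighbour collocates."""
    tokens_per_period: Counter = Counter()
    hits: DefaultDict[int, Counter] = defaultdict(Counter)        # period -> word -> count
    collocates: DefaultDict[str, Counter] = defaultdict(Counter)  # word -> next word -> count
    for year, text in docs:
        period = year - year % span  # e.g. 1772 -> 1750 when span=50
        tokens = TOKEN.findall(text.lower())
        tokens_per_period[period] += len(tokens)
        for i, token in enumerate(tokens):
            if token in lexicon:
                hits[period][token] += 1
                if i + 1 < len(tokens):
                    collocates[token][tokens[i + 1]] += 1
    per_million: Dict[int, Dict[str, float]] = {
        period: {word: count * 1_000_000 / tokens_per_period[period]
                 for word, count in counts.items()}
        for period, counts in hits.items()
    }
    return per_million, collocates


if __name__ == "__main__":
    docs = [(1850, "borzasztóan jó"), (1860, "ez nagyon jó"),
            (1990, "borzasztóan jó és borzasztóan szép")]
    freq, colloc = intensifier_profile(docs, {"borzasztóan"})
    print(freq)
    print(colloc["borzasztóan"].most_common())
```

A shift in collocates over periods (for example, from negative adjectives towards positive ones such as jó) is the kind of signal that would point to delexicalization.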

Keywords: historical corpus analysis, historical linguistics, negative emotive intensifiers, semantic changes over time

Procedia PDF Downloads 200
21 The Meaning Structures of Political Participation of Young Women: Preliminary Findings in a Practical Phenomenology Study

Authors: Amanda Aliende da Matta, Maria del Pilar Fogueiras Bertomeu, Valeria de Ormaechea Otalora, Maria Paz Sandin Esteban, Miriam Comet Donoso

Abstract:

This communication presents the preliminary emerging themes of a study on the political participation of young women. The study follows a qualitative methodology, specifically the applied hermeneutic phenomenological method, and its general objective is to give an account of the experience of political participation as a young woman. The participants are women aged 18 to 35 who have experience in political participation. The data collection techniques are the descriptive story and the phenomenological interview. The first methodological steps have been: 1) collecting and selecting stories of lived experience of political participation, 2) selecting descriptions of lived experience (DLEs) of political participation from the chosen stories, 3) preparing phenomenological interviews based on the selected DLEs, and 4) conducting a phenomenological thematic analysis (PTA) of the DLEs. So far, we have begun the PTA on 5 vignettes. Hermeneutic phenomenology as a research approach is based on phenomenological philosophy and applied hermeneutics. Phenomenology is a descriptive philosophy of pure experience and essences, through which we seek to capture an experience at its origins without categorizing, interpreting, or theorizing it. Hermeneutics, on the other hand, may be defined as a philosophical current that can be applied to data analysis. Max van Manen wrote that hermeneutic phenomenology is a method of abstemious reflection on the basic structures of the lived experience of human existence. In hermeneutic phenomenology, then, we focus on the way we experience “things” in the first person, seeking to capture the world exactly as we experience it, not as we categorize or conceptualize it. In this study, the empirical methods used were the written lived experience description and the conversational interview. For the short stories, participants were asked: “What was your lived experience of participation in politics as a young woman? Can you tell me any stories or anecdotes that you think exemplify or typify your experience?”
The questions were accompanied by a list of guidelines for writing descriptive vignettes. The analytical method was PTA. Among the provisional results, we found preliminary emerging themes which, as the investigation advances, could yield meaning structures of the political participation of young women. They are the following:
- Complicity may be inherent/essential in political participation as a young woman;
- Feelings may be essential/inherent in political participation as a young woman;
- Hope may be essential in authentic political participation as a young woman;
- Frustration may be essential in authentic political participation as a young woman;
- Satisfaction may be essential in authentic political participation as a young woman;
- There may be an inherent/essential tension between the individual and the collective in political participation as a young woman;
- Political participation as a young woman may include moments of public demonstration.

Keywords: applied hermeneutic phenomenology, hermeneutics, phenomenology, political participation

Procedia PDF Downloads 56
20 Exploring the Vocabulary and Grammar Advantage of US American over British English Speakers at Age 2;0

Authors: Janine Just, Kerstin Meints

Abstract:

The research aims to compare vocabulary size and grammatical development between US American English- and British English-speaking children at age 2;0. As there is evidence that precocious children with large vocabularies develop grammar skills earlier than their typically developing peers, we investigated whether this also holds true across varieties of English. That is, if US American children start to produce words earlier than their British counterparts, US children may also be at an advantage in the early developmental stages of acquiring grammar. This research employs a British English adaptation of the MacArthur-Bates CDI Words and Sentences (Lincoln Toddler CDI) to compare vocabulary and grammar scores with the updated US Toddler CDI norms. First, the Lincoln TCDI was assessed for concurrent validity against the Preschool Language Scale (PLS-5 UK), which showed high correlations between the two tests for the vocabulary and grammar subscales. In addition, the frequency of the Toddler CDI’s words was compared using American and British English corpora of adult spoken and written language. A paired-samples t-test found a significant difference in word frequency between the British and the American CDI, demonstrating that the TCDI’s words were indeed of higher frequency in British English. We then compared language and grammar scores between US (N = 135) and British children (N = 96). A two-way between-groups ANOVA examined whether the two samples differed in terms of SES (i.e., maternal education) by investigating the impact of SES and country on vocabulary and sentence complexity. The two samples did not differ in terms of maternal education, as the interaction effects between SES and country were not significant. In most cases, scores did not differ significantly between US and British children, for example, for overall word production and most grammatical subscales (i.e., use of words, overregularizations, complex sentences, word combinations).
However, in-depth analysis showed that US children were significantly better than British children at using some noun categories (i.e., people, objects, places) and several categories marking early grammatical development (i.e., pronouns, prepositions, quantifiers, helping words), although the effect sizes were small. Significant grammar differences were also found for irregular word forms and progressive tense suffixes: US children were more advanced in their use of these grammatical categories, again with small effect sizes. In sum, while differences in vocabulary and grammar ability exist in favour of US children, the effect sizes were small. It can be concluded that most British children are ‘catching up’ with their US American peers at age 2;0. Implications of this research will be discussed.
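Two of the statistics mentioned above can be sketched with the Python standard library: a paired-samples t statistic (as used when each CDI word's frequency in a British corpus is paired with its frequency in an American corpus) and an effect size for two independent groups. The abstract does not name its effect-size measure, so Cohen's d with a pooled standard deviation is an assumption here, and the numbers in the usage example are hypothetical. P-values would additionally require the t distribution (for example via `scipy.stats.ttest_rel`), which this sketch omits.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for pairs (x[i], y[i]).

    Raises ZeroDivisionError if every difference is identical (zero variance).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error, len(diffs) - 1


def cohens_d(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    return (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical values, not the study's data.
    t, df = paired_t([5, 7, 9, 11], [4, 5, 6, 7])
    print(f"t({df}) = {t:.3f}")
    print(f"d = {cohens_d([2, 4, 6], [1, 3, 5]):.2f}")
```

Cohen's conventional benchmarks describe d of about 0.2 as small and about 0.5 as medium, which is why a difference can be statistically significant with large samples yet still be reported as a small effect.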

Keywords: first language acquisition, grammar, parent report instrument, vocabulary

Procedia PDF Downloads 251
19 Biofilm Text Classifiers Developed Using Natural Language Processing and Unsupervised Learning Approach

Authors: Kanika Gupta, Ashok Kumar

Abstract:

Biofilms are dense, highly hydrated cell clusters that are irreversibly attached to a substratum, to an interface, or to each other, and are embedded in a self-produced gelatinous matrix composed of extracellular polymeric substances. Research in the biofilm field has become very important, as biofilms show high mechanical resilience and resistance to antibiotic treatment and constitute a significant problem in healthcare and in other industries related to microorganisms. The information, both stated and hidden, in the biofilm literature is growing exponentially, so it is no longer feasible for researchers and practitioners to extract and relate information from different written sources manually. The current work therefore proposes and discusses the use of text mining techniques to extract information from a biofilm literature corpus containing 34,306 documents. Annotated material for biomedical literature is very difficult and expensive to obtain because the literature is unstructured (i.e., free text). We therefore adopted an unsupervised approach, which requires no annotated training data, and used it to develop a system that classifies text into the categories growth and development, drug effects, radiation effects, classification, and physiology of biofilms. A two-step structure was used: the first step extracts keywords from the biofilm literature using a metathesaurus and standard natural language processing tools such as RapidMiner v5.3, and the second step discovers relations between the genes extracted from the whole set of biofilm literature using pubmed.mineR v1.0.11. The unsupervised approach, i.e., the machine learning task of inferring a function that describes hidden structure in unlabeled data, was applied to the extracted datasets to develop classifiers with WinPython 64-bit v3.5.4.0Qt5 and RStudio v0.99.467; these classifiers automatically assign text to the categories above.
Testing the developed classifiers on a large biofilm literature data set showed that the proposed unsupervised approach is promising and well suited to semi-automatic labelling of the extracted relations. All information was stored in a relational database hosted locally on a server. The generated biofilm vocabulary and gene relations should be valuable to biofilm researchers, making their searches easy and efficient, since the keywords and genes can be mapped directly to the documents used to develop the database.
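The idea of labelling documents without annotated training data can be illustrated with a simple keyword-set scorer. This is not the authors' classifier and not a clustering algorithm: it assumes category keyword sets already exist (the sets below are hypothetical placeholders for metathesaurus-derived keywords), scores each document by keyword hits, and leaves ties or zero scores unassigned for manual review, in the spirit of the semi-automatic labelling described above.

```python
import re
from collections import Counter
from typing import Dict, Set

TOKEN = re.compile(r"[a-z0-9]+")

# Hypothetical seed keyword sets; in the described system such sets would come
# from the metathesaurus-based keyword extraction step.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "growth and development": {"adhesion", "maturation", "dispersal", "colonization"},
    "drug effects": {"antibiotic", "tolerance", "mic", "vancomycin"},
    "radiation effects": {"radiation", "ultraviolet", "uv", "irradiation"},
    "classification": {"taxonomy", "species", "strain", "phylogenetic"},
    "physiology": {"metabolism", "quorum", "matrix", "eps"},
}


def label_document(text: str,
                   keywords: Dict[str, Set[str]] = CATEGORY_KEYWORDS) -> str:
    """Assign the category with the most keyword hits, or 'unassigned'."""
    tokens = TOKEN.findall(text.lower())
    scores = Counter({category: sum(1 for t in tokens if t in words)
                      for category, words in keywords.items()})
    ranked = scores.most_common()
    if not ranked or ranked[0][1] == 0:
        return "unassigned"  # no evidence for any category
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "unassigned"  # tie: leave for manual (semi-automatic) review
    return ranked[0][0]


if __name__ == "__main__":
    for doc in ["Antibiotic tolerance of the biofilm increased.",
                "Quorum sensing regulates matrix metabolism.",
                "UV radiation and antibiotic tolerance"]:
        print(doc, "->", label_document(doc))
```

Routing ties and empty scores to a reviewer rather than guessing keeps the automatic part conservative, which matches the abstract's framing of the output as semi-automatic labelling.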

Keywords: biofilms literature, classifiers development, text mining, unsupervised learning approach, unstructured data, relational database

Procedia PDF Downloads 143
18 Chatbots and the Future of Globalization: Implications for Businesses and Consumers

Authors: Shoury Gupta

Abstract:

Chatbots are a rapidly growing technological trend that has revolutionized the way businesses interact with their customers. With advances in artificial intelligence, chatbots can now mimic human-like conversations and provide instant, efficient responses to customer inquiries. In this research paper, we explore the implications of chatbots for the future of globalization from the perspective of both businesses and consumers. The paper begins with an overview of the current state of chatbots in the global market and their future growth potential. It focuses on how chatbots have become a valuable tool for businesses looking to expand their global reach, especially in areas with high population density and language barriers. With chatbots, businesses can engage with customers in different languages and provide 24/7 customer service, creating a more accessible and convenient customer experience. The paper then examines the impact of chatbots on cross-cultural communication and how they can help bridge communication gaps between businesses and consumers from different cultural backgrounds. Chatbots can potentially facilitate cross-cultural communication by offering real-time translation, voice recognition, and other features that help users communicate effectively across languages and cultures. By providing more accessible and inclusive communication channels, chatbots can help businesses reach new markets and expand their customer base, making them more competitive in the global market. However, the paper also acknowledges potential drawbacks of chatbots. For instance, chatbots may not be able to address complex customer inquiries that require human input. Additionally, chatbots may perpetuate biases if they are programmed with stereotypes or assumptions about different cultures. These drawbacks may have significant implications for businesses and consumers alike.
To explore the implications of chatbots for the future of globalization in greater detail, the paper provides a thorough review of existing literature and case studies. The review covers topics such as the benefits of chatbots for businesses and consumers, their potential drawbacks, and how businesses can mitigate the risks associated with chatbot use. The paper also discusses the ethical considerations associated with chatbot use, such as privacy concerns and the need to ensure that chatbots do not discriminate against certain groups of people. These ethical implications are particularly important given the potential for chatbots to be used in sensitive areas such as healthcare and financial services. Overall, this research paper provides a comprehensive analysis of chatbots and their implications for the future of globalization. By exploring both the potential benefits and drawbacks of chatbot use, the paper aims to provide insights into how businesses and consumers can leverage this technology to achieve greater global reach and improve cross-cultural communication. The paper concludes that chatbots have the potential to be a powerful tool for businesses looking to expand their global footprint and improve their customer experience, but that care must be taken to mitigate the risks associated with their use.

Keywords: chatbots, conversational AI, globalization, businesses

Procedia PDF Downloads 65