Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 12408

Search results for: Chinese natural language processing

12348 A Survey of the Applications of Sentiment Analysis

Abstract:

Natural language often conveys the emotions of speakers. Sentiment analysis of what people say is therefore prevalent in the field of natural language processing and has great application value in many practical problems. To help readers understand this value, we survey various applications of sentiment analysis, including those in online and offline business as well as other types of applications. In particular, we give some application examples from intelligent customer service systems in China. We also compare the applications of sentiment analysis on Twitter, Weibo, Taobao and Facebook. Finally, we point out the challenges faced in applying sentiment analysis and the work that is worth studying in the future.

Keywords: application, natural language processing, online comments, sentiment analysis

Procedia PDF Downloads 230

12347 Content and Language Integrated Instruction: An Investigation of Oral Corrective Feedback in the Chinese Immersion Classroom

Authors: Qin Yao

Abstract:

Content and language integrated instruction provides second language learners with instruction in both subject matter and language, and is greatly valued, particularly in the language immersion classroom, where a language other than the students’ first language is the vehicle for teaching the school curriculum. Corrective feedback is an essential instructional technique for teachers to integrate a focus on language into their content instruction. This study aims to fill a gap in the literature on immersion, namely the lack of studies examining corrective feedback in Chinese immersion classrooms, by studying the learning opportunities brought by oral corrective feedback in a Chinese immersion classroom. Specifically, it examines the distribution of different types of teacher corrective feedback, how students respond to each feedback type, and how the focus of the teacher-student interactional exchanges affects the effectiveness of feedback. Two Chinese immersion teachers and their immersion classes were involved, and data were collected through classroom observations and interviews. Observations documented teachers’ provision of oral corrective feedback and students’ responses following the feedback in class, and interviews with teachers collected their reflective thoughts about their teaching. A primary quantitative and qualitative analysis of the data revealed that, among the different types of corrective feedback, recasts occurred most frequently. Metalinguistic clues and repetition were the least frequent feedback types. Clarification requests led to the highest percentage of learner uptake, manifested by learners’ oral production immediately following the feedback, while explicit correction came second and recasts third. 
In addition, the results also showed the interactional context played a role in the effectiveness of the feedback: teachers were most likely to give feedback in conversational exchanges that focused on explicit language and content, while students were most likely to use feedback in exchanges that focused on explicit language. In conclusion, the results of this study indicate recasts are preferred by Chinese immersion teachers, confirming results of previous studies on corrective feedback in non-Chinese immersion classrooms; and clarification request and explicit language instruction elicit more target language production from students and are facilitative in their target language development, thus should not be overlooked in immersion and other content and language integrated classrooms.

Keywords: Chinese immersion, content and language integrated instruction, corrective feedback, interaction

Procedia PDF Downloads 382

12346 Exploring the Potential of Replika: An AI Chatbot for Mental Health Support

Authors: Nashwah Alnajjar

Abstract:

This research paper provides an overview of Replika, an AI chatbot application that uses natural language processing technology to engage in conversations with users. The app was developed to provide users with a virtual AI friend who can converse with them on various topics, including mental health. This study explores the experiences of Replika users using quantitative research methodology. A survey was conducted with 12 participants to collect data on their demographics, usage patterns, and experiences with the Replika app. The results showed that Replika has the potential to play a role in mental health support and well-being.

Keywords: Replika, chatbot, mental health, artificial intelligence, natural language processing

Procedia PDF Downloads 54

12345 An ERP Study of Chinese Pseudo-Object Structures

Authors: Changyin Zhou

Abstract:

Verb-argument relation is a very important aspect of syntax-semantics interaction in sentence processing. Previous ERP (event related potentials) studies in this field mainly concentrated on the relation between the verb and its core arguments. The present study aims to reveal the ERP pattern of Chinese pseudo-object structures (SOSs), in which a peripheral argument is promoted to occupy the position of the patient object, as compared with the patient object structures (POSs). The ERP data were collected when participants were asked to perform acceptability judgments about Chinese phrases. Our result shows that, similar to the previous studies of number-of-argument violations, Chinese SOSs show a bilaterally distributed N400 effect. But different from all the previous studies of verb-argument relations, Chinese SOSs demonstrate a sustained anterior positivity (SAP). This SAP, which is the first report related to complexity of argument structure operation, reflects the integration difficulty of the newly promoted arguments and the progressive nature of well-formedness checking in the processing of Chinese SOSs.

Keywords: Chinese pseudo-object structures, ERP, sustained anterior positivity, verb-argument relation

Procedia PDF Downloads 413

12344 Teacher Agency in Localizing Textbooks for International Chinese Language Teaching: A Case of Minsk State Linguistic University

Authors: Min Bao

Abstract:

The teacher is at the core of the three fundamental factors in international Chinese language teaching, the other two being the textbook and the method. Professional development of the teacher comprises a self-renewing process that is characterized by knowledge impartment and self-reflection, in which individual agency plays a significant role. Agency makes a positive contribution to teachers’ teaching practice and their life-long learning. This study, taking Chinese teaching and learning at Minsk State Linguistic University in Belarus as an example, attempts to understand agency by investigating teachers’ strategic adaptation of textbooks to meet local needs. First, through in-depth interviews, teachers’ comments on textbooks are collected and analyzed to disclose their strategies for adapting and localizing textbooks. Then, drawing on the theory of 'The chordal triad of agency', the paper reveals the process in which teacher agency is exercised as well as its rationale. The results verify the theory: given its temporal relationality, teacher agency is constructed through a combination of experiences, purposes and aims, and context, i.e., projectivity, iteration and practice-evaluation as mentioned in the theory. Evidence also suggests that the three dimensions have different effects; it is usually one or two dimensions that have a greater effect on the construction of teacher agency. Finally, the paper provides four specific insights for teacher development in international Chinese language teaching: 1) when recruiting teachers, priority should be given to candidates majoring in Chinese language or international Chinese language teaching; 2) measures should be taken to assure the educational quality of these two majors at various levels; 3) pre-service teacher training programs should be tailored for improved quality; and 4) the management of overseas Confucius Institutes should be enhanced.

Keywords: international Chinese language teaching, teacher agency, textbooks, localization

Procedia PDF Downloads 123

12343 Topic-to-Essay Generation with Event Element Constraints

Authors: Yufen Qin

Abstract:

Topic-to-essay generation is a challenging task in natural language processing that aims to generate novel, diverse, and topic-related text based on user input. Previous research has overlooked the generation of articles under the constraints of event elements, resulting in issues such as incomplete event elements and logical inconsistencies in the generated results. To fill this gap, this paper proposes an event-constrained approach to topic-to-essay generation that enforces the completeness of event elements during the generation process. Additionally, a language model is employed to verify the logical consistency of the generated results. Experimental results on a real dataset demonstrate that the proposed model achieves a better BLEU-2 score than the baseline and also performs better in subjective evaluation, indicating its capability to generate higher-quality topic-related text.
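For readers unfamiliar with the BLEU-2 metric mentioned in the abstract, the following is a minimal sketch of a sentence-level BLEU-2 score: the geometric mean of clipped unigram and bigram precision, multiplied by a brevity penalty. It assumes a single reference and whitespace-tokenized input; the function names `ngrams` and `bleu2` and the example sentences are illustrative assumptions, not taken from the paper, whose exact evaluation setup (tokenization, smoothing, corpus-level aggregation) is not stated here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu2(candidate, reference):
    """Sentence-level BLEU-2 against one reference (no smoothing)."""
    precisions = []
    for n in (1, 2):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0
        precisions.append(clipped / total)
    # Brevity penalty: penalize candidates that are not longer than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.sqrt(precisions[0] * precisions[1])


# Hypothetical example sentences (not from the paper's dataset).
reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
score = bleu2(candidate, reference)
```

In this toy case the unigram precision is 5/6 and the bigram precision is 3/5, so the score is the square root of 0.5. Published results are usually computed at corpus level with an established toolkit, so numbers from this sketch are not directly comparable.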

Keywords: event element, language model, natural language processing, topic-to-essay generation

Procedia PDF Downloads 197

12342 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part-of-speech (POS) tagging has always been a challenging task in natural language processing. This article presents POS tagging for Nepali text using a Hidden Markov Model (HMM) and the Viterbi algorithm. Training and testing data sets are randomly separated from an annotated corpus of Nepali text. Both methods are employed on the data sets. The Viterbi algorithm is found to be computationally faster and more accurate than the HMM. An accuracy of 95.43% is achieved using the Viterbi algorithm. An error analysis of where the mismatches took place is discussed in detail.
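As an illustration of the decoding step named in the abstract, here is a minimal sketch of Viterbi decoding over an HMM tagger. It is not the authors' implementation: the two-tag inventory, the English toy words, and all probabilities below are hypothetical values chosen only to make the example runnable, and unseen events are floored at a small constant instead of being properly smoothed.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under an HMM."""
    if not words:
        return []

    def lp(p):
        # Log-probability with a floor so unseen events do not produce log(0).
        return math.log(max(p, floor))

    # best[i][t]: best log-probability of any tag path ending in tag t at position i.
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0.0))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model (not estimated from the Nepali corpus).
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.8, "VERB": 0.2}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {"NOUN": {"dogs": 0.5, "cats": 0.4, "bark": 0.1},
        "VERB": {"bark": 0.7, "chase": 0.3}}

tagged = viterbi(["dogs", "bark"], TAGS, START, TRANS, EMIT)
```

Working in log space avoids numerical underflow on long sentences. In a real tagger the three probability tables would be estimated from the annotated training portion of the corpus.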

Keywords: hidden Markov model, natural language processing, POS tagging, Viterbi algorithm

Procedia PDF Downloads 302

12341 Chinese Speakers’ Language Attitudes Towards English Accents: Comparing Mainland and Hong Kong English Major Students’ Accent Preferences in ELF Communication

Authors: Jiaqi XU, Qingru Sun

Abstract:

Accent plays a crucial role in second language (L2) learners’ performance in the schooling context in the era of globalization, where English is adopted as a lingua franca (ELF). Previous studies found that Chinese mainland students prefer American English accents, whereas the younger generations in Hong Kong prefer British accents. However, these studies neglect non-native accents of English and do little to explain why L2 learners in the two regions differ in their accent preferences. This research therefore aims to bridge the gap by 1) including both native and non-native varieties of English accents: American accent, British accent, Chinese Mandarin English accent, and Hong Kong English accent; and 2) uncovering and comparing the deeper reasons for the similar and/or different accent preferences of Chinese mainland and Hong Kong speakers. A questionnaire including objective and subjective questions was designed to investigate the students’ accent inclinations and the attitudes and reasons behind their linguistic choices. The questionnaire was distributed to eight participants (4 Chinese mainland students and 4 Hong Kong students) who were postgraduate students at a Hong Kong university. Based on the data collected, this research identifies one similarity and two differences between the Chinese mainland and Hong Kong students’ attitudes. The theories of identity construction and standard language ideology are further applied to analyze the reasons behind the similarities and differences and to evaluate how language attitudes intertwine with identity construction and language ideology.

Keywords: accent, language attitudes, identity construction, language ideology, ELF communication

Procedia PDF Downloads 117

12340 JaCoText: A Pretrained Model for Java Code-Text Generation

Authors: Jessica Lopez Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri

Abstract:

Pretrained transformer-based models have shown high performance in natural language generation tasks. However, a new wave of interest has surged: automatic programming language code generation. This task consists of translating natural language instructions into source code. Although well-known pretrained language generation models have achieved good performance in learning programming languages, effort is still needed in automatic code generation. In this paper, we introduce JaCoText, a model based on the Transformer neural network. It aims to generate Java source code from natural language text. JaCoText leverages the advantages of both natural language and code generation models. More specifically, we study some findings from the state of the art and use them to (1) initialize our model from powerful pretrained models, (2) explore additional pretraining on our Java dataset, (3) conduct experiments combining unimodal and bimodal data in training, and (4) scale the input and output length during the fine-tuning of the model. Experiments conducted on the CONCODE dataset show that JaCoText achieves new state-of-the-art results.

Keywords: Java code generation, natural language processing, sequence-to-sequence models, transformer neural networks

Procedia PDF Downloads 236

12339 Resource Framework Descriptors for Interestingness in Data

Authors: C. B. Abhilash, Kavi Mahesh

Abstract:

Human beings are the most advanced species on earth, largely because of their ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. Our study focuses mainly on interesting data, from which interesting facts are generated for the knowledge base. The approach is to derive analytics from the text through the application of natural language processing. Using semantic web Resource framework descriptors (RDF), we generate triples from the given data and derive interesting patterns. The methodology also illustrates data integration using RDF for reliable, interesting patterns.
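To make the notion of a triple concrete, the sketch below represents extracted facts as subject-predicate-object tuples, serializes them in an N-Triples-like form, and filters them by pattern. It is an assumption-laden illustration only: the `http://example.org/` URIs, the facts, and the helper names `to_ntriples` and `match` are hypothetical, and the paper's actual extraction pipeline and interestingness measures are not described in the abstract.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples in an N-Triples-like form.

    Objects starting with "http" are treated as URIs, everything else as a
    plain string literal (a simplification made for this sketch).
    """
    lines = []
    for s, p, o in triples:
        if o.startswith("http"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Hypothetical facts, as they might look after extraction from text.
EX = "http://example.org/"
facts = [
    (EX + "Kathmandu", EX + "capitalOf", EX + "Nepal"),
    (EX + "Kathmandu", EX + "label", "Kathmandu"),
    (EX + "Minsk", EX + "capitalOf", EX + "Belarus"),
]

capitals = match(facts, p=EX + "capitalOf")
serialized = to_ntriples(facts)
```

A production system would more likely use an RDF library and a SPARQL query in place of these helpers; plain tuples are used here only to keep the example dependency-free.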

Keywords: RDF, interestingness, knowledge base, semantic data

Procedia PDF Downloads 125

12338 Are Some Languages Harder to Learn and Teach Than Others?

Authors: David S. Rosenstein

Abstract:

The author believes that modern spoken languages should be equally difficult (or easy) to learn, since all normal children learning their native languages do so at approximately the same rate and with the same competence, progressing from easy to more complex grammar and syntax in the same way. Why then, do some languages seem more difficult than others? Perhaps people are referring to the written language, where it may be true that mastering Chinese requires more time than French, which in turn requires more time than Spanish. But this may be marginal, since Chinese and French children quickly catch up to their Spanish peers in reading comprehension. Rather, the real differences in difficulty derive from two sources: hardened L1 language habits trying to cope with contrasting L2 habits; and unfamiliarity with unique L2 characteristics causing faulty expectations. It would seem that effective L2 teaching and learning must take these two sources of difficulty into consideration. The author feels that the latter (faulty expectations) causes the greatest difficulty, making effective teaching and learning somewhat different for each given foreign language. Examples from Chinese and other languages are presented.

Keywords: learning different languages, language learning difficulties, faulty language expectations

Procedia PDF Downloads 507

12337 Thinking for Writing: Evidence of Language Transfer in Chinese ESL Learners’ Written Narratives

Authors: Nan Yang, Hye Pae

Abstract:

English as a second language (ESL) learners are often observed to transfer traits of their first languages (L1) and habits of using their L1s to their use of English (second language, L2), a phenomenon known as language transfer. In addition to the transfer of linguistic features (e.g., grammar and vocabulary), which are relatively easy to observe and quantify, many cross-cultural theorists have emphasized a much subtler and more fundamental transfer at a higher conceptual level, referred to as conceptual transfer. Although a growing body of literature in linguistics has demonstrated evidence of L1 transfer in various discourse genres, very few studies address the underlying conceptual transfer that accompanies language transfer, especially in extended forms of spontaneous discourse such as personal narrative. To address this issue, this study situates itself in the context of Chinese ESL learners’ written narratives, examines evidence of L1 conceptual transfer in comparison with native English speakers’ narratives, and provides discussion from the perspective of conceptual transfer. It is hypothesized that Chinese ESL learners’ English narrative strategies are heavily influenced by the strategies that they use in Chinese as a result of conceptual transfer. Understanding language transfer cognitively is of great significance in the realm of SLA, as it helps address challenges that ESL learners around the world are facing, allows native English speakers to develop a better understanding of how and why learners’ English is different, and sheds light on ESL pedagogy by providing linguistic and cultural expectations in native English-speaking countries. 
To achieve these goals, 40 college students were recruited (20 Chinese ESL learners and 20 native English speakers) in the United States, and their written narratives on the prompt 'The most frightening experience' were collected for quantitative discourse analysis. Forty written narratives (20 in Chinese and 20 in English) were collected from the Chinese ESL learners, and 20 written narratives were collected from the native English speakers. All written narratives were coded according to the coding scheme developed by the authors prior to data collection. Descriptive statistical analyses were conducted, and the preliminary results revealed that native English speakers included more narrative elements, such as events and explicit evaluation, than Chinese ESL students did in both their English and Chinese writings; the English group also used more evaluation devices (i.e., physical state expressions, indirectly reported speech, delineation) than Chinese ESL students did in both their English and Chinese writings. It was also observed that Chinese ESL students included more orientation elements (i.e., the introduction of time/place, the introduction of character) in their Chinese and English writings than the native English-speaking participants. The findings suggest that a similar narrative strategy was observed in Chinese ESL learners’ Chinese narratives and English narratives, which is considered evidence of conceptual transfer from Chinese (L1) to English (L2). The results also indicate that distinct narrative strategies were used by Chinese ESL learners and native English speakers as a result of cross-cultural differences.

Keywords: Chinese ESL learners, language transfer, thinking-for-speaking, written narratives

Procedia PDF Downloads 92

12336 Implementing a Database from a Requirement Specification

Authors: M. Omer, D. Wilson

Abstract:

Creating a database schema is essentially a manual process. The information contained in a requirement specification has to be analyzed and reduced to a set of tables, attributes and relationships. This is a time-consuming process that has to go through several stages before an acceptable database schema is achieved. The purpose of this paper is to implement a Natural Language Processing (NLP) based tool to produce a database schema from a requirement specification. Stanford CoreNLP version 3.3.1 and the Java programming language were used to implement the proposed model. The outcome of this study indicates that the first draft of a relational database schema can be extracted from a requirement specification by using NLP tools and techniques with minimum user intervention. Therefore, this method is a step forward in finding a solution that requires little or no user intervention.

Keywords: information extraction, natural language processing, relation extraction

Procedia PDF Downloads 232

12335 Testing Chat-GPT: An AI Application

Authors: Jana Ismail, Layla Fallatah, Maha Alshmaisi

Abstract:

ChatGPT, a cutting-edge language model built on the GPT-3.5 architecture, has garnered attention for its profound natural language processing capabilities, holding promise for transformative applications in customer service and content creation. This study delves into ChatGPT's architecture, aiming to comprehensively understand its strengths and potential limitations. Through systematic experiments across diverse domains, such as general knowledge and creative writing, we evaluated the model's coherence, context retention, and task-specific accuracy. While ChatGPT excels in generating human-like responses and demonstrates adaptability, occasional inaccuracies and sensitivity to input phrasing were observed. The study emphasizes the impact of prompt design on output quality, providing valuable insights for the nuanced deployment of ChatGPT in conversational AI and contributing to the ongoing discourse on the evolving landscape of natural language processing in artificial intelligence.

Keywords: artificial intelligence, ChatGPT, OpenAI, NLP

Procedia PDF Downloads 36

12334 Computational Linguistic Implications of Gender Bias: Machines Reflect Misogyny in Society

Authors: Irene Yi

Abstract:

Machine learning, natural language processing, and neural network models of language are becoming more and more prevalent in the fields of technology and linguistics today. Training data for machines are, at best, large corpora of human literature and, at worst, a reflection of the ugliness in society. Computational linguistics is a growing field dealing with such issues of data collection for technological development. Machines have been trained on millions of human books, only to find that in the course of human history, derogatory and sexist adjectives are used significantly more frequently when describing females in history and literature than when describing males. This is extremely problematic, both as training data and as the outcome of natural language processing. As machines start to handle more responsibilities, it is crucial to ensure that they do not take with them historical sexist and misogynistic notions. This paper gathers data and algorithms from neural network models of language dealing with syntax, semantics, sociolinguistics, and text classification. Computational analysis of such linguistic data is used to find patterns of misogyny. Results are significant in showing the existing intentional and unintentional misogynistic notions used to train machines, as well as in developing better technologies that take into account the semantics and syntax of text to be more mindful and reflect gender equality. Further, this paper deals with the idea of non-binary gender pronouns and how machines can process these pronouns correctly, given their semantic and syntactic context. This paper also delves into the implications of gendered grammar and its effect, cross-linguistically, on natural language processing. Languages such as French or Spanish not only have rigid gendered grammar rules, but also historically patriarchal societies. 
The progression of society comes hand in hand with not only its language, but how machines process those natural languages. These ideas are all extremely vital to the development of natural language models in technology, and they must be taken into account immediately.

Keywords: computational analysis, gendered grammar, misogynistic language, neural networks

Procedia PDF Downloads 90

12333 From User's Requirements to UML Class Diagram

Authors: Zeineb Ben Azzouz, Wahiba Ben Abdessalem Karaa

Abstract:

The automated extraction of UML class diagrams from natural language requirements is a highly challenging task. Many approaches, frameworks and tools have been presented in this field. Nonetheless, experiments with these tools have shown that no approach works best all the time. In this context, we propose a new, accurate approach to facilitate the automatic mapping from textual requirements to a UML class diagram. Our approach integrates the best properties of statistical Natural Language Processing (NLP) techniques to reduce ambiguity when analysing natural language requirements text. In addition, our approach follows the best practices defined by conceptual modelling experts to determine some patterns indispensable for the extraction of the basic elements and concepts of the class diagram. Once the relevant information for the class diagram is captured, an XMI document is generated and imported with a CASE tool to build the corresponding UML class diagram.

Keywords: class diagram, user’s requirements, XMI, software engineering

Procedia PDF Downloads 443

12332 The Use of Videoconferencing in a Task-Based Beginners' Chinese Class

Authors: Sijia Guo

Abstract:

The development of new technologies and the falling cost of high-speed Internet access have made it easier for institutes and language teachers to opt for different ways of communicating with students at a distance. The emergence of web-conferencing applications, which integrate text, chat, audio/video and graphic facilities, offers great opportunities for language learning through the multimodal environment. This paper reports on data elicited from a Ph.D. study of using web-conferencing in the teaching of a first-year Chinese class in order to promote learners’ collaborative learning. First, a comparison of four desktop videoconferencing (DVC) tools was conducted to determine the pedagogical value of the videoconferencing tool Blackboard Collaborate. Second, the evaluation of 14 campus-based Chinese learners who conducted five one-hour online sessions via the multimodal environment reveals the users’ choice of modes and their learning preferences. The findings show that the tasks designed for the web-conferencing environment contributed to the learners’ collaborative learning and second language acquisition.

Keywords: computer-mediated communication (CMC), CALL evaluation, TBLT, web-conferencing, online Chinese teaching

Procedia PDF Downloads 282

12331 A Study on the Acquisition of Chinese Classifiers by Vietnamese Learners

Authors: Quoc Hung Le Pham

Abstract:

In the study of language, classifiers are an interesting research topic. Some of the world’s languages have a classifier system and some do not. Mandarin Chinese and Vietnamese both have rich classifier systems; however, because of differences in the language systems, cognition, and culture, the syntactic structures of their classifiers also differ. Mandarin Chinese classifiers must collocate with nouns or verbs; as a lexical category, they are unlike nouns and verbs, which belong to the open class. Some scholars, however, believe that Mandarin Chinese measure words are similar to those of English and other Indo-European languages: words attached in structure and word formation (like suffixes) that form a closed class. Compared with other languages, Chinese, Vietnamese, Thai and other Asian languages belong to the second type of classifier language, in which a classifier is obligatory in most quantity expressions, appears together with a deictic, anaphoric, or quantity expression, and is not separated from the noun it modifies; this type is also known as a numeral classifier language. The main syntactic structures of Chinese classifiers are as follows: ‘quantity + measure + noun’, ‘pronoun + measure + noun’, ‘pronoun + quantity + measure + noun’, ‘prefix + quantity + measure + noun’, ‘quantity + adjective + measure + noun’, ‘quantity (whole number above 10) + duo (多) + measure + noun’, ‘quantity (around 10) + measure + duo (多) + noun’. The main syntactic structures of Vietnamese classifiers are: ‘quantity + measure + noun’, ‘measure + noun + pronoun’, ‘quantity + measure + noun + pronoun’, ‘measure + noun + prefix + quantity’, ‘quantity + measure + noun + adjective’, ‘duo (多) + quantity + measure + noun’, ‘quantity + measure + adjective + pronoun (the quantity word cannot be 1)’, ‘measure + adjective + pronoun’, ‘measure + pronoun’. Classifiers are commonly used in daily life, and if learners of Chinese fail to use this category in the standard way, their verbal communication may be negatively affected. 
The richness of the Chinese classifier system contributes to the complexity of its study by foreign learners, especially in the interlanguage of Vietnamese learners. As mentioned above, Vietnamese also has a rich system of classifiers; the basic structural order of the two languages is similar, but differences remain. These similarities and dissimilarities between the Chinese and Vietnamese classifier systems contribute significantly to the common errors made by Vietnamese students while they acquire Chinese, which are distinct from the errors made by students from other language backgrounds. From a comparative perspective, this article examines the classifiers commonly used in Chinese and Vietnamese in two respects: semantics and structural form. The comparative study aims to identify the negative transfer from the mother tongue that Vietnamese students may face while learning Chinese classifiers and, through the analysis of a classifier questionnaire, to find the causes and patterns of the errors they make. As the preliminary analysis shows, Vietnamese students learning Chinese classifiers made errors such as: overuse of the classifier ‘ge’ (个); misuse of other classifiers, e.g. ‘*yi zhang ri ji’ (yi pian ri ji), ‘*yi zuo fang zi’ (yi jian fang zi), ‘*si zhang jin pai’ (si mei jin pai); and homonym words ‘dui, shuang, fu, tao’ (对、双、副、套), ‘ke, li’ (颗、粒).

Keywords: acquisition, classifiers, negative transfer, Vietnamese learners

Procedia PDF Downloads 420

12330 Investigating Classroom Teachers' Perceptions of Assessing U.S. College Students' L2 Chinese Oral Performance

Authors: Guangyan Chen

Abstract:

This study examined Chinese teachers’ perceptions of assessing U.S. college students’ L2 (second language) Chinese oral performances at different levels. Ten oral performances were videotaped, from which three were chosen as samples to represent three different proficiency levels based on professionals’ judgments according to the ACTFL proficiency guidelines. The three samples were shown to L2 Chinese teachers, who completed questionnaires about their assessments of each speech sample. In total, 104 L2 Chinese teachers responded to each of the three samples. The Exploratory Factor Analyses (EFA) of the teachers’ responses revealed three similar rating criteria patterns for assessing the three levels of oral performances. The teachers’ responses to Samples 2 and 3 revealed five rating criteria: global proficiency, Chinese conceptual framework, content richness, communication appropriateness, and communication clarity. The teachers’ responses to Sample 1 revealed four rating criteria: global proficiency, Chinese conceptual framework, communication appropriateness/content richness, and communication clarity. However, the analyses of variance (ANOVAs) revealed that the proficiency levels of the three oral performances differed significantly across all rating criteria. Therefore, the data suggest that L2 classroom teachers could use a similar rating criteria pattern to assess college-level L2 Chinese students’ oral performances at different proficiency levels.

Keywords: language assessment, L2 Chinese, oral performance, rating criteria

Procedia PDF Downloads 515

12329 A Supervised Approach for Word Sense Disambiguation Based on Arabic Diacritics

Authors: Alaa Alrakaf, Sk. Md. Mizanur Rahman

Abstract:

Over the last two decades, Arabic natural language processing (ANLP) has become increasingly important. One of the key issues in ANLP is ambiguity. In Arabic, different pronunciations of one word may have different meanings. Ambiguity also affects the effectiveness and efficiency of Machine Translation (MT), and it has limited the usefulness and accuracy of translation from Arabic to English. The lack of Arabic resources makes the ambiguity problem more complicated. Additionally, the orthographic level of representation cannot specify the exact meaning of a word. This paper looks at the diacritics of the Arabic language and uses them to disambiguate a word. The proposed approach to word sense disambiguation uses a diacritizer application to diacritize Arabic text and then finds the most accurate sense of an ambiguous word using a Naïve Bayes classifier. Our experimental study shows that using Arabic diacritics with a Naïve Bayes classifier enhances the accuracy of choosing the appropriate sense by 23% and also decreases the ambiguity in machine translation.
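The classification step described in this abstract can be illustrated with a small sketch. It is a minimal multinomial Naïve Bayes with add-one smoothing, written from scratch; the training data, the English stand-in word "bank", and the function names `train_naive_bayes` and `predict_sense` are all hypothetical and are not taken from the paper, which works on diacritized Arabic text.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (context words, sense label) for an ambiguous word.
# In the paper's setting the features would come from diacritized Arabic text.
TRAIN = [
    (["river", "water", "shore"], "riverbank"),
    (["fishing", "river", "mud"], "riverbank"),
    (["money", "loan", "account"], "finance"),
    (["deposit", "money", "interest"], "finance"),
]

def train_naive_bayes(examples):
    """Count senses and per-sense context words; collect the vocabulary."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def predict_sense(model, context):
    """Return the sense with the highest log posterior for the context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train_naive_bayes(TRAIN)
print(predict_sense(model, ["money", "deposit"]))
```

The sketch only shows how context features select a sense; it does not reproduce the diacritization step or the reported 23% improvement.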

Keywords: Arabic natural language processing, machine learning, machine translation, Naive Bayes classifier, word sense disambiguation

Procedia PDF Downloads 327

12328 Experimental Research and Analyses of Yoruba Native Speakers’ Chinese Phonetic Errors

Authors: Obasa Joshua Ifeoluwa

Abstract:

Phonetics is the foundation and most important part of language learning. Through an acoustic experiment using Praat software, this article visually compares Yoruba students’ pronunciation of Chinese consonants, vowels, and tones with that of native Chinese speakers. The article is aimed at Yoruba native speakers learning Chinese phonetics; therefore, Yoruba students were selected. The students surveyed were required to be at an elementary level and to have learned Chinese for less than six months. The students selected are all undergraduates majoring in Chinese Studies at the University of Lagos. These students have already learned Chinese Pinyin and are all familiar with the pinyin used in the provided questionnaire. The Chinese students selected are those who have passed the level two Mandarin proficiency examination, which serves as an assurance that their pronunciation is standard. This work finds that, in terms of Mandarin consonant pronunciation, Yoruba students cannot distinguish between the voiced and voiceless or the aspirated and non-aspirated phonetic features. For instance, when [pʰ] is pronounced, the spectrogram clearly shows that the Voice Onset Time (VOT) of a Chinese speaker is longer than that of a Yoruba native speaker, which means that the Yoruba speaker is pronouncing the unaspirated counterpart [p]. Another difficulty is pronouncing affricates and fricatives such as [tʂ], [tʂʰ], [ʂ], [ʐ], [tɕ], [tɕʰ], and [ɕ], because these sounds are not in the phonetic system of the Yoruba language. In terms of vowels, some students find it difficult to pronounce some allophonic high vowels such as [ɿ] and [ʅ], and therefore pronounce them as their phoneme [i]; another pronunciation error is pronouncing [y] as [u], and, as shown in the spectrogram, one student pronounced [y] as [iu].
In terms of tone, it is most difficult for students to differentiate between the second (rising) and third (falling and rising) tones because both tones emphasize a rising pitch. This work concludes that the major errors made by Yoruba students while pronouncing Chinese sounds are caused by the interference of their first language (L1) and sometimes by their lingua franca.

Keywords: Chinese, Yoruba, error analysis, experimental phonetics, consonant, vowel, tone

Procedia PDF Downloads 76

12327 Benchmarking Bert-Based Low-Resource Language: Case Uzbek NLP Models

Authors: Jamshid Qodirov, Sirojiddin Komolov, Ravilov Mirahmad, Olimjon Mirzayev

Abstract:

Nowadays, natural language processing tools play a crucial role in our daily lives, including various text processing techniques. Very advanced models exist for languages such as English and Russian. For some languages, such as Uzbek, however, NLP models have been developed only recently, so only a few NLP models are available for Uzbek. Moreover, no existing work shows how the Uzbek NLP models behave in different situations or when to use each of them. This work tries to close this gap by comparing the Uzbek NLP models that existed at the time this article was written. The authors compare the models in two scenarios, sentiment analysis and sentence similarity, which are instances of the two most common problems in industry: classification and similarity. Another outcome of this work is two datasets for classification and sentence similarity in Uzbek, which we generated ourselves and which can be useful in both industry and academia.
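To make the sentence-similarity scenario concrete, the sketch below scores two sentences by the cosine of their vectors. It assumes simple bag-of-words counts in place of the BERT-based embeddings the benchmark evaluates, and the function name `cosine_similarity` is a hypothetical helper, not part of the paper.

```python
import math
from collections import Counter

def cosine_similarity(sentence_a, sentence_b):
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    vec_a = Counter(sentence_a.lower().split())
    vec_b = Counter(sentence_b.lower().split())
    # Only tokens present in both sentences contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)

print(cosine_similarity("the film was good", "the film was great"))
```

With a real model, the count vectors would be replaced by sentence embeddings, while the cosine step would stay the same.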

Keywords: NLP, benchmark, BERT, vectorization

Procedia PDF Downloads 25

12326 A Practical Survey on Zero-Shot Prompt Design for In-Context Learning

Authors: Yinheng Li

Abstract:

The remarkable advancements in large language models (LLMs) have brought about significant improvements in natural language processing tasks. This paper presents a comprehensive review of in-context learning techniques, focusing on different types of prompts, including discrete, continuous, few-shot, and zero-shot, and their impact on LLM performance. We explore various approaches to prompt design, such as manual design, optimization algorithms, and evaluation methods, to optimize LLM performance across diverse tasks. Our review covers key research studies in prompt engineering, discussing their methodologies and contributions to the field. We also delve into the challenges faced in evaluating prompt performance, given the absence of a single “best” prompt and the importance of considering multiple metrics. In conclusion, the paper highlights the critical role of prompt design in harnessing the full potential of LLMs and provides insights into the combination of manual design, optimization techniques, and rigorous evaluation for more effective and efficient use of LLMs in various Natural Language Processing (NLP) tasks.
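The difference between zero-shot and few-shot prompts mentioned in this abstract can be shown with a small template builder. This is only an illustrative sketch: the function `build_prompt`, the "Input:/Label:" layout, and the example texts are assumptions, not a format prescribed by the survey.

```python
def build_prompt(task_instruction, query, examples=()):
    """Build a discrete text prompt.

    With no examples the result is a zero-shot prompt; with labelled
    examples it becomes a few-shot (in-context learning) prompt.
    """
    blocks = [task_instruction]
    for text, label in examples:
        blocks.append(f"Input: {text}\nLabel: {label}")
    # The final block leaves the label open for the model to complete.
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

zero_shot = build_prompt("Classify the sentiment.", "Great phone")
few_shot = build_prompt(
    "Classify the sentiment.",
    "Great phone",
    examples=[("I love it", "positive"), ("Terrible battery", "negative")],
)
print(zero_shot)
print(few_shot)
```

Because no single prompt is best across tasks, a builder like this would typically be used to generate several candidate prompts that are then compared on multiple metrics.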

Keywords: in-context learning, prompt engineering, zero-shot learning, large language models

Procedia PDF Downloads 51

12325 Language Politics and Identity in Translation: From a Monolingual Text to Multilingual Text in Chinese Translations

Authors: Chu-Ching Hsu

Abstract:

This paper focuses on how government-led language policies and political changes in Taiwan manipulate the choice of language in translations, and on the translation strategies that translators employ to express their language ideology amid the power struggles and decision-making. Framed by Lefevere’s theoretical concept of translating as rewriting, and carried out as a diachronic study, this paper sets out to investigate the language ideology and translators’ idiolect in Chinese-language translations of Anglo-American novels. The examples used to explore these issues are taken from different Chinese renditions of Mark Twain’s English-language novel The Adventures of Huckleberry Finn, which contains several dialogues originally written in the colloquial language and dialect used in the American state of Mississippi and reproduced in Mark Twain’s works. Adopting a corpus methodology, the study extracts many examples from the translated texts and the source text to illuminate how translators in Taiwan deal with the dialectal features encoded in Twain’s works, and how different versions of Chinese translations are employed by Taiwanese translators to conform to the language policies and to express their language identity textually in different periods of the past five decades, from the 1960s onward. The findings of this study suggest that the use of Taiwanese dialect and language patterns in translations does relate to the mother-tongue language movement and the language ideology of the translator, as well as to the issue of language identity raised on the island of Taiwan.
Furthermore, this study confirms that the change of political power in Taiwan has a significant impact on language policy (assimilationism, pluralism, or multiculturalism), which has also turned Taiwan from a monolingual into a multilingual society, where language ideology and identity are revealed not only in people’s daily communication but also in written translations.

Keywords: language politics and policies, literary translation, mother-tongue, multiculturalism, translator’s ideology

Procedia PDF Downloads 367

12324 Music Reading Expertise Facilitates Implicit Statistical Learning of Sentence Structures in a Novel Language: Evidence from Eye Movement Behavior

Authors: Sara T. K. Li, Belinda H. J. Chung, Jeffery C. N. Yip, Janet H. Hsiao

Abstract:

Music notation and text reading both involve statistical learning of music or linguistic structures. However, it remains unclear how music reading expertise influences text reading behavior. The present study examined this issue through an eye-tracking study. Chinese-English bilingual musicians and non-musicians read English sentences, Chinese sentences, musical phrases, and sentences in Tibetan, a language novel to the participants, with their eye movement recorded. Each set of stimuli consisted of two conditions in terms of structural regularity: syntactically correct and syntactically incorrect musical phrases/sentences. They then completed a sentence comprehension (for syntactically correct sentences) or a musical segment/word recognition task afterwards to test their comprehension/recognition abilities. The results showed that in reading musical phrases, as compared with non-musicians, musicians had a higher accuracy in the recognition task, and had shorter reading time, fewer fixations, and shorter fixation duration when reading syntactically correct (i.e., in diatonic key) than incorrect (i.e., in non-diatonic key/atonal) musical phrases. This result reflects their expertise in music reading. Interestingly, in reading Tibetan sentences, which was novel to both participant groups, while non-musicians did not show any behavior differences between reading syntactically correct or incorrect Tibetan sentences, musicians showed a shorter reading time and had marginally fewer fixations when reading syntactically correct sentences than syntactically incorrect ones. However, none of the musicians reported discovering any structural regularities in the Tibetan stimuli after the experiment when being asked explicitly, suggesting that they may have implicitly acquired the structural regularities in Tibetan sentences. This group difference was not observed when they read English or Chinese sentences. 
This result suggests that music reading expertise facilitates reading texts in a novel language (i.e., Tibetan), but not in languages that the readers are already familiar with (i.e., English and Chinese). This phenomenon may be due to the similarities between reading music notations and reading texts in a novel language, as in both cases the stimuli follow particular statistical structures but do not involve semantic or lexical processing. Thus, musicians may transfer their statistical learning skills stemming from music notation reading experience to implicitly discover structures of sentences in a novel language. This speculation is consistent with a recent finding showing that music reading expertise modulates the processing of English nonwords (i.e., words that do not follow morphological or orthographic rules) but not pseudo- or real words. These results suggest that the modulation of music reading expertise on language processing depends on the similarities in the cognitive processes involved. It also has important implications for the benefits of music education on language and cognitive development.

Keywords: eye movement behavior, eye-tracking, music reading expertise, sentence reading, structural regularity, visual processing

Procedia PDF Downloads 351

12323 A Study on Bilingual Semantic Processing: Category Effects and Age Effects

Authors: Lai Yi-Hsiu

Abstract:

The present study addressed the nature of bilingual semantic processing in Mandarin Chinese and Southern Min and examined category effects and age effects. Nineteen bilingual adults of Mandarin Chinese and Southern Min, nine monolingual seniors of Mandarin Chinese, and ten monolingual seniors of Southern Min in Taiwan individually completed two semantic tasks: picture naming and category fluency. The instruments for the naming task were sixty black-and-white pictures, including thirty-five object pictures and twenty-five action pictures. The category fluency task also consisted of two semantic categories: objects (or nouns) and actions (or verbs). The reaction time for each picture/question was additionally calculated and analyzed. Oral productions in Mandarin Chinese and in Southern Min were compared and discussed to examine the category effects and age effects. The results of the category fluency task indicated that the information content produced by these seniors had comparatively deteriorated, and thus they produced fewer semantic-lexical items. Significant group differences were also found in the reaction time results. Category effects were significant for both adults and seniors in the semantic fluency task. The findings of the present study will help characterize the nature of the bilingual semantic processing of adults and seniors, and contribute to the fields of contrastive and corpus linguistics.

Keywords: bilingual semantic processing, aging, Mandarin Chinese, Southern Min

Procedia PDF Downloads 542

12322 New Chinese Landscapes in the Works of the Chinese Photographer Yao Lu

Authors: Xiaoling Dai

Abstract:

Many Chinese artists have used digital photography to create works with features of Chinese landscape paintings since the 20th century. The ‘New Mountains and Water’ works created by digital techniques reflect the fusion of photographic techniques and traditional Chinese aesthetic thoughts. Borrowing from Chinese landscape paintings in the Song Dynasty, the Chinese photographer Yao Lu uses digital photography to reflect contemporary environmental construction in his series New Landscapes. By portraying a variety of natural environments brought by urbanization in the contemporary period, Lu deconstructs traditional Chinese paintings and reconstructs contemporary photographic practices. The primary object of this study is to investigate how Chinese photographer Yao Lu redefines and re-interprets the relationship between tradition and contemporaneity. In this study, Yao Lu’s series work New Landscapes is used for photo elicitation, which seeks to broaden understanding of the development of Chinese landscape photography. Furthermore, discourse analysis will be used to evaluate how Chinese social developments influence the creation of photographic practices. Through visual and discourse analysis, this study aims to excavate the relationship between tradition and contemporaneity in Lu’s works. According to New Landscapes, the study argues that in Lu’s interpretations of landscapes, tradition and contemporaneity are seen to establish a new relationship. Traditional approaches to creation do not become obsolete over time. On the contrary, traditional notions and styles of creation can shed new light on contemporary issues or techniques.

Keywords: Chinese aesthetics, Yao Lu, new landscapes, tradition, contemporaneity

Procedia PDF Downloads 56

12321 A Corpus Output Error Analysis of Chinese L2 Learners From America, Myanmar, and Singapore

Authors: Qiao-Yu Warren Cai

Abstract:

Due to the rise of big data, building corpora and using them to analyze Chinese L2 learners’ language output has become a trend. Various empirical studies have been conducted using Chinese corpora built by different academic institutes. However, most of this research analyzed the corpus data using corpus-based qualitative content analysis with descriptive statistics. Descriptive statistics can summarize the subjects or samples that the research has actually measured and describe the numerical data, but the collected data cannot be generalized to the population. Comte, a French positivist, argued in the 19th century that human knowledge, whether in the humanities and social sciences or in the natural sciences, should be verified in a scientific way in order to construct a universal theory that explains the truth and human behavior. Inferential statistics, which can judge the probability that a difference observed between groups is dependable rather than caused by chance (Free Geography Notes, 2015) and can infer from the subjects or samples how the population might think or behave, is just the right method to support Comte’s argument in the field of TCSOL. Inferential statistics is also at the core of quantitative research, but little research has combined corpora with inferential statistics. Few studies analyze the differences in Chinese L2 learners’ corpus output errors using the one-way ANOVA, so the findings of previous research are limited in what they can infer about the population’s Chinese errors from the given samples’ Chinese corpora. To fill this knowledge gap in the professional development of Taiwanese TCSOL, the present study uses the one-way ANOVA to analyze corpus output errors of Chinese L2 learners from America, Myanmar, and Singapore.
The results show that no significant difference exists in ‘shì (是) sentence’ and word order errors, but compared with American and Singaporean learners, learners from Myanmar are significantly more likely to produce ‘sentence blends.’ Based on these results, the present study provides an instructional approach and contributes to further exploration of how Chinese L2 learners can acquire and use learning strategies to reduce errors.
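The one-way ANOVA that this abstract relies on compares between-group variance with within-group variance. The sketch below computes the F statistic from scratch; the per-learner error counts are hypothetical illustrations, not data from the study, and `one_way_anova_f` is an assumed helper name.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical error counts per learner for three learner groups.
american = [1, 2, 3]
singaporean = [2, 3, 4]
myanmar = [5, 6, 7]
f_stat = one_way_anova_f([american, singaporean, myanmar])
print(f_stat)
```

Turning F into a p-value requires the F distribution with (k - 1, N - k) degrees of freedom; in practice a statistics package such as SciPy's `scipy.stats.f_oneway` would normally be used for that step.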

Keywords: Chinese corpus, error analysis, one-way analysis of variance, Chinese L2 learners, Americans, Myanmar, Singaporeans

Procedia PDF Downloads 78

12320 Progress in Combining Image Captioning and Visual Question Answering Tasks

Authors: Prathiksha Kamath, Pratibha Jamkhandi, Prateek Ghanti, Priyanshu Gupta, M. Lakshmi Neelima

Abstract:

Combining Image Captioning and Visual Question Answering (VQA) tasks has emerged as a new and exciting research area. The image captioning task involves generating a textual description that summarizes the content of the image. VQA aims to answer a natural language question about the image. Both tasks involve computer vision and natural language processing (NLP) and require a deep understanding of the content of the image and the semantic relationships within it, as well as the ability to generate a response in natural language. Both tasks have grown remarkably with the rapid advancement of deep learning. In this paper, we present a comprehensive review of recent progress in combining image captioning and visual question answering (VQA) tasks. We first discuss image captioning and VQA individually and then the various ways in which the two tasks can be integrated. We also analyze the challenges associated with these tasks and ways to overcome them. We finally discuss the various datasets and evaluation metrics used in these tasks. This paper concludes with the need to generate captions that are based on the context and that can answer the questions most likely to be asked about the image, so as to aid the VQA task. Overall, this review highlights the significant progress made in combining image captioning and VQA, as well as the ongoing challenges and opportunities for further research in this exciting and rapidly evolving field, which has the potential to improve the performance of real-world applications such as autonomous vehicles, robotics, and image search.

Keywords: image captioning, visual question answering, deep learning, natural language processing

Procedia PDF Downloads 49

12319 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques

Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart

Abstract:

Automatic text classification mostly applies natural language processing (NLP) and other AI-guided techniques to classify text automatically, faster, and more accurately. This paper discusses the use of predictive maintenance to manage incident tickets inside the company. It proposes a tool that processes and analyses the comments and notes written by administrators after they resolve an incident ticket. The goal is to increase the quality of these comments. The tool relies on NLP and machine learning techniques to perform textual analytics on the extracted data. The approach was tested on real data from the French National Railways (SNCF) company and gave high-quality results.

Keywords: machine learning, text classification, NLP techniques, semantic representation

Procedia PDF Downloads 63