Search results for: object constraints language
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5951

4241 The Advancements of Transformer Models in Part-of-Speech Tagging System for Low-Resource Tigrinya Language

Authors: Shamm Kidane, Ibrahim Abdella, Fitsum Gaim, Simon Mulugeta, Sirak Asmerom, Natnael Ambasager, Yoel Ghebrihiwot

Abstract:

The call for natural language processing (NLP) systems for low-resource languages has become more apparent than ever in the past few years, yet preparing such systems remains arduous. This paper presents an improved version of the Nagaoka Tigrinya Corpus for a part-of-speech (POS) classification system for the Tigrinya language. The initial Nagaoka dataset was enlarged, bringing the new tagged corpus to 118K tokens annotated with the same 12 basic POS tags used previously. The additional content was annotated manually in a stringent manner, following rules similar to those of the former dataset, and was formatted in CoNLL format. The system follows a recent approach in NLP tasks, using the monolingually pre-trained TiELECTRA, TiBERT and TiRoBERTa transformer models. The highest achieved score is an impressive weighted F1-score of 94.2%, which surpassed the previous systems by a significant margin. The system should prove useful for NLP-related tasks in Tigrinya and similar low-resource languages, with room for cross-referencing higher-resource languages.
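The abstract reports a weighted F1-score without spelling out how it is computed. As a rough illustration only, the sketch below computes a support-weighted F1 over POS tags in plain Python; the function name `weighted_f1` and the toy tag sequences are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def weighted_f1(gold, pred):
    """Support-weighted mean of per-tag F1 scores (illustrative sketch)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    support = Counter(gold)  # number of gold tokens per tag
    total = 0.0
    for tag, n in support.items():
        # Count true positives and false positives for this tag.
        tp = sum(1 for g, p in zip(gold, pred) if g == tag and p == tag)
        fp = sum(1 for g, p in zip(gold, pred) if g != tag and p == tag)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n * f1  # weight each tag's F1 by its gold support
    return total / len(gold)


# Toy data, invented for illustration.
gold = ["N", "V", "N", "ADJ", "N", "V"]
pred = ["N", "V", "V", "ADJ", "N", "N"]
score = weighted_f1(gold, pred)
```

Weighting by support means frequent tags dominate the score, which is worth keeping in mind when tag frequencies are skewed.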

Keywords: Tigrinya POS corpus, TiBERT, TiRoBERTa, conditional random fields

Procedia PDF Downloads 92
4240 Against Language Disorder: A Way of Reading Dialects in Yan Lianke’s Novels

Authors: Thuy Hanh Nguyen Thi

Abstract:

Using deep reading and text analysis, this article analyzes the use and creation of dialects as a way of demonstrating Yan Lianke's creative stance. It argues that this is the writer’s narrative strategy in a fight against aphasia, a language disorder of Chinese people and culture, demonstrating a sense of return to folklore and marking his own linguistic style. In terms of verbal text, dialect in Yan Lianke’s novels is manifested through the use of words, sentences and dialects. Two types of dialect exist in Yan Lianke’s novels: the current dialect system and the particular dialect system of the Pa Lou world, created by the writer himself in order to enrich the vocabulary of Han Chinese.

Keywords: Yan Lianke, aphasia, dialect, Pa Lou world

Procedia PDF Downloads 119
4239 Role of Speech Articulation in English Language Learning

Authors: Khadija Rafi, Neha Jamil, Laiba Khalid, Meerub Nawaz, Mahwish Farooq

Abstract:

Speech articulation is a complex process of producing intelligible sounds through precise movements of various structures within the vocal tract. These structures are called articulators and comprise the lips, teeth, tongue, and palate. The articulators work together to produce a range of distinct phonemes, which are the basis of language. The process starts with the airstream from the lungs passing through the trachea and into the oral and nasal cavities. When the air passes through the mouth, the tongue and the muscles around it coordinate to create certain sounds. This can be seen when the tongue is placed in different positions, sometimes near the alveolar ridge, soft palate, roof of the mouth or the back of the teeth, which gives each phoneme its unique qualities. Vowels are articulated with an open vocal tract, with the height and position of the tongue differing for each vowel, while consonants are pronounced by creating obstructions in the airflow. For instance, the sound ‘b’ is a plosive and can be produced only by briefly closing the lips. Articulation disorders can not only affect communication but can also be a hurdle in speech production. To improve articulation skills for such individuals, doctors often recommend speech therapy, which involves various kinds of exercises like jaw exercises and tongue twisters. This disorder is more common in children who are going through developmental articulation issues right after birth, but in adults, it can be caused by injury, neurological conditions, or other speech-related disorders. In short, speech articulation is an essential aspect of productive communication, requiring coordination of specific articulators to produce the different intelligible sounds that are a vital part of spoken language.

Keywords: linguistics, speech articulation, speech therapy, language learning

Procedia PDF Downloads 58
4238 An Investigation of the Integration of Synchronous Online Tools into Task-Based Language Teaching: The Example of SpeakApps

Authors: Nouf Aljohani

Abstract:

The research project described in this presentation focuses on designing and evaluating oral tasks related to students’ needs and levels to foster communication and negotiation of meaning for a group of female Saudi university students. The significance of the current research project lies in its contribution to determining the usefulness of synchronous technology-mediated interactive group discussion in improving different speaking strategies. It also explores how to optimize learning outcomes, expand evaluation of online learning tasks, and engage students in evaluating synchronous interactive tools and tasks. The researcher used SpeakApps, a synchronous technology that allows the students to practice oral interaction outside the classroom. Such a course of action was considered necessary due to low English proficiency among Saudi students. To the author's knowledge, the main factor that causes poor speaking skills is that students do not have sufficient time to communicate outside English language classes. Further, speaking and listening course contents are not well designed to match the Saudi learning context. The methodology included designing speaking tasks to match the educational setting; a CALL framework for designing and evaluating tasks; participant involvement in evaluating these tasks in each online session; and an investigation of the factors that led to the successful implementation of Task-based Language Teaching (TBLT) and using SpeakApps. The analysis and data were drawn from the technology acceptance model surveys, a group interview, teachers’ and students’ weekly reflections, and discourse analysis of students’ interactions.

Keywords: CALL evaluation, synchronous technology, speaking skill, task-based language teaching

Procedia PDF Downloads 308
4237 Optimizing Volume Fraction Variation Profile of Bidirectional Functionally Graded Circular Plate under Mechanical Loading to Minimize Its Stresses

Authors: Javad Jamali Khouei, Mohammadreza Khoshravan

Abstract:

As the application of functionally graded materials is increasing in most industries, a methodology is needed for designing the optimal profile of structures such as plates under mechanical loading, which are widely used in industry. Therefore, the volume fraction variation profile of a two-directional functionally graded circular plate is optimized so that the stress in the structure is minimized. For this purpose, the equilibrium equations of the two-directional functionally graded circular plate are solved by a semi analytical-numerical method under mechanical loading and support conditions. By solving the equilibrium equations, deflections and stresses are obtained in terms of the control variables of the volume fraction variation profile. As a result, the problem can be formulated as an optimization problem that minimizes the critical von Mises stress under constraints on deflections and stress and a physical constraint relating to the structure of the material. The problem can then be solved with a metaheuristic algorithm such as a genetic algorithm. Optimization results for the applied model under the given constraints, loadings and boundary conditions show that the functionally graded plate should be graded only in the radial direction, and that there is no need for volume fraction variation of the constituent particles in the thickness direction. To validate the results, the optimal values of the obtained design variables are evaluated graphically.

Keywords: two-directional functionally graded material, single objective optimization, semi analytical-numerical solution, genetic algorithm, graphical solution with contour

Procedia PDF Downloads 275
4236 ExactData Smart Tool For Marketing Analysis

Authors: Aleksandra Jonas, Aleksandra Gronowska, Maciej Ścigacz, Szymon Jadczak

Abstract:

ExactData is a smart tool which helps with meaningful marketing content creation. It helps marketers achieve this by analyzing the text of an advertisement before and after its publication on social media sites like Facebook or Instagram. In our research we focus on four areas of natural language processing (NLP): grammar correction, sentiment analysis, irony detection and advertisement interpretation. Our research has identified a considerable lack of NLP tools for the Polish language that specifically aid online marketers. In light of this, our research team has set out to create a robust and versatile NLP tool for the Polish language. The primary objective of our research is to develop a tool that can perform a range of language processing tasks in this language, such as sentiment analysis, text classification, text correction and text interpretation. Our team has been working diligently to create a tool that is accurate, reliable, and adaptable to the specific linguistic features of Polish, and that can provide valuable insights for a wide range of marketers' needs. In addition to the Polish language version, we are also developing an English version of the tool, which will enable us to expand the reach and impact of our research to a wider audience. Another area of focus in our research involves tackling the challenge of the limited availability of linguistically diverse corpora for non-English languages, which presents a significant barrier in the development of NLP applications. One approach we have been pursuing is the translation of existing English corpora, which would enable us to use the wealth of linguistic resources available in English for other languages. Furthermore, we are looking into other methods, such as gathering language samples from social media platforms.
By analyzing the language used in social media posts, we can collect a wide range of data that reflects the unique linguistic characteristics of specific regions and communities, which can then be used to enhance the accuracy and performance of NLP algorithms for non-English languages. In doing so, we hope to broaden the scope and capabilities of NLP applications. Our research focuses on several key NLP techniques including sentiment analysis, text classification, text interpretation and text correction. To ensure that we can achieve the best possible performance for these techniques, we are evaluating and comparing different approaches and strategies for implementing them. We are exploring a range of different methods, including transformers and convolutional neural networks (CNNs), to determine which ones are most effective for different types of NLP tasks. By analyzing the strengths and weaknesses of each approach, we can identify the most effective techniques for specific use cases, and further enhance the performance of our tool. Our research aims to create a tool that can provide a comprehensive analysis of advertising effectiveness, allowing marketers to identify areas for improvement and optimize their advertising strategies. The results of this study suggest that a smart tool for advertisement analysis can provide valuable insights for businesses seeking to create effective advertising campaigns.

Keywords: NLP, AI, IT, language, marketing, analysis

Procedia PDF Downloads 82
4235 The Meaning System of Tense: A Systemic Functional Approach

Authors: Cunyu Zhang

Abstract:

A review of the literature on tense shows that there are disagreements on the definition and existence of Chinese tense. Influenced by some research on the English language which regards tense as a grammatical category based on the verbal inflections of English, some Chinese researchers claim that there is no tense in the Chinese language as there are no verbal inflections involved. Meanwhile, other Chinese researchers hold that Chinese still has tense although its verbs are non-inflectional, based on the fact that Chinese lexical expressions can imply temporal meaning. We assume that the reasons for the above disagreements in terms of Chinese tense lie in the fact that all the previous studies prefer to view language “from below”, which means expressions of tense are the core part of these studies. However, there are about 6,000 languages with distinct expressions all over the world. Hence, if language studies only concentrate on expressions, it becomes more difficult to understand the nature of language. By contrast, functions of languages are similar; otherwise, human beings could not communicate with each other. Therefore, we believe that it is necessary for us to have a theoretical study on Chinese tense within the framework of SFL, which holds that language is a system where meaning is the core part while form is just the realization of meaning. In addition, SFL is a general linguistic theory providing a universal framework for languages all over the world. Therefore, based on Systemic Functional Linguistics, the paper first redefines tense as a deictic semantic category for describing the speaker’s temporal location of processes and relevant temporal relations. With reference to this definition, this study explores the meaning system of tense. It is proposed that tense expresses four kinds of meaning, namely interpersonal, experiential, logical and textual meanings.
From the interpersonal angle, tense helps to exchange temporal information between the speaker and the listener, and the temporal information refers to the anchoring of a concerned process in the past, present or future by the speaker. From the experiential angle, tense plays a role in the temporal locating of material, mental, relational, existential, behavioral and verbal processes by the speaker. From the logical angle, tense denotes the temporal relations at the two levels of clause and clause complex, and such relations fall into simultaneity, anteriority and posteriority. From the textual angle, tense refers to the temporal relations at the level of text, and the temporal relations in question concern linear serial relations and synchronous serial relations.

Keywords: Chinese, meaning system, Systemic Functional Linguistics, tense

Procedia PDF Downloads 415
4234 The Identification of Environmentally Friendly People: A Case of South Sumatera Province, Indonesia

Authors: Marpaleni

Abstract:

The Intergovernmental Panel on Climate Change (IPCC) declared in 2007 that global warming and climate change are not just a series of events caused by nature, but rather caused by human behaviour. Thus, to reduce the impact of human activities on climate change, information is required about how people respond to environmental issues and what constraints they face. However, information on these and other phenomena remains largely missing, or not fully integrated within the existing data systems. The proposed study is aimed at filling the gap in this knowledge by focusing on Environmentally Friendly Behaviour (EFB) of the people of Indonesia, taking the province of South Sumatera as a case study. EFB is defined as any activity in which people engage to improve the conditions of the natural resources and/or to diminish the impact of their behaviour on the environment. This activity is measured in terms of consumption in five areas at the household level, namely housing, energy, water usage, recycling and transportation. By adopting Indonesia’s Environmentally Friendly Behaviour survey conducted by Statistics Indonesia in 2013, this study aims to precisely identify one’s orientation towards EFB based on sociodemographic characteristics such as age, income, occupation, location, education, gender and family size. The results of this research will be useful to precisely identify what support people require to strengthen their EFB, to help identify specific constraints that different actors and groups face and to uncover a more holistic understanding of EFB in relation to particular demographic and socio-economic contexts. As the empirical data are examined from the national data sample framework, which will continue to be collected, it can be used to forecast and monitor the future of EFB.

Keywords: environmentally friendly behavior, demographic, South Sumatera, Indonesia

Procedia PDF Downloads 281
4233 Protective Effect of the Histamine H3 Receptor Antagonist DL77 in Behavioral Cognitive Deficits Associated with Schizophrenia

Authors: B. Sadek, N. Khan, D. Łażewska, K. Kieć-Kononowicz

Abstract:

This study investigated the effects of the non-imidazole histamine H3 receptor (H3R) antagonist DL77 in the passive avoidance paradigm (PAP) and the novel object recognition (NOR) task in MK801-induced cognitive deficits associated with schizophrenia (CDS) in adult male rats, with donepezil (DOZ) as a reference drug. The results show that acute systemic administration of DL77 (2.5, 5, and 10 mg/kg, i.p.) significantly improved MK801-induced (0.1 mg/kg, i.p.) memory deficits in PAP. The ameliorating activity of DL77 (5 mg/kg, i.p.) in MK801-induced deficits was partly reversed when rats were pretreated with the centrally-acting H2R antagonist zolantidine (ZOL, 10 mg/kg, i.p.) or with the antimuscarinic antagonist scopolamine (SCO, 0.1 mg/kg, i.p.), but not with the CNS penetrant H1R antagonist pyrilamine (PYR, 10 mg/kg, i.p.). Moreover, the memory enhancing effect of DL77 (5 mg/kg, i.p.) in MK801-induced memory deficits in PAP was strongly reversed when rats were pretreated with a combination of ZOL (10 mg/kg, i.p.) and SCO (1.0 mg/kg, i.p.). Furthermore, the significant ameliorative effect of DL77 (5 mg/kg, i.p.) on MK801-induced long-term memory (LTM) impairment in NOR test was comparable to the DOZ-provided memory-enhancing effect, and was abrogated when animals were pretreated with the histamine H3R agonist R-(α)-methylhistamine (RAMH, 10 mg/kg, i.p.). However, DL77 (5 mg/kg, i.p.) failed to provide procognitive effect on MK801-induced short-term memory (STM) impairment in NOR test. In addition, DL77 (5 mg/kg) did not alter anxiety levels and locomotor activity of animals naive to elevated-plus maze (EPM), demonstrating that improved performances with DL77 (5 mg/kg) in PAP or NOR are unrelated to changes in emotional responding or spontaneous locomotor activity. These results provide evidence for the potential of H3Rs for the treatment of neurodegenerative disorders related to impaired memory function, e.g. CDS.

Keywords: histamine H3 receptor, antagonist, learning, memory impairment, passive avoidance paradigm, novel object recognition

Procedia PDF Downloads 198
4232 Adjunct Placement in Educated Nigerian English

Authors: Juliet Charles Udoudom

Abstract:

In nonnative language use environments, language users have been known to demonstrate marked variations both in the spoken and written productions of the target language. For instance, analyses of the written productions of Nigerian users of English have shown inappropriate sequencing of sentence elements resulting in distortions in meaning and/or other problems of syntax. This study analyses the structure of sentences in the written production of 450 educated Nigerian users of English to establish their sensitivity to adjunct placement and the extent of its influence on meaning interpretation. The respondents were selected by a stratified random sampling technique from six universities in south-south Nigeria using education as the main yardstick for stratification. The systemic functional grammar analytic format was used in analyzing the sentences selected from the corpus. Findings from the analyses indicate that of the 8,576 tokens of adjuncts in the entire corpus, 4,550 (53.05%) of circumstantial adjuncts were appropriately placed while 2,839 (33.11%) of modal adjuncts occurred at appropriate locations in the clauses analyzed. Conjunctive adjunct placement accounted for 1,187 occurrences, representing 13.84% of the entire corpus. Further findings revealed that prepositional phrases (PPs) were not well construed by respondents to be capable of realizing adjunct functions, and were inappropriately placed.

Keywords: adjunct, adjunct placement, conjunctive adjunct, circumstantial adjunct, systemic grammar

Procedia PDF Downloads 5
4231 Investigating the Influence of Critical Thinking Skills on Learning Achievement among Higher Education Students in Foreign Language Programs

Authors: Mostafa Fanaei, Shahram R. Sistani, Athare Nazri-Panjaki

Abstract:

Introduction: Critical thinking skills are increasingly recognized as vital for academic success, particularly in higher education. This study examines the influence of critical thinking on learning achievement among undergraduate and master's students enrolled in foreign language programs. By investigating this correlation, educators can gain valuable insights into optimizing teaching methodologies and enhancing academic outcomes. Methods: This cross-sectional study involved 150 students from the Shahid Bahonar University of Kerman, recruited via random sampling. Participants completed the Critical Thinking Questionnaire (CThQ), assessing dimensions such as analysis, evaluation, creation, remembering, understanding, and application. Academic performance was measured using the students' GPA (0-20). Results: The participants' mean age was 21.46 ± 5.2 years, with 62.15% being female. The mean scores for critical thinking subscales were as follows: Analyzing (13.2 ± 3.5), Evaluating (12.8 ± 3.4), Creating (18.6 ± 4.8), Remembering (9.4 ± 2.1), Understanding (12.9 ± 3.3), and Applying (12.5 ± 3.2). The overall critical thinking score was 79.4 ± 18.1, and the average GPA was 15.7 ± 2.4. Significant positive correlations were found between GPA and several critical thinking subscales: Analyzing (r = 0.45, p = 0.013), Creating (r = 0.52, p < 0.001), Remembering (r = 0.29, p = 0.021), Understanding (r = 0.41, p = 0.002), and the overall CThQ score (r = 0.54, p = 0.043). Conclusion: The study demonstrates a significant positive relationship between critical thinking skills and learning achievement in foreign language programs. Enhancing critical thinking skills through educational interventions could potentially improve academic performance. Further research is recommended to explore the underlying mechanisms and long-term impacts of critical thinking on academic success.
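The results above are reported as Pearson-style correlations (r) between GPA and questionnaire scores. As a minimal sketch of how such a coefficient is computed, the function below implements the standard Pearson formula in plain Python; the function name and the sample numbers are invented for illustration and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical scores and GPAs (0-20 scale), made up for this example.
cthq_scores = [62, 75, 81, 90, 70]
gpas = [13.5, 15.0, 16.2, 17.8, 14.1]
r = pearson_r(cthq_scores, gpas)
```

A coefficient alone says nothing about significance; the p-values in the abstract would come from a separate test that this sketch does not cover.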

Keywords: critical thinking, learning achievement, higher education, foreign language programs, student success

Procedia PDF Downloads 30
4230 Subjective Mapping Methodologies: Mapping Local Perceptions with Geographic Information Systems

Authors: A. Llopis Alvarez, D. Muller-Eie

Abstract:

Participatory GIS (geographic information systems) are designed for community mapping exercises in order to produce spatial representations of local knowledge. Ideally, participatory GIS caters to public participation through the use of spatial data in order to increase community-led policy- and decision-making. Having defined a spatial object, such as a neighborhood, subjective mapping involves attaining a description of the spatial, physical, social and psychological characteristics of that spatial object. This paper highlights an emerging appreciation of the subjective component, particularly in spatial analyses. The beliefs, feelings, and behaviors associated with an urban area reflect its sense of place for an individual or a group. It is important therefore to understand what types of beliefs, emotions, and behavioral patterns are relevant to particular residents, groups and urban scales. In this sense, residents’ emotional attachment to their urban areas motivates civic engagement and facilitates awareness of their strengths and problems. Similarly, subjective perceptions act in complex ways to influence the formation and maintenance of social identity and quality of life. This paper reports on findings from a case study of the immigrant population in Norwegian cities, their residential conditions and their relationship to quality of urban life. Cognitive mapping methodologies are used in this study to understand local perceptions of urban qualities. Thus, measures to alleviate disadvantages and improve quality of urban life are more likely to be effective when they are informed by an understanding of a place as constructed by those who live in it, meaning their subjective perceptions about it.

Keywords: mapping methodologies, participatory GIS, perceptual maps, public participation, spatial analysis, subjective perceptions

Procedia PDF Downloads 142
4229 Neural Machine Translation for Low-Resource African Languages: Benchmarking State-of-the-Art Transformer for Wolof

Authors: Cheikh Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, Siley O. Ba

Abstract:

In this paper, we propose two neural machine translation (NMT) systems (French-to-Wolof and Wolof-to-French) based on sequence-to-sequence with attention and transformer architectures. We trained our models on a parallel French-Wolof corpus of about 83k sentence pairs. Because of the low-resource setting, we experimented with advanced methods for handling data sparsity, including subword segmentation, back translation, and the copied corpus method. We evaluate the models using the BLEU score and find that transformer outperforms the classic seq2seq model in all settings, in addition to being less sensitive to noise. In general, the best scores are achieved when training the models on word-level-based units. For subword-level models, using back translation proves to be slightly beneficial in low-resource (WO) to high-resource (FR) language translation for the transformer (but not for the seq2seq) models. A slight improvement can also be observed when injecting copied monolingual text in the target language. Moreover, combining the copied method data with back translation leads to a substantial improvement of the translation quality.
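The copied corpus method and back translation mentioned above are both ways of turning monolingual target-language text into extra training pairs. The sketch below shows one plausible way to assemble such an augmented training set; the function name, the `back_translate` callable (a stand-in for a target-to-source translation model) and the toy sentences are all assumptions for illustration, not the authors' pipeline.

```python
def build_training_pairs(parallel, target_monolingual, back_translate=None):
    """Augment (source, target) pairs with copied and back-translated data.

    parallel: list of (source_sentence, target_sentence) tuples.
    target_monolingual: list of target-language sentences.
    back_translate: optional callable mapping a target sentence to a
        synthetic source sentence (hypothetical stand-in for a real model).
    """
    pairs = list(parallel)
    for tgt in target_monolingual:
        # Copied corpus: the target sentence is reused as its own source.
        pairs.append((tgt, tgt))
        if back_translate is not None:
            # Back translation: synthetic source paired with the real target.
            pairs.append((back_translate(tgt), tgt))
    return pairs


# Toy example with a dummy back-translation function.
parallel = [("source sentence", "target sentence")]
mono = ["extra target sentence"]
augmented = build_training_pairs(parallel, mono, back_translate=str.upper)
```

In practice the synthetic pairs are often tagged or down-weighted so the model can tell them from genuine parallel data; whether the authors did so is not stated in the abstract.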

Keywords: backtranslation, low-resource language, neural machine translation, sequence-to-sequence, transformer, Wolof

Procedia PDF Downloads 142
4228 When the Rubber Hits the Road: The Enactment of Well-Intentioned Language Policy in Digital vs. In Situ Spaces on Washington, DC Public Transportation

Authors: Austin Vander Wel, Katherin Vargas Henao

Abstract:

Washington, DC, is a city in which Spanish, along with several other minority languages, is prevalent not only among tourists but also those living within city limits. In response to this linguistic diversity and DC’s adoption of the Language Access Act in 2004, the Washington Metropolitan Area Transit Authority (WMATA) committed to addressing the need for equal linguistic representation and established a five-step plan to provide the best multilingual information possible for public transportation users. The current study, however, strongly suggests that this de jure policy does not align with the reality of Spanish’s representation on DC public transportation–although perhaps doing so in an unexpected way. In order to investigate Spanish’s de facto representation and how it contrasts with de jure policy, this study implements a linguistic landscapes methodology that takes critical language-policy as its theoretical framework (Tollefson, 2005). Specifically concerning de facto representation, it focuses on the discrepancies between digital spaces and the actual physical spaces through which users travel. These digital vs. in situ conditions are further analyzed by separately addressing aural and visual modalities. In digital spaces, data was collected from WMATA’s website (visual) and their bilingual hotline (aural). For in situ spaces, both bus and metro areas of DC public transportation were explored, with signs comprising the visual modality and recordings, driver announcements, and interactions with metro kiosk workers comprising the aural modality. While digital spaces were considered to successfully fulfill WMATA’s commitment to representing Spanish as outlined in the de jure policy, physical spaces show a large discrepancy between what is said and what is done, particularly regarding the bus system, in addition to the aural modality overall. 
These discrepancies in in situ spaces place Spanish speakers at a clear disadvantage, demanding additional resources and knowledge on the part of residents with limited or no English proficiency in order to have equal access to this public good. Based on our critical language-policy analysis, while Spanish is represented as a right in the de jure policy, its implementation in situ clearly portrays Spanish as a problem since those seeking bilingual information cannot expect it to be present when and where they need it most (Ruíz, 1984; Tollefson, 2005). This study concludes with practical, data-based steps to improve the current situation facing DC’s public transportation context and serves as a model for responding to inadequate enactment of de jure policy in other language policy settings.

Keywords: Urban landscape, language access, critical-language policy, Spanish, public transportation

Procedia PDF Downloads 71
4227 Thinking for Writing: Evidence of Language Transfer in Chinese ESL Learners’ Written Narratives

Authors: Nan Yang, Hye Pae

Abstract:

English as a second language (ESL) learners are often observed to have transferred traits of their first languages (L1) and habits of using their L1s to their use of English (second language, L2), a phenomenon termed language transfer. In addition to the transfer of linguistic features (e.g., grammar, vocabulary, etc.), which are relatively easy to observe and quantify, many cross-cultural theorists have emphasized a much subtler and more fundamental transfer existing on a higher conceptual level that is referred to as conceptual transfer. Although a growing body of literature in linguistics has demonstrated evidence of L1 transfer in various discourse genres, very few studies address the underlying conceptual transfer that happens along with the language transfer, especially in extended forms of spontaneous discourse such as personal narrative. To address this issue, this study situates itself in the context of Chinese ESL learners’ written narratives, examines evidence of L1 conceptual transfer in comparison with native English speakers’ narratives, and provides discussion from the perspective of conceptual transfer. It is hypothesized that Chinese ESL learners’ English narrative strategies are heavily influenced by the strategies that they use in Chinese as a result of conceptual transfer. Understanding language transfer cognitively is of great significance in the realm of SLA, as it helps address challenges that ESL learners around the world are facing, allows native English speakers to develop a better understanding of how and why learners’ English is different, and sheds light on ESL pedagogy by providing linguistic and cultural expectations in native English-speaking countries.
To achieve these goals, 40 college students were recruited (20 Chinese ESL learners and 20 native English speakers) in the United States, and their written narratives on the prompt 'The most frightening experience' were collected for quantitative discourse analysis. Forty written narratives (20 in Chinese and 20 in English) were collected from Chinese ESL learners, and 20 written narratives were collected from native English speakers. All written narratives were coded according to the coding scheme developed by the authors prior to data collection. Descriptive statistical analyses were conducted, and the preliminary results revealed that native English speakers included more narrative elements, such as events and explicit evaluation, than Chinese ESL students did in either their English or their Chinese writings; the English group also used more evaluation devices (i.e., physical state expressions, indirectly reported speeches, delineation) than Chinese ESL students did in either language. It was also observed that Chinese ESL students included more orientation elements (i.e., the introduction of time/place, the introduction of character) in their Chinese and English writings than the native English-speaking participants. The findings suggest that a similar narrative strategy was observed in Chinese ESL learners’ Chinese narratives and English narratives, which is considered evidence of conceptual transfer from Chinese (L1) to English (L2). The results also indicate that distinct narrative strategies were used by Chinese ESL learners and native English speakers as a result of cross-cultural differences.

Keywords: Chinese ESL learners, language transfer, thinking-for-speaking, written narratives

Procedia PDF Downloads 115
4226 The Role of Reading Self-Efficacy and Perception of Difficulty in English Reading among Chinese ESL Learners

Authors: Kevin Chan, Kevin K. H. Chung, Patcy P. S. Yeung, H. L. Ip, Bill T. C. Chung, Karen M. K. Chung

Abstract:

Purpose: Recent evidence shows that reading self-efficacy and students’ perceived difficulty in reading are significantly associated with word reading and reading fluency. However, little is known about these relationships among students learning to read English as a second language, particularly among Chinese students. This study examined the contributions of reading self-efficacy, perception of difficulty in reading, and cognitive-linguistic skills to performance on English word reading and reading fluency in Chinese students. Method: A sample of 122 second- and third-grade students in Hong Kong, China, participated in this study. Students completed measures of reading self-efficacy and perception of difficulty in reading. They were assessed on their English cognitive-linguistic and reading skills: rapid automatized naming, nonword reading, phonological awareness, word reading, and one-minute word reading. Results: Path analysis indicated that when students’ grades were controlled, reading self-efficacy was a significant correlate of word reading and reading fluency, whereas perception of difficulty in reading negatively predicted word reading. Conclusion: These findings underscore the importance of taking students’ reading self-efficacy, perception of difficulty in reading, and cognitive-linguistic skills into consideration when designing reading interventions and instruction for students learning English as a second language.

Keywords: self-efficacy, perception of difficulty in reading, English as a second language, word reading

Procedia PDF Downloads 185
4225 Event Data Representation Based on Time Stamp for Pedestrian Detection

Authors: Yuta Nakano, Kozo Kajiwara, Atsushi Hori, Takeshi Fujita

Abstract:

In association with the wave of electric vehicles (EV), low-energy-consumption systems have become increasingly important. One of the key technologies for realizing low energy consumption is the dynamic vision sensor (DVS), also called an event sensor or neuromorphic vision sensor. This sensor has several notable features, such as high temporal resolution, which can reach 1 Mframe/s, and a high dynamic range (120 dB). However, the property that contributes most to low energy consumption is sparsity: the sensor only captures pixels whose intensity changes, so there is no signal from areas without any intensity change. As a result, the sensor is more energy efficient than conventional sensors such as RGB cameras because redundant data are removed. On the other hand, the data are difficult to handle because the format is completely different from that of an RGB image: the acquired signals are asynchronous and sparse, and each signal is composed of an x-y coordinate, a polarity (two values: +1 or -1) and a time stamp, without intensity information such as RGB values. Therefore, existing algorithms cannot be used straightforwardly, and a new processing algorithm has to be designed to cope with DVS data. To overcome the difficulties caused by the difference in data format, most prior work builds frame data and feeds it to deep learning models such as Convolutional Neural Networks (CNN) for object detection and recognition. However, even though the data can be fed in this way, it is still difficult to achieve good performance because intensity information is lacking. Although polarity is often used as the intensity in place of RGB pixel values, polarity information is clearly not rich enough. Considering this context, we proposed to use the timestamp information as the data representation that is fed to deep learning. 
Concretely, we also first make frame data divided by a certain time period, then assign an intensity value according to the timestamp in each frame; for example, a high value is given to a recent signal. We expected that this data representation could capture features, especially of moving objects, because the timestamp represents the direction and speed of movement. Using the proposed method, we built our own dataset with a DVS fixed on a parked car in order to develop an application for a surveillance system that can detect persons around the car. We think DVS is one of the ideal sensors for surveillance purposes because it can run for a long time with low energy consumption in a non-dynamic situation. For comparison, we reproduced a state-of-the-art method as a benchmark, which makes frames in the same way as ours and feeds polarity information to a CNN. Then, we measured the object detection performance of the benchmark and of our method on the same dataset. As a result, our method achieved an F1 score up to 7 points higher than the benchmark.
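The timestamp-based representation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `timestamp_frame`, the linear mapping from time stamp to intensity, and the rule that the most recent event wins at each pixel are all assumptions made for illustration.

```python
from typing import List, Tuple

# One DVS event: (x, y, polarity, time stamp in seconds). Layout is an assumption.
Event = Tuple[int, int, int, float]


def timestamp_frame(events: List[Event], width: int, height: int,
                    t_start: float, t_end: float) -> List[List[float]]:
    """Build one frame for the window (t_start, t_end].

    Each pixel stores a value in (0, 1] that grows with the time stamp of the
    latest event at that pixel; pixels without events stay at 0.0.
    """
    frame = [[0.0] * width for _ in range(height)]
    duration = t_end - t_start
    for x, y, _polarity, t in events:
        if not (t_start < t <= t_end):
            continue  # event belongs to another frame
        # Assumed linear mapping: older events -> small value, recent -> near 1.
        value = (t - t_start) / duration
        # Keep the most recent event when several hit the same pixel.
        frame[y][x] = max(frame[y][x], value)
    return frame


if __name__ == "__main__":
    demo_events = [(0, 0, 1, 0.25), (0, 0, -1, 0.75), (1, 1, 1, 0.5)]
    for row in timestamp_frame(demo_events, width=2, height=2,
                               t_start=0.0, t_end=1.0):
        print(row)
```

In this sketch polarity is ignored entirely, which contrasts with the polarity-based benchmark mentioned in the abstract; a real pipeline would stack such frames and pass them to a CNN detector.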

Keywords: event camera, dynamic vision sensor, deep learning, data representation, object recognition, low energy consumption

Procedia PDF Downloads 91
4224 Simo-syl: A Computer-Based Tool to Identify Language Fragilities in Italian Pre-Schoolers

Authors: Marinella Majorano, Rachele Ferrari, Tamara Bastianello

Abstract:

Recent technological advances allow innovative, multimedia screen-based assessment tools to be used to test children's language and early literacy skills, monitor their growth over the preschool years, and test their readiness for primary school. A computer-based assessment tool offers several advantages over paper-based tools. Firstly, computer-based tools that make use of games, videos, and audio may be more motivating and engaging for children, especially for those with language difficulties. Secondly, computer-based assessments are generally less time-consuming than traditional paper-based assessments: this makes them less demanding for children and gives clinicians and researchers, but also teachers, the opportunity to test children multiple times over the same school year and, thus, to monitor their language growth more systematically. Finally, while paper-based tools require offline coding, computer-based tools sometimes provide automatically calculated scores, thus producing less subjective evaluations of the assessed skills and providing immediate feedback. Nonetheless, using computer-based assessment tools to test meta-phonological and language skills in children is not yet common practice in Italy. The present contribution aims to estimate the internal consistency of a computer-based assessment (i.e., the Simo-syl assessment). Sixty-three Italian pre-schoolers aged between 4;10 and 5;9 years were tested at the beginning of the last year of preschool through paper-based standardised tools on their lexical (Peabody Picture Vocabulary Test), morpho-syntactical (Grammar Repetition Test for Children), meta-phonological (Meta-Phonological skills Evaluation test), and phono-articulatory skills (non-word repetition). The same children were tested through the Simo-syl assessment on their phonological and meta-phonological skills (e.g., recognising syllables and vowels and reading syllables and words). 
The internal consistency of the computer-based tool was acceptable (Cronbach's alpha = .799). Children's scores obtained in the paper-based assessment and scores obtained in each task of the computer-based assessment were correlated. Significant and positive correlations emerged between all the tasks of the computer-based assessment and the scores obtained in the CMF (r = .287 - .311, p < .05) and in the correct sentences in the RCGB (r = .360 - .481, p < .01); the standardised non-word repetition test correlated significantly with the reading tasks only (r = .329 - .350, p < .05). Further tasks should be included in the current version of Simo-syl to provide a comprehensive and multi-dimensional approach to assessing children. However, such a tool gives teachers a good opportunity to identify language-related problems early, even in the school environment.
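For readers unfamiliar with the internal-consistency statistic reported above, the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), can be sketched as follows. The function name and the example scores are hypothetical; the study's own data and software are not given in the abstract.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(scores: List[Sequence[float]]) -> float:
    """Cronbach's alpha for a score table.

    scores: one row per child, one column per task (k >= 2 tasks).
    Uses sample variances, as is conventional for this statistic.
    """
    k = len(scores[0])
    # Variance of each task across children.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each child's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical scores for four children on three tasks.
    example = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
    print(round(cronbach_alpha(example), 3))
```

Values close to 1 indicate that the tasks vary together across children; perfectly redundant tasks give exactly 1.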

Keywords: assessment, computer-based, early identification, language-related skills

Procedia PDF Downloads 179
4223 Passive Voice in SLA: Armenian Learners’ Case Study

Authors: Emma Nemishalyan

Abstract:

It is believed that learners’ mother tongue (L1 hereafter) has a huge impact on their acquisition of a second language (L2 hereafter). This hypothesis has attracted both support and criticism. Based on research results from a wide range of learner corpora (Chinese, Japanese, and Spanish, among others), the hypothesis has been either confirmed or rejected. However, no such study has been conducted on Armenian learners. The aim of this paper is to examine the implications of the hypothesis for an Armenian learners’ corpus in terms of the use of the passive voice. To this end, the method of Contrastive Interlanguage Analysis (hereafter CIA) has been applied to a native speakers’ corpus (the Louvain Corpus of Native English Essays (LOCNESS)) and an Armenian learners’ corpus, which was compiled by the author in compliance with the International Corpus of Learner English (ICLE) guidelines. CIA compares the interlanguage (the language produced by learners) with the language produced by native speakers. With the help of this method, it is possible not only to highlight the mistakes that learners make, but also to reveal underuse or overuse. The choice of the grammar issue (passive voice) is motivated by the fact that Armenian and English are typologically very different, as they belong to different branches. Moreover, the passive voice is considered to be one of the most problematic grammar topics for learners of English to acquire. Based on this difference, we hypothesized that Armenian learners would either overuse or underuse some types of the passive voice. With the help of Lancsbox software, we first identified the frequency of passive voice usage in LOCNESS and in the Armenian learners’ corpus to understand whether the learners show the same usage pattern of the passive voice as the native speakers. Secondly, we identified the types of the passive voice used by the Armenian learners and sought to trace the reasons to their mother tongue. 
The results of the study showed that Armenian learners underused the passive voice compared with native speakers. Furthermore, the hypothesis that learners’ L1 has an impact on their L2 acquisition and production was confirmed.

Keywords: corpus linguistics, applied linguistics, second language acquisition, corpus compilation

Procedia PDF Downloads 100
4222 Implementing a Database from a Requirement Specification

Authors: M. Omer, D. Wilson

Abstract:

Creating a database schema is essentially a manual process. The information contained in a requirement specification has to be analyzed and reduced to a set of tables, attributes and relationships. This is a time-consuming process that has to go through several stages before an acceptable database schema is achieved. The purpose of this paper is to implement a Natural Language Processing (NLP) based tool to produce a relational database schema from a requirement specification. The Stanford CoreNLP version 3.3.1 and the Java programming language were used to implement the proposed model. The outcome of this study indicates that the first draft of a relational database schema can be extracted from a requirement specification by using NLP tools and techniques with minimum user intervention. Therefore, this method is a step forward in finding a solution that requires little or no user intervention.

Keywords: information extraction, natural language processing, relation extraction

Procedia PDF Downloads 256
4221 Unmasking Theatrical Language: Exploring Ideological Connections in American Theater

Authors: Gizem Barreto Martins

Abstract:

This paper explores the subversive potential inherent in the theatrical language employed within Arthur Miller's The Crucible. The research argues that this play intricately weaves ideological connections with its audience and the historical epoch it represents, effectively serving as a channel for ideological and cultural interaction potentially exerting subversive influences on social and political realms. Using a historical-materialist methodology that situates the play within its historical and political context, all while examining its connections with theater and literary theories, the paper raises a fundamental query: How does this dramatic work embody subversion, presenting a style unburdened by the performative conventions of daily life and prevailing codes and systems of representation? In response to this inquiry, the study asserts that theatrical language has the capacity to function as a subversive catalyst against prevailing ideologies, actively contributing to the process of social transformation. To substantiate this claim, the research conducts a detailed analysis of the selected play, employing the semiotic framework pioneered by Gilles Deleuze and Felix Guattari.

Keywords: Arthur Miller, The Crucible, Gilles Deleuze, Felix Guattari, theater and literary theories

Procedia PDF Downloads 59
4220 Teacher Training for Bilingual Education of Deaf Students in Brazil

Authors: Mara Aparecida De Castilho Lopes, Maria Eliza Mattosinho Bernardes

Abstract:

The education of deaf individuals in Brazil is grounded in the bilingual approach, which presupposes Brazilian Sign Language (Libras) as the first language for these students. In this perspective, Portuguese should be taught as a second language in its written form, ensuring that deaf students also have access to various academic subjects in sign language. Brazilian legislation (Federal Decree No. 5626 of 2005) mandates the teaching of Brazilian Sign Language in university teacher training programs, but there is no pre-established minimum workload. As a result, there is a significant disparity in the teaching and quality of teacher education across the Brazilian territory. Added to this fact is the general lack of awareness within society regarding the linguistic status of Libras, leading to a shortage of competent teachers for its use and instruction, particularly in higher education. Recently, Federal Law No. 14191 of 2021 established bilingual education for the deaf as a mode of instruction, indicating the need for adjustments in teacher training within higher education teacher preparation programs. Given this context, the objective of the present study was to analyze the teaching proposals for Brazilian Sign Language for students in teacher training programs at public universities in Brazil, presenting alternatives to overcome the current models and academic pathways of teaching and learning. In addition to analyzing Brazilian teaching models, an analysis of a continuing education model for teachers in a French institution was also conducted - considering the historical Franco-Brazilian path of deaf education in Brazil. 
The analysis of the current teacher training model for deaf education in Brazil revealed that initial exposure to sign language and its linguistic structure is not sufficient to provide future teachers with opportunities to reflect on bilingual teaching methods and practices, as seen in other definitions of bilingualism - bilingual education for proficient listeners in two oral languages. As a result, a training proposal was developed for an experimental interdisciplinary course, integrating the curriculum of an initial and continuing teacher training program alongside the Alfredo Bossi Chair at the University of São Paulo. This proposal is structured into three disciplines, which constitute consecutive moments in teacher education: Fundamental Aspects of Brazilian Sign Language, Bilingual Teaching Methodology, and Teaching Investigation Project - interdisciplinary engagement in the field of deafness. The last offered discipline represents an interdisciplinary supervised internship proposal, considering the multi-professional context that constitutes deaf education within a bilingual approach. In interdisciplinary work within the field of deafness, dialogue between teachers and other professionals who work with deaf students from different perspectives - teachers, speech therapists, and sign language interpreters - is frequently necessary. Through alternative avenues, these actions aim to direct the linguistic development of deaf students within their learning processes. Based on the innovative curriculum proposal described here, the intention is to contribute to the enhancement of teacher education in Brazil, with the goal of ensuring bilingual education for deaf students.

Keywords: bilingual education, teacher training, historical-cultural approach, interdisciplinary education, inclusive education

Procedia PDF Downloads 84
4219 Multilingualism in Medieval Romance: A French Case Study

Authors: Brindusa Grigoriu

Abstract:

Inscribing itself in the field of the history of multilingual communities with a focus on the evolution of language didactics, our paper aims at providing a pragmatic-interactional approach on a corpus proposing to scholars of the international scientific community a relevant text of early modern European literature: the first romance in French, The Conte of Flore and Blanchefleur by Robert d’Orbigny (1150). The multicultural context described by the romance is one in which an Arab-speaking prince, Floire, and his Francophone protégée, Blanchefleur, learn Latin together at the court of Spain and become fluent enough to turn it into the language of their love. This learning process is made up of interactional patterns of affective relevance, in which the proficiency of the protagonists in the domain of emotive acts becomes a matter of linguistic and pragmatic emulation. From five to ten years old, the pupils are efficiently stimulated by their teacher of Latin, Gaidon – a Moorish scholar of the royal entourage – to cultivate their competencies of oral expression and reading comprehension (of Antiquity classics), while enjoying an ever greater freedom of written expression, including the composition of love poems in this second language of culture and emotional education. Another relevant parameter of the educational process at court is that Latin shares its prominent role as a language of culture with French, whose exemplary learner is the (Moorish) queen herself. Indeed, the adult 'First lady' strives to become a pupil benefitting from lifelong learning provided by a fortuitous slave-teacher with little training, her anonymous chambermaid and Blanchefleur’s mother, who, despite her status of a war trophy, enjoys her Majesty’s confidence as a cultural agent of change in linguistic and theological fields. 
Thus, the two foreign languages taught at Spain’s court, Latin and French (as opposed to Arabic), suggest a spiritual authority allowing the mutual enrichment of intercultural pioneers of cross-linguistic communication, in the aftermath of religious wars. Durably, and significantly, if not everlastingly, the language of physical violence rooted in intra-cultural solipsism is replaced by two Romance languages which seem to embody, together and yet distinctly, the parlance of peace-making.

Keywords: multilingualism, history of European language learning, French and Latin learners, multicultural context of medieval romance

Procedia PDF Downloads 134
4218 Effect of Distance Education Students Motivation with the Turkish Language and Literature Course

Authors: Meva Apaydin, Fatih Apaydin

Abstract:

Education plays a major role in the development of society. Teaching and training are as old as history itself, and different methods and techniques have been applied and revised over time with the aim of raising the level of learning. In addition to traditional teaching methods, technology has been used in recent years. With the introduction of the internet into education, some problems that could not previously be solved have been addressed, and it has become clear that learners can be educated with contemporary methods as well as traditional ones. As a product of technological development, distance education is a system that allows students to be educated individually wherever and whenever they like, without the need for a physical school environment. Distance education has become prevalent because of the physical inadequacies of educational institutions; as a result, disadvantages such as social complexities, individual differences and especially geographical distance disappear. Moreover, rapid feedback between teachers and learners, improved student motivation owing to the absence of time limits, low cost, and objective measurement and evaluation come to the fore. Although distance education offers teaching benefits, it also has limitations. Among the most important are that problems likely to arise may not be solved in time, that the lack of eye contact between teacher and learner prevents reliable feedback, and that inadequate technological infrastructure causes difficulties. Courses are conducted via distance education in many university departments in our country. 
In recent years, offering courses such as Turkish Language, English, and History in the first year of university departments has become an increasingly common practice. This study examines the delivery of the Turkish Language course through an internet-based distance education system by analyzing the advantages and disadvantages of that system.

Keywords: distance education, Turkish language, motivation, benefits

Procedia PDF Downloads 432
4217 Gender Differences in Communication Styles: An Analysis of the Language of Earnings Conference Calls

Authors: Chiara De Amicis, Sonia Falconieri, Mesut Tastan

Abstract:

In this study, we analyze the language employed by Chief Executive Officers (CEOs) and Chief Financial Officers (CFOs) during earnings conference calls from a gender perspective. We find evidence that conference calls held by female CEOs and/or CFOs exhibit a higher level of optimism compared to conference calls held by male CEOs and/or CFOs. Moreover, female managers tend to present and discuss firm performance with less vagueness than their male colleagues. We then observe the market reaction around each earnings conference call: while manager optimism is perceived as a good signal by investors, manager vagueness significantly dampens the market reaction around the call. Whether the gender of the CEO and/or the CFO delivering the conference call affects investors’ perceptions of firm performance is still an open question. Some evidence shows that the language employed by female managers conveys more valuable information for market participants than the language employed by their male counterparts. This study contributes to a growing literature in finance and accounting that uses textual analysis to assess the informativeness of corporate disclosure. To our knowledge, this is the first paper that aims to answer the question of whether the gender of a firm’s top managers matters when assessing the informativeness of corporate spoken communication. We believe that our results will be of relevance for future research in the field. Moreover, our evidence may inform the debate on whether greater participation by women in the management of companies should be encouraged.

Keywords: conference calls, event study, gender, market reaction, textual analysis

Procedia PDF Downloads 192
4216 Ambiguity-Identification Prompting for Large Language Model to Better Understand Complex Legal Texts

Authors: Haixu Yu, Wenhui Cao

Abstract:

Tailoring Large Language Models (LLMs) to perform legal reasoning has been a popular trend in the study of AI and law. Researchers have mainly employed two methods to unlock the potential of LLMs, namely finetuning the LLMs to expand their knowledge of law and restructuring the prompts (In-Context Learning) to optimize the LLMs’ understanding of the legal questions. Although finetuning and revised prompting are claimed to make LLMs more competent in legal reasoning, most state-of-the-art studies show quite limited improvements in practicability. In this paper, drawing on the study of the complexity and low interpretability of legal texts, we propose a prompting strategy based on the Chain of Thought (CoT) method. Instead of merely instructing the LLM to reason “step by step”, the prompting strategy requires the tested LLM to identify the ambiguity in the questions as the first step and then allows the LLM to generate corresponding answers in line with different understandings of the identified terms as the following step. The proposed prompting strategy attempts to encourage LLMs to “interpret” the given text from various aspects. Experiments that require the LLMs to answer “case analysis” questions from the bar examination, using general LLMs such as GPT 4 and legal LLMs such as LawGPT, show that the prompting strategy can improve LLMs’ ability to understand complex legal texts.
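The two-step strategy described in this abstract might be expressed as a prompt template along the following lines. This is only a sketch: the wording of the steps and the function name `build_ambiguity_prompt` are hypothetical, the paper's actual prompts are not reproduced in the abstract, and no model is called here.

```python
def build_ambiguity_prompt(question: str) -> str:
    """Wrap a legal question in a two-step, ambiguity-first instruction.

    The step wording below is an illustrative assumption, not the authors' text.
    """
    steps = [
        "Step 1: List every term or phrase in the question that is ambiguous "
        "or open to more than one legal interpretation.",
        "Step 2: For each interpretation of the terms found in Step 1, "
        "reason through the case and give the corresponding answer.",
    ]
    return "Question:\n" + question.strip() + "\n\n" + "\n".join(steps)


if __name__ == "__main__":
    print(build_ambiguity_prompt(
        "A sells B a 'vehicle' that turns out to be a bicycle. "
        "Can B rescind the contract?"
    ))
```

The resulting string would then be sent to whichever model is under test, in place of a plain "reason step by step" instruction.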

Keywords: ambiguity-identification, prompt, large language model, legal text understanding

Procedia PDF Downloads 54
4215 Language Development and Growing Spanning Trees in Children Semantic Network

Authors: Somayeh Sadat Hashemi Kamangar, Fatemeh Bakouie, Shahriar Gharibzadeh

Abstract:

In this study, we aim to exploit Maximum Spanning Trees (MSTs) of children's semantic networks to investigate their language development. To do so, we examine the graph-theoretic properties of word-embedding networks. The nodes of the networks are words that children normatively acquire before the age of 30 months, and the links are built from the cosine similarity of the corresponding word vectors. These networks are weighted graphs, and the strength of each link is determined by the numerical similarity of the two words (nodes) at the ends of the link. Constructing MSTs offers a way to avoid converting the weighted networks into binary ones by setting a threshold. An MST is a unique sub-graph that connects all the nodes in such a way that the sum of all the link weights is maximized without forming cycles. MSTs, as the backbone of the semantic networks, are suitable for examining developmental changes in semantic network topology in children. From these trees, several parameters were calculated to characterize the developmental change in network organization. We showed that MSTs provide an elegant method that is sensitive enough to capture subtle developmental changes in semantic network organization.
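The construction described above, a maximum spanning tree over a cosine-similarity network, can be sketched with Kruskal's algorithm run on edges sorted by decreasing weight. The toy word vectors and function names are hypothetical, and the abstract does not say which algorithm or embeddings the authors used.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def maximum_spanning_tree(
    vectors: Dict[str, Sequence[float]]
) -> List[Tuple[str, str, float]]:
    """Kruskal's algorithm on the complete similarity graph, heaviest edge first."""
    words = list(vectors)
    edges = sorted(
        ((cosine(vectors[a], vectors[b]), a, b) for a, b in combinations(words, 2)),
        reverse=True,
    )
    parent = {w: w for w in words}  # union-find forest

    def find(w: str) -> str:
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    tree = []
    for weight, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge creates no cycle
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


if __name__ == "__main__":
    # Toy embeddings, invented for illustration only.
    toy = {
        "dog": [1.0, 0.1, 0.0],
        "cat": [0.9, 0.2, 0.0],
        "milk": [0.0, 1.0, 0.2],
        "cup": [0.1, 0.9, 0.3],
    }
    for a, b, w in maximum_spanning_tree(toy):
        print(f"{a} - {b}: {w:.3f}")
```

A tree over n words always has n - 1 links, so no similarity threshold is needed; the tree is unique only when all link weights are distinct.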

Keywords: maximum spanning trees, word-embedding, semantic networks, language development

Procedia PDF Downloads 141
4214 Power Energy Management For A Grid-Connected PV System Using Rule-Base Fuzzy Logic

Authors: Nousheen Hashmi, Shoab Ahmad Khan

Abstract:

Active collaboration among the green energy sources and the load demand leads to serious issues related to power quality and stability. The growing number of green energy resources and Distributed Generators requires newer operating strategies to maintain power stability between green energy resources and the micro-grid/utility grid. This paper presents a novel technique for power management in a grid-connected photovoltaic system with energy storage under a set of constraints including weather conditions, load-shedding hours and peak pricing hours, using a rule-based fuzzy smart grid controller to schedule power coming from multiple power sources (photovoltaic, grid, battery) under these constraints. The technique fuzzifies all the inputs and establishes a fuzzy rule set from the fuzzy outputs before defuzzification. Simulations are run for a 24-hour period, and a rule-based power scheduler is developed. The proposed fuzzy control strategy is able to sense the continuous fluctuations in photovoltaic power generation, load demand, grid availability (load-shedding patterns) and battery state of charge in order to make correct and quick decisions. The suggested fuzzy rule-based scheduler can operate well with vague inputs, so it does not require an exact numerical model and can handle nonlinearity. This technique provides a framework that can be extended to handle multiple special cases for optimized operation of the system.
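The fuzzify, apply rules, defuzzify pipeline mentioned in this abstract can be illustrated with a deliberately small sketch. Everything specific here is an assumption: the two inputs (normalised PV output and battery state of charge), the ramp-shaped membership functions, the three rules, and the weighted-average defuzzification are chosen for brevity and are not taken from the paper.

```python
def falling(x: float, lo: float, hi: float) -> float:
    """Membership that is 1 at or below lo, 0 at or above hi, linear between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x: float, lo: float, hi: float) -> float:
    """Complement of `falling`: 0 at or below lo, 1 at or above hi."""
    return 1.0 - falling(x, lo, hi)


def grid_share(pv: float, soc: float) -> float:
    """Fraction of the load to draw from the grid (0 = none, 1 = all).

    pv:  PV output as a fraction of the load demand, clipped to [0, 1].
    soc: battery state of charge in [0, 1].
    """
    # Fuzzification (breakpoints 0.2 and 0.8 are hypothetical).
    pv_low, pv_high = falling(pv, 0.2, 0.8), rising(pv, 0.2, 0.8)
    soc_low, soc_high = falling(soc, 0.2, 0.8), rising(soc, 0.2, 0.8)

    # Rule base: (firing strength, crisp consequent). AND is taken as min.
    rules = [
        (pv_high, 0.0),                 # PV high               -> no grid power
        (min(pv_low, soc_high), 0.2),   # PV low, battery full  -> mostly battery
        (min(pv_low, soc_low), 1.0),    # PV low, battery empty -> rely on grid
    ]

    # Defuzzification by weighted average of the rule consequents.
    total = sum(strength for strength, _ in rules)
    return sum(strength * out for strength, out in rules) / total


if __name__ == "__main__":
    for pv, soc in [(1.0, 0.5), (0.0, 1.0), (0.0, 0.0), (0.5, 0.5)]:
        print(f"pv={pv:.1f} soc={soc:.1f} -> grid share {grid_share(pv, soc):.2f}")
```

Because the low and high sets are complementary, at least one rule always fires and the division is safe; a fuller controller would add inputs such as load-shedding and peak-pricing hours.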

Keywords: photovoltaic, power, fuzzy logic, distributed generators, state of charge, load shedding, membership functions

Procedia PDF Downloads 476
4213 Optimal Seismic Design of Reinforced Concrete Shear Wall-Frame Structure

Authors: H. Nikzad, S. Yoshitomi

Abstract:

In this paper, the optimal seismic design of reinforced concrete shear wall-frame building structures was carried out using structural optimization. The optimal section sizes were generated through structural optimization based on linear static analysis conforming to the American Concrete Institute building design code (ACI 318-14). An analytical procedure was followed to validate the accuracy of the proposed method by comparing stresses on structural members through the output files of MATLAB and ETABS. In order to account for the difference between the stresses in structural elements computed by ETABS and MATLAB, and to avoid over-stressed members in ETABS, a stress constraint ratio of MATLAB to ETABS was modified and introduced for the most critical load combinations and structural members. Moreover, seismic design of the structure was done following the International Building Code (IBC 2012), American Concrete Institute Building Code (ACI 318-14) and American Society of Civil Engineers (ASCE 7-10) standards. Typical reinforcement requirements for the structural wall, beam and column were discussed and presented using ETABS structural analysis software. The placement and detailing of reinforcement of structural members were also explained and discussed. The outcomes of this study show that the modification of section sizes plays a vital role in finding an optimal combination of practical section sizes. In contrast, the optimization problem with size constraints has a higher cost than that without size constraints. Moreover, the comparison of the optimization results with those of the ETABS program was shown to be satisfactory and to meet the ACI 318-14 building design code criteria.

Keywords: structural optimization, seismic design, linear static analysis, etabs, matlab, rc shear wall-frame structures

Procedia PDF Downloads 171
4212 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering

Authors: Zelalem Fantahun

Abstract:

Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word in naturally occurring text. Part-of-speech tagging is a fundamental task in almost all natural language processing. In natural language processing, the need for large amounts of manually annotated data is a knowledge acquisition bottleneck. Since Amharic is an under-resourced language, the lack of a tagged corpus is a bottleneck for natural language processing, especially for POS tagging. A promising direction to tackle this problem is to provide a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Unsupervised algorithms seek out similarity between pieces of data in order to determine whether they can be characterized as forming a group. This paper describes the development of an unsupervised part-of-speech tagger using K-Means clustering for the Amharic language, for which a large amount of data is produced in day-to-day activities. In the development of the tagger, the following procedure is followed. First, the unlabeled data (raw text) is divided into 10 folds and the tokenization phase takes place; at this level, the raw text is split into sentences and then into words. The second phase is feature extraction, which includes the word frequency and the syntactic and morphological features of a word. The third phase is clustering. Among different clustering algorithms, K-means is selected and implemented in this study to bring groups of similar words together. The fourth phase is mapping, in which each cluster is examined carefully and the most common tag is assigned to the group. This study identifies two features that are capable of distinguishing one part of speech from others, namely morphological features and positional information, and shows that it is possible to use unsupervised learning for Amharic POS tagging. 
In order to increase the performance of the unsupervised part-of-speech tagger, there is a need to incorporate other features that are not included in this study, such as semantics-related information. Finally, in the experiments, the system achieves a maximum accuracy of 81%.
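The clustering and mapping phases described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' tagger: the two-dimensional feature vectors stand in for their frequency, morphological and positional features, the deterministic initialisation (first k points) is chosen for reproducibility, and `map_clusters_to_tags` assumes a handful of hand-checked words per cluster.

```python
from collections import Counter
from typing import Dict, List, Sequence


def kmeans(points: List[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns the cluster index assigned to each point."""
    centroids = [list(p) for p in points[:k]]  # assumed deterministic init
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def map_clusters_to_tags(assignment: List[int],
                         checked: Dict[int, str]) -> Dict[int, str]:
    """Give each cluster the most common tag among its hand-checked words.

    checked: {word index: tag} for a small manually inspected sample.
    """
    votes: Dict[int, Counter] = {}
    for index, tag in checked.items():
        votes.setdefault(assignment[index], Counter())[tag] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


if __name__ == "__main__":
    # Hypothetical feature vectors for six word types.
    features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
    clusters = kmeans(features, k=2)
    tag_of = map_clusters_to_tags(clusters, {0: "N", 1: "V"})
    print([tag_of[c] for c in clusters])
```

Only the mapping step needs any human judgement, which is what keeps the approach usable when no tagged corpus exists.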

Keywords: POS tagging, Amharic, unsupervised learning, k-means

Procedia PDF Downloads 444