Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 27986

Search results for: sentence analysis

27926 Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principles in Noisy Environments

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principles of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane, using spectrograms or cochleagrams as the image. In this paper, atomic decomposition implemented by matching pursuit transforms time-series speech signals into the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in "weight space", where each populated T-F position contains an amplitude weight. The weight-space vector, together with the atomic dictionary, represents a denoised, compressed version of the original signal. The arrangement of the atomic indices in the T-F vector is used for classification. Unsupervised feature learning, implemented by a sparse autoencoder, learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set, selected randomly from a single district. Each speaker has 10 sentences: two are used for training and eight for testing. Atomic index probabilities are computed for each training sentence and for each test sentence. Classification is performed by finding the lowest Euclidean distance between the probabilities of the training sentences and those of the test sentences. Training is done at a 30 dB Signal-to-Noise Ratio (SNR). Testing is performed at SNRs of 0 dB, 5 dB, 10 dB and 30 dB. The algorithm has a baseline classification accuracy of ~93%, averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to the short training and test sequences and to the overall simplicity of the classification algorithm. Accuracy is unaffected by AWGN, remaining at ~93% even at 0 dB SNR.
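The nearest-distance classification step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each sentence has already been decomposed into a list of atom indices (the matching pursuit and sparse autoencoder stages are not modeled), and that a sentence's "atomic index probabilities" are the normalized frequencies of those indices over a dictionary of known size. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def index_probabilities(atom_indices, dictionary_size):
    """Normalized histogram of atom indices (assumes a non-empty list)."""
    counts = Counter(atom_indices)
    total = len(atom_indices)
    return [counts.get(i, 0) / total for i in range(dictionary_size)]


def euclidean(p, q):
    """Euclidean distance between two equal-length probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def classify(test_indices, training, dictionary_size):
    """Return the speaker label of the training sentence closest to the test sentence.

    training: list of (speaker_label, atom_indices) pairs.
    """
    test_p = index_probabilities(test_indices, dictionary_size)
    best_label, best_dist = None, math.inf
    for label, indices in training:
        d = euclidean(test_p, index_probabilities(indices, dictionary_size))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy example: two speakers, two training sentences each, a 4-atom dictionary.
training = [
    ("speaker_a", [0, 0, 1, 0]),
    ("speaker_a", [0, 1, 0, 0]),
    ("speaker_b", [2, 3, 3, 2]),
    ("speaker_b", [3, 3, 2, 3]),
]
print(classify([0, 0, 0, 1], training, dictionary_size=4))  # speaker_a
```

Because only index frequencies are compared, this toy version ignores the amplitude weights and the ordering of atoms; it is meant only to make the "lowest Euclidean distance" rule concrete.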

Keywords: time-frequency plane, atomic decomposition, envelope sampling, Gabor atoms, matching pursuit, sparse dictionary learning, sparse autoencoder

Procedia PDF Downloads 289

27925 Experimenting the Influence of Input Modality on Involvement Load Hypothesis

Authors: Mohammad Hassanzadeh

Abstract:

As far as incidental vocabulary learning is concerned, the basic contention of the Involvement Load Hypothesis (ILH) is that retention of unfamiliar words is, generally, conditional upon the degree of involvement in processing them. This study examined input modality and incidental vocabulary uptake in a task-induced setting, whereby three variously loaded task types (marginal glosses, fill-in task, and sentence writing) were alternately assigned to one group of students at Allameh Tabataba’i University (n=21) during six classroom sessions. One round of exposure consisted of audiovisual input (TV talk shows), while the second round consisted of textual materials on approximately similar subject matter (reading texts). In both conditions, however, the tasks were equivalent to one another. The study pursued two objectives. First, it sought to establish a litmus test for the ILH and its proposed components of ‘need’, ‘search’ and ‘evaluation’. Second, it sought to determine whether exposure to audiovisual input is superior to written input when tasks are incorporated. At the end of each treatment session, an active vocabulary recall test was administered to measure the learners' incidental gains. A one-way analysis of variance revealed that the audiovisual intervention yielded higher gains than the written version, even when differing tasks were included. Meanwhile, task three (sentence writing) proved the most efficient in tapping learners' active recall of the target vocabulary items. In addition to shedding light on the superiority of audiovisual input over written input when circumstances are held relatively constant, this study largely supported the underlying tenets of the ILH.

Keywords: evaluation, incidental vocabulary learning, input mode, Involvement Load Hypothesis, need, search

Procedia PDF Downloads 279

27924 Syntactic Ambiguity and Syntactic Analysis: Transformational Grammar Approach

Authors: Olufemi Olupe

Abstract:

Within linguistics, various approaches have been adopted to the study of language. One such approach is syntax, the aspect of a language's grammar that deals with how words are put together to form phrases and sentences and how such structures are interpreted. Ambiguity, which is also germane to this discussion, is the uncertainty of meaning that arises when a phrase or sentence can be understood and interpreted in more than one way. In light of the above, this paper attempts a syntactic study of syntactic ambiguities in the English language, using the Transformational Generative Grammar (TGG) approach. To do this, phrases and sentences were presented, each followed by the relevant analysis. The findings reveal that ambiguity cannot always be resolved by syntactic analysis alone, without recourse to semantic interpretation. A further finding is that some syntactically ambiguous structures cannot be analysed as two surface structures, even though they have more than one deep structure. The paper concludes that as long as ambiguity remains in language, it will continue to pose a problem of understanding for second language learners. Users of English as a second language must, however, make a conscious effort to avoid it in order to achieve effective communication.

Keywords: language, syntax, semantics, morphology, ambiguity

Procedia PDF Downloads 394

27923 Correction of Frequent English Writing Errors by Using Coded Indirect Corrective Feedback and Error Treatment

Authors: Chaiwat Tantarangsee

Abstract:

The purposes of this study are 1) to study the frequent English writing errors of students registered for the course Reading and Writing English for Academic Purposes II, and 2) to determine the results of writing error correction using coded indirect corrective feedback and writing error treatments. The sample comprised 28 second-year English major students in the Faculty of Education, Suan Sunandha Rajabhat University. The instrument for the experimental study was the lesson plan of the course Reading and Writing English for Academic Purposes II, and the instrument for data collection consisted of four short-text writing tests. The findings show that the frequent English writing errors in this course comprise seven types of grammatical errors: sentence fragments, subject-verb agreement, wrong verb tense form, singular or plural noun endings, run-on sentences, wrong verb pattern, and lack of parallel structure. Moreover, writing error correction using coded indirect corrective feedback and error treatment produced an overall reduction in the frequent English writing errors and an increase in students’ achievement in writing short texts, significant at the .05 level.

Keywords: coded indirect corrective feedback, error correction, error treatment, frequent English writing errors

Procedia PDF Downloads 237

27922 The Impact of Breast Cancer Diagnosis on Omani Women

Authors: H. Al-Awaisi, M. H. Al-Azri, S. Al-Rasbi, M. Al-Moundhri

Abstract:

Breast cancer is the most common cancer among females worldwide. It is also the most common cancer among females in Oman, with 100 new breast cancer cases diagnosed every year. Breast cancer has been found to have a devastating effect on women’s lives. Women diagnosed with breast cancer might develop negative attitudes towards the illness and their bodies. They might also suffer from psychological ailments such as depression. Despite the evidence on the impact of breast cancer diagnosis on women, no study was found that explored the impact of breast cancer diagnosis among women in Oman. A phenomenological qualitative study was conducted to explore the impact of breast cancer diagnosis on Omani women. Data were collected through semi-structured individual interviews with 11 Omani women diagnosed with breast cancer. Interviews were transcribed verbatim and the data were analyzed thematically. Four main themes were identified regarding the impact of a cancer diagnosis on Omani women: “shock and disbelief”, “a death sentence”, “uncertain future” and “social stigma”. At the time of the interviews, all participants had advanced breast cancer, and some had metastatic disease. The word “cancer” had a profound and catastrophic effect on the women and their close relatives. In conclusion, the Omani women found a breast cancer diagnosis shocking and perceived it mainly as a death sentence, associated with an uncertain future and social stigma. Regardless of age, maternal status and education level, the Omani women who participated in this study evidently lacked awareness about breast cancer diagnosis, treatment and prognosis.

Keywords: breast cancer, coping, diagnosis, Oman, women

Procedia PDF Downloads 506

27921 Reviewing Special Education Preservice Teachers' Reflective Practices over Two Field Experiences: Topics and Changes in Reflection

Authors: Laurie U. deBettencourt

Abstract:

During pre-service field experiences, teacher candidates are often asked to reflect as part of their training. In this investigation, candidates’ reflective journal entries were reviewed, coded and analyzed, with results suggesting that teacher candidates need more direct instruction on how to describe, analyze, and make judgements about their instructional practices so that their practices improve over time. Teacher education programs often incorporate reflection-based activities during field experiences. The purpose of this investigation was to determine whether special education teacher candidates’ reflective practices changed as they completed their two supervised field experiences, and to determine which topics the candidates focused on in their reflections. The six female graduate students were completing two field experiences in special education classrooms within one academic year as part of their coursework leading to a master’s degree and special education teacher state certification. Each candidate wrote 15 reflective journal entries (approximately 200 words each) per field experience. Each journal entry was reviewed sentence by sentence to determine a reflective practice score and the topics discussed. The reflective practice score was calculated using four dimensions of reflection (describe, analyze, judge, and apply) to create a continuous variable representing the candidates’ reflective practice at four points in time. A one-way repeated measures analysis of variance (ANOVA) suggested that the special education teacher candidates did not change their reflective practices over time (i.e., the candidates’ mean score was 56.0 out of 100 (SD = 7.6) at time point one, 53.8 (SD = 4.3) at time point two, 51.2 (SD = 4.5) at time point three, and 57.7 (SD = 8.2) at time point four). Qualitative findings suggest that the candidates focused mostly on themselves in their reflections.
Conclusions suggest the need for teacher preparation programs to provide more direct instruction on how a teacher should reflect. Specific implications are provided for teacher training and future research.

Keywords: field experiences, reflective practices, special educators, teacher preparation

Procedia PDF Downloads 350

27920 Floating Quantifiers in Hijazi Arabic

Authors: Tagreed Alzahrani

Abstract:

The syntax of quantifiers has received much attention from linguists, philosophers and logicians within different frameworks and in various languages. However, the syntax of Arabic quantifiers has received limited attention in the literature, especially with respect to floating quantifiers. There have been a few discussions of floating quantifiers in Modern Standard Arabic (henceforth, MSA), but analyses of their properties in Saudi dialects are rare. The aim of this paper is therefore to provide a clear description of floating quantifiers (FQs) in the Hijazi dialect (henceforth, HA) by applying two approaches: the adverbial approach and the derivational (stranding) analysis. Linguists have long tried to explain the phenomenon of floating quantifiers, as exemplified in the following sentences: 1. "All the friends have watched the movie." 2. "The friends have all watched the movie." The adverbial approach assumes that the floating quantifier is a type of adverb, because it occupies the adverbial position next to the verb. Thus, the subject of the first example is "all the friends", while the subject of the second example is "the friends", with "all" becoming an adverb because it is located in an adverbial position. In the stranding analysis, by contrast, the floating quantifier becomes stranded when its complement moves to a higher position in the sentence [SPEC, TP]. Both sentences therefore have the same subject, "all the friends", although in the second example "the friends" has moved to a higher position and stranded the quantifier "all". The paper investigates floating quantifiers in HA using both approaches. The analysis shows that neither view is entirely successful in providing a unified account of FQs in HA.

Keywords: floating quantifier, adverbial analysis, stranding approach, universal quantifier

Procedia PDF Downloads 351

27919 Correction of Frequent English Writing Errors by Using Coded Indirect Corrective Feedback and Error Treatment: The Case of Reading and Writing English for Academic Purposes II

Authors: Chaiwat Tantarangsee

Abstract:

The purposes of this study are 1) to study the frequent English writing errors of students registered for the course Reading and Writing English for Academic Purposes II, and 2) to determine the results of writing error correction using coded indirect corrective feedback and writing error treatments. The sample comprised 28 second-year English major students in the Faculty of Education, Suan Sunandha Rajabhat University. The instrument for the experimental study was the lesson plan of the course Reading and Writing English for Academic Purposes II, and the instrument for data collection consisted of four short-text writing tests. The findings show that the frequent English writing errors in this course comprise seven types of grammatical errors: sentence fragments, subject-verb agreement, wrong verb tense form, singular or plural noun endings, run-on sentences, wrong verb pattern, and lack of parallel structure. Moreover, writing error correction using coded indirect corrective feedback and error treatment produced an overall reduction in the frequent English writing errors and an increase in students’ achievement in writing short texts, significant at the .05 level.

Keywords: coded indirect corrective feedback, error correction, error treatment, English writing

Procedia PDF Downloads 306

27918 Perceiving Casual Speech: A Gating Experiment with French Listeners of L2 English

Authors: Naouel Zoghlami

Abstract:

Spoken-word recognition involves the simultaneous activation of potential word candidates, which compete with each other for final recognition. In continuous speech, the activation-competition process becomes more complicated because of speech reductions at word boundaries. Lexical processing is more difficult in L2 than in L1 because L2 listeners often lack phonetic, lexico-semantic, syntactic, and prosodic knowledge of the target language. In this study, we investigate the online lexical segmentation hypotheses that French listeners of L2 English form and then revise as subsequent perceptual evidence is revealed. Our purpose is to shed further light on the processes of L2 spoken-word recognition in context and to better understand L2 listening difficulties by comparing the reactions of skilled and unskilled listeners at the point where their working hypothesis is rejected. We use a variant of the gating experiment, in which subjects transcribe an English sentence presented in increments of progressively greater duration. The spoken sentence was “And this amazing athlete has just broken another world record”, chosen mainly because it includes common reductions and phonetic features of English, such as elision and assimilation. Our preliminary results show an important difference in the way proficient and less-proficient L2 listeners handle connected speech: less-proficient listeners delay recognition of words as they wait for lexical and syntactic evidence to appear in the gates. Further statistical analyses are currently being undertaken.
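As an illustration of how a gating stimulus grows, the sketch below computes successive gate end-points for a fixed-increment variant of the paradigm, where every gate replays the sentence from its onset. The abstract does not report the gate boundaries actually used (gates may, for example, have been aligned to words or syllables instead), so the first-gate length, the increment, and the function name `gate_durations` are hypothetical placeholders.

```python
def gate_durations(total_ms, first_gate_ms, increment_ms):
    """Return successive gate end-points in ms; every gate starts at sentence onset."""
    if first_gate_ms <= 0 or increment_ms <= 0:
        raise ValueError("gate lengths must be positive")
    gates = []
    t = first_gate_ms
    while t < total_ms:
        gates.append(t)  # the listener hears the signal from 0 ms up to t ms
        t += increment_ms
    gates.append(total_ms)  # the final gate presents the full sentence
    return gates


# Hypothetical values: a 3-second sentence, a 500 ms first gate, 500 ms increments.
print(gate_durations(3000, 500, 500))  # [500, 1000, 1500, 2000, 2500, 3000]
```

After each gate, the listener's transcription would be recorded, so the point at which a word is first transcribed correctly (and stays correct) can be read off the gate list.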

Keywords: gating paradigm, spoken word recognition, online lexical segmentation, L2 listening

Procedia PDF Downloads 464

27917 The Phonology and Phonetics of Second Language Intonation in Case of “Downstep”

Authors: Tayebeh Norouzi

Abstract:

This study investigates the acquisition of intonation. It examines the intonation structure of Tokyo Japanese and its realization by Iranian learners of Japanese. Seven Iranian learners of Japanese, differing in fluency, and two Japanese speakers participated in the experiment. Two sentences were used to test the phonological and phonetic characteristics of lexical pitch-accent as well as the intonation patterns produced by the speakers. Both sentences consisted of similar words with the same number of syllables and lexical pitch-accents but had different syntactic structures. Speakers were asked to read each sentence three times at normal speed, and the data were analyzed using Praat. The results show that the realization of lexical pitch-accent, the Accentual Phrase (AP) and the AP boundary tone varies depending on sentence type. For sentences of type XdeYwo, the lexical pitch-accent is realized properly; however, there is a rise in the AP boundary tone regardless of the speakers’ level of fluency. In contrast, in sentences of type XnoYwo, the lexical pitch-accent and AP boundary tone vary depending on the speakers’ fluency level. Advanced speakers are better at grouping words into phrases and produce more native-like intonation patterns, though they are not able to realize downstep properly. The non-native speakers tried to realize proper intonation patterns by changing the lexical accent and boundary tone.

Keywords: intonation, Iranian learners, Japanese prosody, lexical accent, second language acquisition

Procedia PDF Downloads 169

27916 Culture of Writing and Writing of Culture: Organizational Connections and Pedagogical Implications of ESL Writing in Multilingual Philippine Setting

Authors: Randy S. Magdaluyo, Lea M. Cabar, Jefferson Q. Correa

Abstract:

One recurring issue in ESL writing is the confusion caused by differences between the writing conventions of the first language and those of the target language. Culture may play an intriguing role in specifying the writing features and structures that ESL writers have to follow. Although writing is typically organized in a three-part structure of introduction, body, and conclusion, it is important to analyze the complex nature of ESL writing. This study investigated the organizational features and structures of argumentative essays written in English by thirty college ESL students from three linguistic backgrounds (Cebuano, Chavacano, and Tausug) in a Philippine university. The word order and sentence construction in the students’ essays and the specific components of the introduction, body, and conclusion were analyzed quantitatively and qualitatively based on ESL writing models. Focus group discussions were also conducted to help clarify the possible influence of the students’ first language on the ways their essays were conceptualized and organized. Results indicate that while there was no significant difference in the overall introduction, body, and conclusion across the essays, sentence length differed for each linguistic group of ESL students, and word order was notably inconsistent with the S-V-O pattern of the target language. The first language was also revealed to play a facilitative role in these ESL students’ cognitive translation process. Accordingly, implications for a multicultural writing pedagogy were discussed and recommended, taking into account both the students’ native resources in their first language and the ESL writing models of their target language.

Keywords: community funds of knowledge, contrastive rhetoric, ESL writing, multicultural writing pedagogy

Procedia PDF Downloads 138

27915 A Novel Machine Learning Approach to Aid Agrammatism in Non-fluent Aphasia

Authors: Rohan Bhasin

Abstract:

Agrammatism in non-fluent aphasia can be defined as a language disorder in which a patient can use only content words (nouns, verbs and adjectives) to communicate, and their speech is devoid of function words such as conjunctions and articles, producing speech with extremely rudimentary grammar. Past approaches have involved some form of speech therapy, with conversation analysis used to analyse pre-therapy speech patterns and qualitative changes in conversational behaviour after therapy. We describe a novel method that generates function words (prepositions, articles) around content words (nouns, verbs and adjectives) using a combination of natural language processing and deep learning algorithms. The approach the paper investigates is a sequence-to-sequence (seq2seq) or LSTM model, which takes in a sequence of inputs and outputs a sequence. This approach needs a significant amount of training data, each training example being a pair of the form (content words, complete sentence). We generate such data by taking complete sentences from a text source and removing the function words to leave only the content words. The approach assumes that the content words in the input to both models are preserved, i.e., they do not change once the function words are slotted in. This is a potential limitation in cases of severe agrammatism, where the order of the content words might not itself be correct. The approach can therefore be used to assist communication in mild agrammatism in non-fluent aphasia: by generating function words around the content words, we can provide meaningful sentence options to the patient for articulate conversation.
Thus, our project translates the task of generating sentences from content words into an assistive technology for patients with non-fluent aphasia.
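The data-generation step (start from complete sentences, strip function words, keep the content words in their original order) can be sketched as below. This is a minimal illustration under stated assumptions: the small function-word list covers only articles, conjunctions and prepositions and is hypothetical, as is the name `make_training_pair`; a real pipeline would likely rely on a part-of-speech tagger or a fuller closed-class word list.

```python
import re

# Hypothetical, deliberately small list of function words (articles,
# conjunctions, prepositions); not the list used in the original work.
FUNCTION_WORDS = {
    "a", "an", "the",
    "and", "or", "but",
    "to", "in", "on", "at", "of", "for", "with", "from", "by",
}


def make_training_pair(sentence):
    """Return (content_words, full_sentence) for one seq2seq training example.

    Content words keep their original order, matching the assumption that
    the model only inserts function words around them.
    """
    tokens = re.findall(r"[A-Za-z']+", sentence)
    content = [t for t in tokens if t.lower() not in FUNCTION_WORDS]
    return " ".join(content), sentence


print(make_training_pair("The dog ran to the park."))
# ('dog ran park', 'The dog ran to the park.')
```

Pairs produced this way would serve as (source, target) examples, with the content-word string as input and the original sentence as the output the model learns to reconstruct.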

Keywords: aphasia, expressive aphasia, assistive algorithms, neurology, machine learning, natural language processing, language disorder, behaviour disorder, sequence to sequence, LSTM

Procedia PDF Downloads 164

27914 Chatbots as Language Teaching Tools for L2 English Learners

Authors: Feiying Wu

Abstract:

Chatbots, computer programs that attempt to engage a human in dialogue, originated in the 1960s with MIT's Eliza. They have become widespread more recently, however, as advances in language technology have produced chatbots of increasing linguistic quality and sophistication, giving them potential as tools for Computer-Assisted Language Learning (CALL). The aim of this article is to assess the feasibility of using two chatbots, Mitsuku and CleverBot, as pedagogical tools for learning English as a second language by simulating L2 learners with distinct English proficiencies. The input of the simulated learners was measured with AntWordProfiler to match each user's expected vocabulary proficiency. In total, there were four chat sessions, as each chatbot conversed with both a beginner and an advanced learner. The evaluation focuses on the chatbots' responses from a linguistic standpoint, at both the vocabulary and the sentence level. The vocabulary level is assessed by vocabulary range and by the response to misspelled words; the sentence level is assessed by grammatical accuracy and by responsiveness to poorly formed sentences. In addition, 25% of the input contained lexical and grammatical errors in order to determine the chatbots' ability to correct different linguistic forms. Based on statistical evidence and illustrative examples, and despite the small sample size, neither Mitsuku nor CleverBot is ideal as an educational tool, given their performance in word range, grammatical accuracy, topic range, and corrective feedback on incorrect words and sentences; rather, they may serve as conversational tools for beginners of L2 English.

Keywords: chatbots, CALL, L2, corrective feedback

Procedia PDF Downloads 78

27913 The Test of Memory Malingering and Offence Severity

Authors: Kenji Gwee

Abstract:

In Singapore, the death penalty remains in active use for murder and for trafficking in controlled drugs such as heroin. As such, the psychological assessment of defendants can often be high-stakes. The Test of Memory Malingering (TOMM) is employed by government psychologists to determine the degree of effort invested by defendants, which in turn informs the veracity of the overall psychological findings, findings that can ultimately determine whether defendants live or die. The purpose of this study was to find out whether defendants facing the death penalty were more likely to invest less effort during psychological assessment (faking bad in the hope of escaping the death sentence) than defendants facing lesser penalties. An archival search of all forensic cases assessed in 2012-2013 by Singapore’s designated forensic psychiatric facility yielded 186 defendants’ TOMM scores. Offence severity, coded into 6 rank-ordered categories, was analyzed in a one-way ANOVA with TOMM score as the dependent variable. There was a statistically significant difference (F(5,87) = 2.473, p = 0.038). A Tukey post-hoc test with Bonferroni correction revealed that defendants facing lower charges (theft, shoplifting, criminal breach of trust) invested less test-taking effort (TOMM = 37.4±12.3, p = 0.033) than those facing the death penalty (TOMM = 46.2±8.1). The surprising finding that those facing the death penalty actually invested more test-taking effort than those facing relatively minor charges could be due to higher levels of cooperation when faced with death. Alternatively, other legal avenues to escape the death sentence may have been preferred over the mitigatory chance of a psychiatric defence.
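To make the reported statistic concrete, the sketch below computes the F ratio of a standard one-way between-subjects ANOVA: between-group variance divided by within-group variance, with k − 1 and N − k degrees of freedom. Under that standard design, the reported F(5,87) corresponds to k = 6 offence categories and N = 93 scores entering the analysis. The function name and toy data are illustrative only; in practice `scipy.stats.f_oneway` performs the same test and also returns the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group sizes times squared deviations of group means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared deviations of scores from their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy data (not the study's scores): two groups of three.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```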

Keywords: capital sentencing, offence severity, Singapore, Test of Memory Malingering

Procedia PDF Downloads 434

27912 Cross-Dialect Sentence Transformation: A Comparative Analysis of Language Models for Adapting Sentences to British English

Authors: Shashwat Mookherjee, Shruti Dutta

Abstract:

This study explores linguistic distinctions among American, Indian, and Irish English dialects and assesses various large language models (LLMs) in their ability to generate British English translations from these dialects. Using cosine similarity analysis, the study measures the linguistic proximity between the original British English translations and those produced by the LLMs for each dialect. The findings reveal that the Indian and Irish English translations maintain notably high similarity scores, suggesting strong linguistic alignment with British English. In contrast, American English exhibits slightly lower similarity, reflecting its distinct linguistic traits. Additionally, the choice of LLM significantly affects translation quality, with Llama-2-70b consistently demonstrating superior performance. The study underscores the importance of selecting the right model for dialect translation, emphasizing the role of linguistic expertise and contextual understanding in achieving accurate translations.
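Cosine similarity scores the angle between two vector representations of text: identical directions give 1, vectors with no shared components give 0. The abstract does not say which representation was compared (sentence embeddings are likely), so the sketch below assumes simple bag-of-words counts purely to show the computation; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a stand-in for whatever representation the study used."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between the bag-of-words vectors of two texts."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


# A reference British English sentence vs. a model output with one word changed.
print(round(cosine_similarity("I will take the lift", "I will take the elevator"), 2))  # 0.8
```

With embeddings, the same formula applies to dense vectors; only the way each sentence is turned into a vector changes.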

Keywords: cross-dialect translation, language models, linguistic similarity, multilingual NLP

Procedia PDF Downloads 75

27911 The Redundant Kana: A Pragmatic Reading

Authors: Manal Mohammed Hisham Said Najjar

Abstract:

Arab grammarians shed light on the redundant kana (was) and gave it considerable attention. However, their considerations and interpretations regarding the use of this verb varied: is it used to mark tense, for further emphasis, or for some other function? Does it have a syntactic function? Morphologically, can it be used in forms other than the past? In addition, Arab grammarians discussed whether kana can occur between the constituents of a sentence, a phrase, or a collocation, while others questioned whether its position is initial or final. This study found that the redundant kana (was) occurs in the Quran and was used by the Arabs in their speech and poetry. Whether used in initial position, in final position, or between the constituents of a sentence, a phrase, or a collocation, this redundant kana implies pragmatic meanings intended by the speaker or the poet to serve different functions, such as indicating the past tense, providing emphasis, and referring to the continuity of the effect and meaning of a verb or adjective. The study concludes that kana can be used in different contexts to achieve a specific effect, as the old Arabs used it to add specific shades of meaning. As a redundant word, kana can be added to further highlight the meaning intended in a specific utterance. In addition, this verb can be used in both the past and the present morphological form, and its presence in an utterance may or may not be functional. In other words, the study found that the redundant kana can be used in various positions in an utterance (initial, final, or within a syntactic structure), provided that this use is pragmatically functional.
In conclusion, this paper invites scholars of the Arabic language to coin a new term, the “pragmatic kana”, to replace the term “kana alzae’da” (redundant kana), which implies that its use is redundant and void of significance, a view that is illogical given its recurrent use in the Holy Quran.

Keywords: redundant, kana, grammarians, Quran

Procedia PDF Downloads 130

27910 Passivization: as Syntactic Argument Decreasing Parameter in Boro

Authors: Ganga Brahma

Abstract:

Boro employs verbs hooked up with morphemes which lead verbs to adjust with their arguments and hence, affecting the whole of sentence structures. This paper is based on few such syntactic parameters which are usually considered as argument decreasing parameters in linguistic works. Passivizing of few transitive clauses which are usually construed from the verbs occurring with certain morphemes and representation in middle constructions are few of such strategies which lead to conceptualizing of decreasing of syntactic arguments from a sentence. This paper focuses on the mentioned linguistic strategies and attempts to describe the linguistic processes as for how these parameters work in languages especially by concentrating on a particular Tibeto-Burman language i.e. Boro. Boro is a Tibeto-Burman language widely spoken in parts of the north-eastern regions of India. It has an agglutinative nature in forming words as well as clauses. There is a morpheme ‘za’ which means ‘to happen, become’ in Boro whose appearances with verb roots denotes an idea of the subject being passivized. Passivization, usually has notions that it is a reversed representation of its active sentence forms in the terms of argument placements. (However, it is not accountably true as passives and actives have some distinct features of their own and independent of one and the other.) This particular work will concentrate on the semantics of passivization at the same time along with its syntactic reality. The verb khɑo meaning ‘to steal’ offers a sense of passivization with the appearance of the morpheme zɑ which means ‘to happen, become’ (e.g Zunu-ɑ lama-ɑo phɯisɑ khɑo-zɑ-bɑi; Junu-NOM road-LOC money steal-PASS-PRES: Junu got her money stolen on the road). The focus, here, is more on the argument placed at the subject position (i.e. Zunu) and the event taken place. The semantics of such construction asks for the agent because without an agent the event could not have taken place. 
However, the syntactic elements fill the slot of the relegated, or temporarily deleted, agent, which is in fact the actual subject-cum-agent in the active representation. Because of the event marker ‘zɑ’, this construction can remove one participant from a situation that actually involves three. Hence, the ditransitive structure is reduced here to a monotransitive one. Unlike passivization, the middle construction does not relegate the agent; it deletes it permanently. Like the passive, however, it foregrounds the subject and highlights the change of state of that subject, which is the underlying object of the corresponding transitive structure (with an agent). This work describes how these two parameters, though different in their semantic realization, converge at the syntactic level to form a linguistic parameter that reduces the number of participants in structures that originally have more than one.

Keywords: argument-decrease, middle-construction, passivization, transitivity-intransitivity

27909 Challenge of the Credibility of Witnesses in the International Criminal Court and the Precondition to Establish the Truth

Authors: Romina Beqiri

Abstract:

In the context of prosecuting those responsible for the most hideous crimes and fighting impunity, witnesses to the crimes play a fundamental role by helping to ascertain the ‘procedural truth’. This article examines recent decisions and legislation of the Hague-based International Criminal Court concerning the threat that witness tampering poses to the integrity of criminal proceedings. The analysis focuses on new developments in the courtroom and in academia, in particular the first-ever sentence confirming charges of corruptly influencing witnesses, and the interpretation of presenting false evidence and giving false testimony when under an obligation to tell the truth. With recent witness tampering putting witnesses’ credibility at stake in ongoing cases, the research explores the Court’s decisions and scholarly legal debates concerning a deterrence approach to punishing the authors of intentional offences against the administration of justice. The analysis concludes that the Court cannot tolerate any false testimony by witnesses and should make its sanctions more consistent and more severe, for the sake of fair trials and to end impunity.

Keywords: International Criminal Court, administration of justice, credibility of witness, fair trial, false testimony, witness tampering

27908 Structured-Ness and Contextual Retrieval Underlie Language Comprehension

Authors: Yao-Ying Lai, Maria Pinango, Ashwini Deo

Abstract:

While grammatical devices are essential to language processing, how comprehension utilizes cognitive mechanisms is less emphasized. This study addresses this issue by probing the complement coercion phenomenon: an entity-denoting complement following verbs like begin and finish receives an eventive interpretation. For example, (1) “The queen began the book” receives an agentive reading like (2) “The queen began [reading/writing/etc.…] the book.” Such sentences engender additional processing cost in real-time comprehension. The traditional account attributes this cost to an operation that coerces the entity-denoting complement to an event, assuming that these verbs require eventive complements. However, on closer examination, examples like “Chapter 1 began the book” undermine this assumption. An alternative, Structured Individual (SI) hypothesis, proposes that the complement following aspectual verbs (AspV; e.g. begin, finish) is conceptualized as a structured individual, construed as an axis along various dimensions (e.g. spatial, eventive, temporal, informational). The composition of an animate subject and an AspV such as (1) engenders an ambiguity between an agentive reading along the eventive dimension like (2), and a constitutive reading along the informational/spatial dimension like (3) “[The story of the queen] began the book,” in which the subject is interpreted as a subpart of the complement denotation. Comprehenders need to resolve the ambiguity by searching contextual information, resulting in additional cost. To evaluate the SI hypothesis, a questionnaire was employed. Method: Target AspV sentences such as “Shakespeare began the volume.” were preceded by one of the following types of context sentence: (A) Agentive-biasing, in which an event was mentioned (…writers often read…), (C) Constitutive-biasing, in which a constitutive meaning was hinted (Larry owns collections of Renaissance literature.), (N) Neutral context, which allowed both interpretations. 
Thirty-nine native speakers of English were asked to (i) rate each context-target sentence pair on a 1-5 scale (5 = fully understandable), and (ii) choose possible interpretations for the target sentence given the context. The SI hypothesis predicts that comprehension is harder in the Neutral condition than in the biasing conditions, because no contextual information is provided to resolve the ambiguity. Also, comprehenders should obtain the specific interpretation corresponding to the context type. Results: (A) Agentive-biasing and (C) Constitutive-biasing were rated higher than (N) Neutral conditions (p < .001), while all conditions were within the acceptable range (> 3.5 on the 1-5 scale). This suggests that when relevant contextual information is lacking, semantic ambiguity decreases comprehensibility. The interpretation task shows that the participants selected the biased agentive/constitutive reading for conditions (A) and (C), respectively. For the Neutral condition, the agentive and constitutive readings were chosen equally often. Conclusion: These findings support the SI hypothesis: the meaning of AspV sentences is conceptualized as a parthood relation involving structured individuals. We argue that semantic representation makes reference to spatial structured-ness (an abstracted axis). To obtain an appropriate interpretation, comprehenders use contextual information to enrich the conceptual representation of the sentence in question. This study connects semantic structure to human conceptual structure and provides a processing model that incorporates contextual retrieval.

Keywords: ambiguity resolution, contextual retrieval, spatial structured-ness, structured individual

27907 Spelling Errors of EFL Students: An Insight into Curriculum Development

Authors: Sheikha Ali Salim Al-Breiki

Abstract:

The purpose of this study was to explore the types of spelling errors that grade ten students make and to find out whether there were any significant differences between males and females in the types of spelling errors made. The sample included 90 grade ten students from four different schools in North Batinah. The researcher used a test consisting of two parts: an oral dictation test of 70 words, each with a contextualizing sentence, and a free writing task. The misspellings were classified into nine types. The findings revealed that the most common spelling error among Omani grade ten students was vowel substitution, followed by vowel omission and then consonant substitution. Male students omitted more vowels than female students, while females made more true word errors than males. In light of the findings, the study presents recommendations and suggestions for further studies.
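
The abstract does not give the rules used to assign its nine error types. As a rough illustration only, a few of the categories it names (true word errors, vowel and consonant substitution, vowel omission) could be detected by comparing each attempt with its target word. The functions below are a hypothetical sketch that handles only single-letter substitutions and omissions, with a caller-supplied word list standing in for a lexicon; it is not the study's coding scheme.

```python
from collections import Counter

VOWELS = set("aeiou")


def classify_misspelling(target, attempt, lexicon=frozenset()):
    """Assign a coarse error type to one attempted spelling (hypothetical rules)."""
    target, attempt = target.lower(), attempt.lower()
    if attempt == target:
        return "correct"
    if attempt in lexicon:
        # The attempt is itself a real word, e.g. "there" written for "their".
        return "true word error"
    if len(attempt) == len(target):
        # Exactly one differing letter counts as a substitution.
        diffs = [i for i, (t, a) in enumerate(zip(target, attempt)) if t != a]
        if len(diffs) == 1:
            kind = "vowel" if target[diffs[0]] in VOWELS else "consonant"
            return kind + " substitution"
    elif len(attempt) == len(target) - 1:
        # Exactly one missing letter counts as an omission.
        for i, letter in enumerate(target):
            if target[:i] + target[i + 1:] == attempt:
                kind = "vowel" if letter in VOWELS else "consonant"
                return kind + " omission"
    return "other"


def tally_errors(pairs, lexicon=frozenset()):
    """Count error types over (target, attempt) pairs, ignoring correct spellings."""
    counts = Counter(classify_misspelling(t, a, lexicon) for t, a in pairs)
    counts.pop("correct", None)
    return counts
```

For example, `classify_misspelling("separate", "seperate")` returns `"vowel substitution"`, and `classify_misspelling("because", "becuse")` returns `"vowel omission"`. Multi-letter errors fall into `"other"`, which a real coding scheme would need to split further.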

Keywords: types of spelling errors, errors, ESL/EFL, error analysis

27906 Music Reading Expertise Facilitates Implicit Statistical Learning of Sentence Structures in a Novel Language: Evidence from Eye Movement Behavior

Authors: Sara T. K. Li, Belinda H. J. Chung, Jeffery C. N. Yip, Janet H. Hsiao

Abstract:

Reading music notation and reading text both involve statistical learning of musical or linguistic structures. However, it remains unclear how music reading expertise influences text reading behavior. The present study examined this issue in an eye-tracking study. Chinese-English bilingual musicians and non-musicians read English sentences, Chinese sentences, musical phrases, and sentences in Tibetan, a language novel to the participants, while their eye movements were recorded. Each set of stimuli consisted of two conditions in terms of structural regularity: syntactically correct and syntactically incorrect musical phrases/sentences. Participants then completed a sentence comprehension task (for syntactically correct sentences) or a musical segment/word recognition task to test their comprehension/recognition abilities. The results showed that in reading musical phrases, musicians, compared with non-musicians, had higher accuracy in the recognition task and had shorter reading times, fewer fixations, and shorter fixation durations when reading syntactically correct (i.e., in a diatonic key) than incorrect (i.e., in a non-diatonic key/atonal) musical phrases. This result reflects their expertise in music reading. Interestingly, in reading Tibetan sentences, which were novel to both groups, non-musicians did not show any behavioral differences between syntactically correct and incorrect sentences, whereas musicians showed shorter reading times and marginally fewer fixations for syntactically correct sentences than for syntactically incorrect ones. However, none of the musicians reported discovering any structural regularities in the Tibetan stimuli when asked explicitly after the experiment, suggesting that they may have implicitly acquired the structural regularities in Tibetan sentences. This group difference was not observed when they read English or Chinese sentences. 
This result suggests that music reading expertise facilitates reading texts in a novel language (i.e., Tibetan), but not in languages with which the readers are already familiar (i.e., English and Chinese). This phenomenon may be due to the similarities between reading music notation and reading texts in a novel language: in both cases, the stimuli follow particular statistical structures but do not involve semantic or lexical processing. Thus, musicians may transfer statistical learning skills stemming from their music notation reading experience to implicitly discover the structures of sentences in a novel language. This speculation is consistent with a recent finding that music reading expertise modulates the processing of English nonwords (i.e., words that do not follow morphological or orthographic rules) but not of pseudowords or real words. These results suggest that the modulation of language processing by music reading expertise depends on the similarity of the cognitive processes involved. The findings also have important implications for the benefits of music education for language and cognitive development.
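
The eye-movement measures compared above (reading time, number of fixations, fixation duration) can be derived from per-trial fixation records. The sketch below assumes each trial is a list of fixation durations in milliseconds and takes reading time as the sum of fixation durations, which is one common operationalization; the study may define its measures differently (for example, as total trial time), and the function names are illustrative.

```python
from statistics import mean


def trial_measures(fixation_durations_ms):
    """Summarise one trial from its list of fixation durations (ms)."""
    n = len(fixation_durations_ms)
    total = sum(fixation_durations_ms)
    return {
        "reading_time_ms": total,  # assumed: sum of fixation durations
        "fixation_count": n,
        "mean_fixation_ms": total / n if n else 0.0,
    }


def condition_means(trials):
    """trials: iterable of (condition, durations); returns per-condition means of each measure."""
    grouped = {}
    for condition, durations in trials:
        grouped.setdefault(condition, []).append(trial_measures(durations))
    return {
        cond: {key: mean(m[key] for m in measures) for key in measures[0]}
        for cond, measures in grouped.items()
    }
```

Comparing `condition_means(...)["correct"]` with `condition_means(...)["incorrect"]` for each group gives the kind of within-group contrast the abstract reports; a statistical test on per-participant values would still be needed before drawing conclusions.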

Keywords: eye movement behavior, eye-tracking, music reading expertise, sentence reading, structural regularity, visual processing

27905 Verbal Working Memory in Sequential and Simultaneous Bilinguals: An Exploratory Study

Authors: Archana Rao R., Deepak P., Chayashree P. D., Darshan H. S.

Abstract:

Cognitive abilities in bilinguals have been widely studied over the last few decades. Bilingualism has been found to extensively facilitate the ability to store and manipulate information in working memory (WM). The mechanism of WM includes primary memory, attentional control, and secondary memory, each of which contributes to WM. Many studies have attempted to measure WM capabilities through both verbal (phonological) and nonverbal (visuospatial) tasks. Since there is much speculation about the relationship between WM and bilingualism, further investigation is required to understand the nature of WM in bilinguals, particularly in sequential versus simultaneous bilinguals. Hence, the present study aimed to examine verbal working memory abilities in sequential and simultaneous bilinguals with respect to the processing and recall of nouns and verbs. Two groups of bilinguals aged 18-30 years participated. Group 1 consisted of 20 (10 males and 10 females) sequential bilinguals who had acquired L1 (Kannada) before the age of 3 and had 8-10 years of exposure to L2 (English). Group 2 consisted of 20 (10 males and 10 females) simultaneous bilinguals who had acquired both L1 and L2 before the age of 3. Working memory abilities were assessed using two tasks with a set of stimuli presented in graded levels of complexity, including frequent and infrequent nouns and verbs. The tasks required participants to judge the correctness of each sentence while remembering its last word, and then to recall these words at the end of each set. The results indicated no significant difference between sequential and simultaneous bilinguals in processing nouns and verbs, which could be attributed to the participants’ proficiency level in L1 and the similar cognitive abilities of the two groups.
Recall of nouns was better than recall of verbs, possibly because of the complex argument structure of verbs. Similarly, the authors found that the frequency of occurrence of nouns and verbs also affected WM abilities. Differences were also found across levels of complexity, attributed to the load imposed on the central executive and the phonological loop.

Keywords: bilinguals, nouns, verbs, working memory

27904 Exploring Bidirectional Encoder Representations from the Transformers’ Capabilities to Detect English Preposition Errors

Authors: Dylan Elliott, Katya Pertsova

Abstract:

Preposition errors are among the most common errors made by L2 speakers. In addition, improving error correction and detection methods remains an open issue in Natural Language Processing (NLP). This research investigates whether the Bidirectional Encoder Representations from Transformers (BERT) model has the potential to correct preposition errors accurately enough to be useful in error correction software. It finds that BERT performs strongly when the scope of its error correction is limited to preposition choice. The researchers used an open-source BERT model and over three hundred thousand edited sentences from Wikipedia, tagged for part of speech, in which only a preposition edit had occurred. To test BERT’s ability to detect errors, a technique known as multi-level masking was used to generate suggestions based on sentence context for every prepositional environment in the test data. These suggestions were compared with the original errors in the data and their known corrections to evaluate BERT’s performance. The suggestions were further analyzed to determine how often BERT agreed with the judgements of the Wikipedia editors. Both the untrained (not fine-tuned) and fine-tuned models were compared. Fine-tuning led to a higher rate of error detection, which significantly improved recall but lowered precision due to an increase in false positives, i.e., falsely flagged errors. However, in most cases these false positives were not errors in preposition usage but merely cases where more than one preposition was possible. Furthermore, when BERT correctly identified an error, the model largely agreed with the Wikipedia editors, suggesting that BERT’s ability to detect misused prepositions is better than previously believed. To evaluate to what extent BERT’s false positives were grammatical suggestions, we plan a further crowd-sourcing study to test the grammaticality of BERT’s suggested sentence corrections against native speakers’ judgments.
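
The masking and evaluation steps can be outlined in code. The abstract does not describe multi-level masking in detail, so this hypothetical sketch masks one preposition at a time (identified by a Universal Dependencies `ADP` tag, an assumption) and leaves the masked-language-model prediction step abstract: `top_predictions` stands in for the output of a fill-mask model. A position is flagged when the model's top filler differs from the written preposition, and the flags are scored with precision and recall. This is where the trade-off reported above arises: flagging more positions tends to raise recall at the cost of precision.

```python
from typing import Dict, List, Set, Tuple

MASK = "[MASK]"  # mask token used by BERT-style tokenizers


def masked_variants(tokens: List[str], pos_tags: List[str], prep_tag: str = "ADP") -> List[Tuple[int, str]]:
    """Return (index, masked sentence) for every token tagged as a preposition."""
    variants = []
    for i, tag in enumerate(pos_tags):
        if tag == prep_tag:
            masked = tokens[:i] + [MASK] + tokens[i + 1:]
            variants.append((i, " ".join(masked)))
    return variants


def flag_errors(tokens: List[str], top_predictions: Dict[int, str]) -> Set[int]:
    """Flag a position when the model's top filler differs from the written preposition.

    `top_predictions` maps a masked index to the model's best guess; producing it
    (e.g. with a fill-mask model) is outside this sketch.
    """
    return {i for i, pred in top_predictions.items() if pred.lower() != tokens[i].lower()}


def precision_recall(flagged: Set[int], gold_errors: Set[int]) -> Tuple[float, float]:
    """Precision and recall of flagged positions against known error positions."""
    true_pos = len(flagged & gold_errors)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(gold_errors) if gold_errors else 0.0
    return precision, recall
```

For the sentence `She arrived in the station`, `masked_variants` yields `(2, "She arrived [MASK] the station")`; if the model's top filler is `at`, position 2 is flagged.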

Keywords: BERT, grammatical error correction, preposition error detection, prepositions

27903 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features

Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim

Abstract:

Knowledge of the relationships between characters can help readers understand the overall story or plot of a work of literary fiction. In this paper, we present a method for extracting specific relationships between characters from Korean literary fiction. Existing methods for extracting relationships between characters in text are generally statistical or computational methods based on the sentence distance between characters, without considering Korean linguistic features. Furthermore, such methods have difficulty extracting directed relationships, such as one-sided love, because they consider only the weight of a relationship and not its direction. Therefore, to identify specific relationships between characters, we propose a statistical method that considers linguistic features such as syntactic patterns and speech verbs in Korean. The result of our method is represented as a weighted directed graph of the relationships between characters. We also expect that the proposed method could be applied to analyzing relationships between characters in other content, such as movies or TV dramas.
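
A weighted directed graph of the kind described can be kept as nested dictionaries, with one edge per ordered pair of characters. In this hypothetical sketch, each extracted interaction (for instance, a speech verb linking a speaker to an addressee) is assumed to arrive as a `(source, target, weight)` triple; how the paper derives those triples and weights from Korean syntactic patterns is not shown here. Keeping the two directions separate is what allows a one-sided relationship to appear as an asymmetry.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_relation_graph(interactions: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Sum weights per ordered (source, target) pair; graph[src][dst] is the edge weight."""
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for src, dst, weight in interactions:
        if src == dst:
            continue  # ignore self-references
        graph[src][dst] = graph[src].get(dst, 0.0) + weight
    return dict(graph)


def edge_weight(graph: Dict[str, Dict[str, float]], src: str, dst: str) -> float:
    """Weight of the directed edge src -> dst, or 0.0 if absent."""
    return graph.get(src, {}).get(dst, 0.0)


def asymmetry(graph: Dict[str, Dict[str, float]], a: str, b: str) -> float:
    """Positive if a's relation toward b outweighs b's toward a (e.g. one-sided love)."""
    return edge_weight(graph, a, b) - edge_weight(graph, b, a)
```

An undirected method would merge `graph["A"]["B"]` and `graph["B"]["A"]` into a single weight, losing exactly the information that `asymmetry` reads.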

Keywords: data mining, Korean linguistic feature, literary fiction, relationship extraction

27902 Topic-Specific Differences and Lexical Variations in the Use of Violence Metaphors: A Cognitive Linguistic Study of YouTube Breast Cancer Discourse in New Zealand and Pakistan

Authors: Sara Malik, Andreea S. Calude, Joseph Ulatowski

Abstract:

This paper explores how speakers with breast cancer from New Zealand and Pakistan use violence metaphors to communicate the intensity of their experiences during various stages of illness. Grounded theoretically in Conceptual Metaphor Theory and using the Metaphor Identification Procedure for metaphor analysis, this study investigates how speakers with breast cancer use violence metaphors in different cultural contexts. The study collected a corpus of forty-six personal narratives from New Zealand and thirty-six from Pakistan, posted on YouTube between 2011 and 2023 by breast cancer organisations such as ‘NZ Breast Cancer Foundation’ and ‘Pink Ribbon Pakistan’. The data were transcribed using the Whisper AI tool and then curated to include only patients’ discourse, which was further organised into eight narrative topics: testing phase, treatment phase, remission phase, family support, campaigns and awareness efforts, government support and funding, general information, and religious discourse. In this talk, we discuss two aspects of the use of violence metaphors: a) differences in their use across narrative topics, and b) lexical variation in the choice of such metaphors. The findings suggest that violence metaphors were used differently across stages of the illness experience. For instance, during the ‘testing phase’, violence metaphors were employed to convey a sense of punishment, as reflected in statements like ‘Feeling like it was a death sentence, an immediate death sentence’ (NZ Example) and ‘Jese hi aap ko na breast cancer ka pata chalta hai logon ko yeh hona shuru ho jata hai ke oh bas ab to moat ka parwana mil gaya hai’ (Because as soon as you find out you have breast cancer, people start to feel that you have received a death warrant) (PK Example). 
On the other hand, violence metaphors in the ‘treatment phase’ highlighted negative experiences related to chemotherapy, as seen in statements like ‘The first lot of chemo I had was disastrous’ (NZ Example) and ‘...chemotherapy ke to, it's the worst of all, it's like a healing poison’ (chemotherapy, it's the worst of all, it's like a healing poison) (PK Example). Second, lexical variation revealed that ‘sunburn’ (a common phenomenon in NZ) was used as a metaphor to describe the effects of radiotherapy, whereas speakers from Pakistan used the more general term ‘burn’. In this talk, we will explore possible reasons for the different word choices made by speakers from the two countries to describe the same process. This study contributes to understanding the use of violence metaphors across narrative topics of the illness experience and explains how and why speakers from two different countries use different lexical items to describe the same process.

Keywords: metaphors, breast cancer discourse, cognitive linguistics, lexical variations, New Zealand English, Pakistani Urdu

27901 Effect of Spelling on Communicative Competence: A Case Study of Registry Staff of the University of Ibadan, Nigeria

Authors: Lukman Omobola Adisa

Abstract:

Spelling is rule-bound in written discourse. Questions arise, however, when such conventions are grossly contravened in a formal setting revered as a citadel of learning, despite the availability of computer spell-checkers, human knowledge, and lexicons. This reveals the extent of the decadence pervading the education sector in Nigeria. On this premise, this study reviews the effect of spelling on the communicative competence of the University of Ibadan Registry staff. The theoretical framework evaluates diverse scholars’ views on communicative competence and on how spelling influences the intended meaning of a word or sentence when a grammatical (spelling) rule is unduly infringed. Four print materials were purposively selected (a newsletter, a bulletin, a memo, and a letter), and the methodology adopted is content analysis. Five categories through which spelling blunders are committed are considered, though the analysis is not limited to them: effect of spelling (omission, addition, and substitution); sound (homophones); transposition (heading/body: content); and ambiguity (capitalisation, space, and acronyms). The analyses, findings, and recommendations are then presented. In summary, the study examines the role(s) spelling plays in enhancing communicative competence through the appropriate use of linguistic registers.

Keywords: communicative competence, content analysis, effect of spelling, linguistic registers

27900 Developing Writing Skills of Learners with Persistent Literacy Difficulties through the Explicit Teaching of Grammar in Context: Action Research in a Welsh Secondary School

Authors: Jean Ware, Susan W. Jones

Abstract:

Background: The benefits of grammar instruction in the teaching of writing are contested in most English-speaking countries. A majority of Anglophone countries abandoned the teaching of grammar in the 1950s, having concluded that it had no positive impact on learners’ development of reading, writing, and language. Although the decontextualised teaching of grammar is not helpful in improving writing, a curriculum that focuses on grammar in an embedded and meaningful way can help learners develop their understanding of the mechanisms of language. Whereas British learners are generally not taught grammar rules explicitly, learners in schools in France, the Netherlands, and Germany are taught explicitly about the structure of their own language. Exposing learners to grammatical analysis can help them develop their understanding of language. Indeed, if learners are taught that each part of speech has an identified role in the sentence, then rather than having to memorise lists of words or spelling patterns, they can focus on determining each word or phrase’s task in the sentence. These processes of categorisation and deduction are higher-order thinking skills. In light of the definitions of dyslexia available in Great Britain, the explicit teaching of grammar in context could help learners with persistent literacy difficulties: learners with dyslexia often develop strengths in problem solving, so the teaching of grammar could help them develop their understanding of language through analytical and logical thinking. Aims: This study aims to gain a further understanding of how the explicit teaching of grammar in context can benefit learners with persistent literacy difficulties. The project is designed to identify ways of adapting existing grammar-focused teaching materials so that learners with specific learning difficulties such as dyslexia can use them to further develop their writing skills. 
It intends to improve educational practice through action, analysis and reflection. Research Design/Methods: The project, therefore, uses an action research design and multiple sources of evidence. The data collection tools were standardised test data, teacher assessment data, semi-structured interviews, learners’ attempts at a writing task at the beginning and end of the cycle, documentary data, and lesson observations carried out by a specialist teacher. Existing teaching materials were adapted for use with five Year 9 learners who had experienced persistent literacy difficulties from primary school onwards. The initial adaptations included reducing the amount of content taught in each lesson and pre-teaching some of the metalanguage needed. Findings: Learners’ before and after attempts at the writing task were scored by a colleague who did not know the order of the attempts. All five learners scored higher on the second writing task. Learners reported that they had enjoyed the teaching approach. They also made suggestions for the second cycle, as did the colleague who carried out the observations. Conclusions: Although this is a very small exploratory study, these results suggest that adapting grammar-focused teaching materials shows promise for helping learners with persistent literacy difficulties develop their writing skills.

Keywords: explicit teaching of grammar in context, literacy acquisition, persistent literacy difficulties, writing skills

27899 A Discourse Analysis of Menopause for Thai Women

Authors: Prapaipan Phingchim

Abstract:

The number of women approaching menopausal age in Thailand is increasing, making menopause an important health topic. To understand Thai women's different ways of interpreting menopausal experiences and the way they construct meaning relating to menopause, it is necessary to consider the context in which meaning is constructed as well as the cultural attitudes to menopause in Thai society. The aim of this study was to describe the different discourses on menopause in Thailand that present themselves to menopausal women through language, and to analyze the linguistic strategies used to represent such identities. This study adopts discourse theory and close pragmatic analysis to examine the discursive construction of menopause for Thai women. Two hundred and fifteen texts under the heading or subject of ‘menopause’ or ‘becoming a middle-aged woman’, published from 2010 to 2019, were included. All material was addressed to Thai women and consisted of booklets and informational material, articles from newspapers and magazines, and popular science books. Five different discourses on menopause were identified: the biomedical discourse, the health-promotion discourse, the consumer discourse, the alternative discourse, and the feminist/critical discourse. The biomedical discourse on menopause was found to be dominant, but it was expanded or challenged by the other discourses, which offered different scopes of action and/or rested on different fundamental values. The discourses constructed and positioned individual women differently; thus, women's positions varied noticeably from one discourse to another. Seven major linguistic strategies are used to construct these identities: lexical selection, presupposition manipulation, presupposition denial, the use of implication, the use of passive constructions, cause-and-effect sentence structures, and rhetorical questions.

Keywords: discourse analysis, discursive construction, menopause, Thai women

27898 Corpus-Based Neural Machine Translation: Empirical Study Multilingual Corpus for Machine Translation of Opaque Idioms - Cloud AutoML Platform

Authors: Khadija Refouh

Abstract:

Culture-bound expressions have been a bottleneck for Natural Language Processing (NLP) and comprehension, especially in machine translation (MT). In the last decade, the field of machine translation has advanced greatly. Neural machine translation (NMT) has recently achieved considerable improvements in translation quality, outperforming previous traditional translation systems for many language pairs. NMT applies artificial intelligence (AI), in the form of deep neural networks, to language processing. Despite this progress, NMT still faces serious challenges when translating culture-bound expressions, especially for low-resource language pairs such as Arabic-English and Arabic-French; this is not the case for well-established language pairs such as English-French. Machine translation of opaque idioms from English into French is therefore likely to be more accurate than translation from English into Arabic. For example, Google Translate rendered the sentence “What a bad weather! It rains cats and dogs.” into Arabic as “يا له من طقس سيء! تمطر القطط والكلاب” (a word-for-word rendering of the English idiom), which is an inaccurate literal translation. Its French translation of the same sentence was “Quel mauvais temps! Il pleut des cordes.” (literally, ‘What bad weather! It is raining ropes’), in which Google Translate used the corresponding French idiom. This paper aims to perform NMT experiments towards better translation of opaque idioms using a high-quality, clean multilingual corpus. This corpus will be collected analytically from human-generated idiom translations. AutoML Translation, Google's neural machine translation platform for custom models, is used to build a custom translation model to improve the translation of opaque idioms. The custom model will be evaluated automatically and compared with Google NMT using the Bilingual Evaluation Understudy (BLEU) score. 
BLEU is an algorithm for evaluating the quality of text that has been machine-translated from one natural language into another. Human evaluation is included to test the reliability of the BLEU score. The researcher will examine syntactic, lexical, and semantic features using Halliday's functional theory.
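
For readers unfamiliar with BLEU, the sketch below computes an unsmoothed sentence-level score: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It is a simplified illustration, not the implementation used by AutoML Translation; reported BLEU scores are usually computed at corpus level by aggregating n-gram counts over all sentences, and tokenization and smoothing choices can change the numbers.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: Sequence[str], references: List[Sequence[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n-grams."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            max_ref |= ngram_counts(ref, n)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(clipped / total)
    c = len(candidate)
    # Effective reference length: closest to c, ties broken toward the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)
```

For example, the candidate `the cat sat on` scored against the reference `the cat sat on the mat` has every n-gram precision equal to 1, but the brevity penalty exp(1 - 6/4) ≈ 0.61 lowers its score. This is why a literal translation that happens to share many words with a reference can still score well, and why the abstract pairs BLEU with human evaluation.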

Keywords: multilingual corpora, natural language processing (NLP), neural machine translation (NMT), opaque idioms

27897 Self-Supervised Learning for Hate-Speech Identification

Authors: Shrabani Ghosh

Abstract:

Automatic offensive language detection in social media has become a pressing task in today's NLP. Manual offensive language detection is tedious and laborious, leaving automatic methods based on machine learning as the only practical alternative. Previous work has performed sentiment analysis on social media in supervised, semi-supervised, and unsupervised ways. Semi-supervised domain adaptation, in which the source and target domains differ, has also been explored in NLP. In domain adaptation, the source domain usually has a large amount of labeled data, while only a limited amount of labeled data is available in the target domain. Pretrained transformers such as BERT and RoBERTa can be further pretrained on unlabeled data with the masked language modeling (MLM) objective, which requires no labels, before being fine-tuned for text classification. In previous work, hate speech detection has been explored on Gab.ai, a self-described free-speech platform that has been characterized as hosting extremists of varying degrees. In that domain adaptation process, Twitter data are used as the source domain and Gab data as the target domain. The performance of domain adaptation also depends on cross-domain similarity. Different distance measures, such as L2 distance, cosine distance, Maximum Mean Discrepancy (MMD), Fisher Linear Discriminant (FLD), and CORAL, have been used to estimate domain similarity. In-domain distances are small, and between-domain distances are expected to be large. That previous work found that an MLM-pretrained model fine-tuned on a mixture of source- and target-domain posts gives higher accuracy. However, the hate classifier's in-domain accuracy on Twitter data is 71.78%, while its out-of-domain accuracy on Gab data drops to 56.53%. Self-supervised learning has recently received much attention because it is more applicable when labeled data are scarce. 
A few works have already applied self-supervised learning to NLP tasks such as sentiment classification. The self-supervised language representation model ALBERT focuses on modeling inter-sentence coherence and helps downstream tasks with multi-sentence inputs. A self-supervised attention learning approach shows better performance because it exploits extracted context words in the training process. In this work, a self-supervised attention mechanism is proposed to detect hate speech on Gab.ai. The framework first classifies the Gab dataset in an attention-based, self-supervised manner. In the next step, a semi-supervised classifier is trained on a combination of the data labeled in the first step and unlabeled data. The performance of the proposed framework will be compared with the results described earlier and with optimized outcomes obtained from different optimization techniques.
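
Several of the domain-similarity measures listed in this abstract can be illustrated on sentence embeddings. The sketch below is a hypothetical pure-Python version that assumes each domain (e.g., Twitter and Gab posts) is given as a list of equal-length embedding vectors. L2 and cosine distances are taken between domain centroids (one simple choice). MMD uses a linear kernel, for which the biased squared estimate reduces to the squared distance between the domain means. CORAL is the squared Frobenius distance between the two covariance matrices scaled by 1/(4d²), as in Deep CORAL. FLD is omitted. These definitions are assumptions; the cited work may compute the measures differently.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(xs: List[Vector]) -> List[float]:
    """Component-wise mean (centroid) of a list of equal-length vectors."""
    return [sum(x[j] for x in xs) / len(xs) for j in range(len(xs[0]))]


def l2_distance(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def linear_mmd2(xs: List[Vector], ys: List[Vector]) -> float:
    """Biased squared MMD with a linear kernel: squared distance between domain means."""
    return l2_distance(mean_vector(xs), mean_vector(ys)) ** 2


def covariance(xs: List[Vector]) -> List[List[float]]:
    """Sample covariance matrix (divides by n - 1); needs at least two vectors."""
    mu = mean_vector(xs)
    d, n = len(mu), len(xs)
    return [
        [sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_distance(xs: List[Vector], ys: List[Vector]) -> float:
    """Squared Frobenius distance between domain covariances, scaled by 1 / (4 d^2)."""
    cs, ct = covariance(xs), covariance(ys)
    d = len(cs)
    return sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d)) / (4 * d * d)
```

The two kinds of measure capture different things: two domains whose embeddings differ only by a constant shift have a positive linear MMD but a CORAL distance of zero, because their covariances are identical. In practice these functions would be applied to vectors from a sentence encoder, with numerical libraries replacing the loops for realistic dimensions.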

Keywords: attention learning, language model, offensive language detection, self-supervised learning
