Search results for: Arabic natural language processing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 12345

Search results for: Arabic natural language processing

12105 Improving Subjective Bias Detection Using Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory

Authors: Ebipatei Victoria Tunyan, T. A. Cao, Cheol Young Ock

Abstract:

Detecting subjectively biased statements is a vital task. This is because this kind of bias, when present in the text or other forms of information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflicts amongst consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks like sentiment analysis, opinion identification, and bias neutralization. Having a system that can adequately detect subjectivity in text will boost research in the above-mentioned areas significantly. It can also come in handy for platforms like Wikipedia, where the use of neutral language is of importance. The goal of this work is to identify the subjectively biased language in text on a sentence level. With machine learning, we can solve complex AI problems, making it a good fit for the problem of subjective bias detection. A key step in this approach is to train a classifier based on BERT (Bidirectional Encoder Representations from Transformers) as upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network incorporated with attention mechanism. This approach produces a deeper and better classifier. We evaluate the effectiveness of our model using the Wiki Neutrality Corpus (WNC), which was compiled from Wikipedia edits that removed various biased instances from sentences as a benchmark dataset, with which we also compare our model to existing approaches. Experimental analysis indicates an improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.

Keywords: subjective bias detection, machine learning, BERT–BiLSTM–Attention, text classification, natural language processing

Procedia PDF Downloads 130
12104 Investigating Universals of Rhetoric

Authors: Nasreddin Ahmed

Abstract:

Despite the ostensible extant differences amongst world languages’ structures that have culminated in the divergence in orthographic, phonological, morphological, and syntactic systems that each language has, research in cognitive linguistic strives to establish the claim that such differences are merely prima facie of a totalized universal system of signification.Linguists , since Chomsky, have never given up on the attempt to establish linguistic descriptive model that espouses a perspective in which every human language has a slot . Concurring with claim that the so-called rhetorical devices are pervasive phenomena and not literary-specific , the present paper aspires to voice the claim that rhetorical devices not only ubiquitous in all levels of a particular language but also a universal linguistic phenomena. Using illustrations from Arabic and Englishthe paper intend to provide data-supported evidence that human beings are universally using similar rhetorical, albeit given different appellations.

Keywords: language, rhetoric, syntax, stylistics

Procedia PDF Downloads 96
12103 Creating Energy Sustainability in an Enterprise

Authors: John Lamb, Robert Epstein, Vasundhara L. Bhupathi, Sanjeev Kumar Marimekala

Abstract:

As we enter the new era of Artificial Intelligence (AI) and Cloud Computing, we mostly rely on the Machine and Natural Language Processing capabilities of AI, and Energy Efficient Hardware and Software Devices in almost every industry sector. In these industry sectors, much emphasis is on developing new and innovative methods for producing and conserving energy and sustaining the depletion of natural resources. The core pillars of sustainability are economic, environmental, and social, which is also informally referred to as the 3 P's (People, Planet and Profits). The 3 P's play a vital role in creating a core Sustainability Model in the Enterprise. Natural resources are continually being depleted, so there is more focus and growing demand for renewable energy. With this growing demand, there is also a growing concern in many industries on how to reduce carbon emissions and conserve natural resources while adopting sustainability in corporate business models and policies. In our paper, we would like to discuss the driving forces such as Climate changes, Natural Disasters, Pandemic, Disruptive Technologies, Corporate Policies, Scaled Business Models and Emerging social media and AI platforms that influence the 3 main pillars of Sustainability (3P’s). Through this paper, we would like to bring an overall perspective on enterprise strategies and the primary focus on bringing cultural shifts in adapting energy-efficient operational models. Overall, many industries across the globe are incorporating core sustainability principles such as reducing energy costs, reducing greenhouse gas (GHG) emissions, reducing waste and increasing recycling, adopting advanced monitoring and metering infrastructure, reducing server footprint and compute resources (Shared IT services, Cloud computing, and Application Modernization) with the vision for a sustainable environment.

Keywords: climate change, pandemic, disruptive technology, government policies, business model, machine learning and natural language processing, AI, social media platform, cloud computing, advanced monitoring, metering infrastructure

Procedia PDF Downloads 111
12102 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical or crucial information needed for tagging a word come from its internal structure rather from its neighboring words. The internal structure of a word comprises of its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for Kannada language. This proposed work uses hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from EMILLE corpus. Experimental result shows that the performance of the proposed system is above 90%.

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 296
12101 The Theology of a Muslim Artist: Tawfiq al-Hakim

Authors: Abdul Rahman Chamseddine

Abstract:

Tawfiq al-Hakim remains one of the most prominent playwrights in his native in Egypt, and in the broader Arab world. His works, at the time of their release, drew international attention and acclaim. His first 1933 masterpiece Ahl al-Kahf (The People of the Cave) especially, garnered fame and recognition in both Europe and the Arab world. Borrowing its title from the Qur’anic Sura, al-Hakim’s play relays the untold story of the life of those 'three saints' after they wake up from their prolonged sleep. The playwright’s selection of topics upon which to base his works displays a deep appreciation of Arabic and Islamic heritage. Al-Hakim was clearly influenced by Islam, to such a degree that he wrote the biography of the Prophet Muhammad in 1936 very early in his career. Knowing that Al-Hakim was preceded by many poets and creative writers in writing the Prophet Muhammad’s biography. Notably like Al-Barudi, Ahmad Shawqi, Haykal, Al-‘Aqqad, and Taha Husayn who have had their own ways in expressing their views of the Prophet Muhammad. The attempt to understand the concern of all those renaissance men and others in the person of the Prophet would be indispensable in this study. This project will examine the reasons behind al-Hakim’s choice to draw upon these particular texts, embedded as they are in the context of Arabic and Islamic heritage, and how the use of traditional texts serves his contemporary goals. The project will also analyze the image of Islam in al-Hakim’s imagination. Elsewhere, he envisions letters or conversations between God and himself, which offers a window into understanding the powerful impact of the Divine on Tawfiq al-Hakim, one that informs his literature and merits further scholarly attention. His works occupying a major rank in Arabic literature, does not reveal Al-Hakim solely but the unquestioned assumptions operative in the life of his community, its mental make-up and its attitudes. Furthermore, studying the reception of works that touch on sensitive issues, like writing a letter to God, in Al-Hakim’s historical context would be of a great significance in the process of comprehending the mentality of the Muslim community at that time.

Keywords: Arabic language, Arabic literature, Arabic theology, modern Arabic literature

Procedia PDF Downloads 366
12100 Crossover Memories and Code-Switching in the Narratives of Arabic-Hebrew and Hebrew-English Bilingual Adults in Israel

Authors: Amani Jaber-Awida

Abstract:

This study examines two bilingual phenomena in the narratives of Arabic Hebrew and Hebrew-English bilingual adults in Israel: CO memories and code-switching (CS). The study examined these phenomena in the context of autobiographical memory, using a cue word technique. Student experimenters held two sessions in the homes of the participants. In separate language sessions, the participant was asked to look first at each of 16 cue words and then to state a concrete memory. After stating the memory, participants reported whether their memories were in the same language of the experiment session or different. Memories were classified as ‘Crossovers’ (CO) or ‘Same Language’ (SL) according to participants' self-reports. Participants were also required to elaborate about the setting, interlocutors and other languages involved in the specific memory. Beyond replicating the procedure of cuing technique, one memory from a specific lifespan period was chosen per participant, and the participant was required to provide further details about it. For the more detailed memories, CS count was conducted. Both bilingual groups confirmed the Reminiscence Bump phenomenon, retrieving more memories in the 10-30 age period. CO memories prevailed in second language sessions (L2). Same language memories were more abundant in first language sessions (L1). Higher CS frequency was found in L2 sessions. Finally, as predicted, 'individual' CS was prevalent in L2 sessions, but 'community-based' CS was not higher in L1 sessions. The two bilingual measures in this study, crossovers, and CS came from different research traditions, the former from an experimental paradigm in the psychology of autobiographical memory based on self-reported judgments, the latter a behavioral measure from linguistics. This merger of approaches offers new insight into the field of bilingual autobiographical memory. In addition, the study attempted to shed light on the investigation of motivations for CS, beginning with Walters’ SPPL Model and concluding with a distinction between ‘community-based’ and individual motivations.

Keywords: bilinguals, code-switching, crossover memories, narratives

Procedia PDF Downloads 171
12099 Transportation Language Register as One of Language Community

Authors: Diyah Atiek Mustikawati

Abstract:

Language register refers to a variety of a language used for particular purpose or in a particular social setting. Language register also means as a concept of adapting one’s use of language to conform to standards or tradition in a given professional or social situation. This descriptive study tends to discuss about the form of language register in transportation aspect, factors, also the function of use it. Mostly, language register in transportation aspect uses short sentences in form of informal register. The factor caused language register used are speaker, word choice, background of language. The functions of language register in transportations aspect are to make communication between crew easily, also to keep safety when they were in bad condition. Transportation language register developed naturally as one of variety of language used.

Keywords: language register, language variety, communication, transportation

Procedia PDF Downloads 487
12098 Revolutionizing Healthcare Communication: The Transformative Role of Natural Language Processing and Artificial Intelligence

Authors: Halimat M. Ajose-Adeogun, Zaynab A. Bello

Abstract:

Artificial Intelligence (AI) and Natural Language Processing (NLP) have transformed computer language comprehension, allowing computers to comprehend spoken and written language with human-like cognition. NLP, a multidisciplinary area that combines rule-based linguistics, machine learning, and deep learning, enables computers to analyze and comprehend human language. NLP applications in medicine range from tackling issues in electronic health records (EHR) and psychiatry to improving diagnostic precision in orthopedic surgery and optimizing clinical procedures with novel technologies like chatbots. The technology shows promise in a variety of medical sectors, including quicker access to medical records, faster decision-making for healthcare personnel, diagnosing dysplasia in Barrett's esophagus, boosting radiology report quality, and so on. However, successful adoption requires training for healthcare workers, fostering a deep understanding of NLP components, and highlighting the significance of validation before actual application. Despite prevailing challenges, continuous multidisciplinary research and collaboration are critical for overcoming restrictions and paving the way for the revolutionary integration of NLP into medical practice. This integration has the potential to improve patient care, research outcomes, and administrative efficiency. The research methodology includes using NLP techniques for Sentiment Analysis and Emotion Recognition, such as evaluating text or audio data to determine the sentiment and emotional nuances communicated by users, which is essential for designing a responsive and sympathetic chatbot. Furthermore, the project includes the adoption of a Personalized Intervention strategy, in which chatbots are designed to personalize responses by merging NLP algorithms with specific user profiles, treatment history, and emotional states. The synergy between NLP and personalized medicine principles is critical for tailoring chatbot interactions to each user's demands and conditions, hence increasing the efficacy of mental health care. A detailed survey corroborated this synergy, revealing a remarkable 20% increase in patient satisfaction levels and a 30% reduction in workloads for healthcare practitioners. The poll, which focused on health outcomes and was administered to both patients and healthcare professionals, highlights the improved efficiency and favorable influence on the broader healthcare ecosystem.

Keywords: natural language processing, artificial intelligence, healthcare communication, electronic health records, patient care

Procedia PDF Downloads 76
12097 Poetic Music by the Poet, Commander of the Faithful, Muhammad Bello: Prosodical Study

Authors: Sirajo Muhammad Sokoto

Abstract:

The Commander of the Faithful, Muhammad Bello, is considered one of the most distinguished scholars and poetic geniuses who is famous for reciting poetry in the classical vertical style. He is also represented by pre-Islamic poets such as Imru’ al-Qays and Alqamah and among the Islamists such as Hassan bin Thabit, Amr bin Abi Rabi’ah, and others. The poet drew from the seas of the Arabic language and its styles at the hands of His father, Sheikh Othman Bin Fodio, and his uncle, Sheikh Abdullah Bin Fodio, are both things that made Muhammad Bello conversant with the Arabic language until he was able to write poetry in a beautiful format and good style. The Commander of the Faithful, Muhammad Bello, did not deviate from what the Arabs know of poetic elements, such as taking into account its meanings and music; Muhammadu Bello has used every Bahr of prosody and its technicals in many of his poems. This article prepares the reader for the efforts made by the poet Muhammad Bello in composing poems on poetic seas, taking into account musical tones for different purposes according to his desire. The article will also discuss the poet’s talent, skill, and eloquence.

Keywords: music, Muhammad Bello, poetry, performances

Procedia PDF Downloads 76
12096 Mask-Prompt-Rerank: An Unsupervised Method for Text Sentiment Transfer

Authors: Yufen Qin

Abstract:

Text sentiment transfer is an important branch of text style transfer. The goal is to generate text with another sentiment attribute based on a text with a specific sentiment attribute while maintaining the content and semantic information unrelated to sentiment unchanged in the process. There are currently two main challenges in this field: no parallel corpus and text attribute entanglement. In response to the above problems, this paper proposed a novel solution: Mask-Prompt-Rerank. Use the method of masking the sentiment words and then using prompt regeneration to transfer the sentence sentiment. Experiments on two sentiment benchmark datasets and one formality transfer benchmark dataset show that this approach makes the performance of small pre-trained language models comparable to that of the most advanced large models, while consuming two orders of magnitude less computing and memory.

Keywords: language model, natural language processing, prompt, text sentiment transfer

Procedia PDF Downloads 82
12095 User Guidance for Effective Query Interpretation in Natural Language Interfaces to Ontologies

Authors: Aliyu Isah Agaie, Masrah Azrifah Azmi Murad, Nurfadhlina Mohd Sharef, Aida Mustapha

Abstract:

Natural Language Interfaces typically support a restricted language and also have scopes and limitations that naïve users are unaware of, resulting in errors when the users attempt to retrieve information from ontologies. To overcome this challenge, an auto-suggest feature is introduced into the querying process where users are guided through the querying process using interactive query construction system. Guiding users to formulate their queries, while providing them with an unconstrained (or almost unconstrained) way to query the ontology results in better interpretation of the query and ultimately lead to an effective search. The approach described in this paper is unobtrusive and subtly guides the users, so that they have a choice of either selecting from the suggestion list or typing in full. The user is not coerced into accepting system suggestions and can express himself using fragments or full sentences.

Keywords: auto-suggest, expressiveness, habitability, natural language interface, query interpretation, user guidance

Procedia PDF Downloads 474
12094 A Word-to-Vector Formulation for Word Representation

Authors: Sandra Rizkallah, Amir F. Atiya

Abstract:

This work presents a novel word to vector representation that is based on embedding the words into a sphere, whereby the dot product of the corresponding vectors represents the similarity between any two words. Embedding the vectors into a sphere enabled us to take into consideration the antonymity between words, not only the synonymity, because of the suitability to handle the polarity nature of words. For example, a word and its antonym can be represented as a vector and its negative. Moreover, we have managed to extract an adequate vocabulary. The obtained results show that the proposed approach can capture the essence of the language, and can be generalized to estimate a correct similarity of any new pair of words.

Keywords: natural language processing, word to vector, text similarity, text mining

Procedia PDF Downloads 275
12093 The Formation of the Diminutive in Colloquial Jordanian Arabic

Authors: Yousef Barahmeh

Abstract:

This paper is a linguistic and pragmatic analysis of the use of the diminutive in Colloquial Jordanian Arabic (CJA). It demonstrates a peculiar form of the diminutive in CJA inflected by means of feminine plural ends with -aat suffix. The analysis shows that the pragmatic function(s) of the diminutive in CJA refers primarily to ‘littleness’ while the morphological inflection conveys the message of ‘the plethora’. Examples of this linguistic phenomenon are intelligible and often include a large number of words that are culture-specific to the rural dialect in the north of Jordan. In both cases, the diminutive in CJA is an adaptive strategy relative to its pragmatic and social contexts.

Keywords: Colloquial Jordanian Arabic, diminutive, morphology, pragmatics

Procedia PDF Downloads 267
12092 Tracking and Classifying Client Interactions with Personal Coaches

Authors: Kartik Thakore, Anna-Roza Tamas, Adam Cole

Abstract:

The world health organization (WHO) reports that by 2030 more than 23.7 million deaths annually will be caused by Cardiovascular Diseases (CVDs); with a 2008 economic impact of $3.76 T. Metabolic syndrome is a disorder of multiple metabolic risk factors strongly indicated in the development of cardiovascular diseases. Guided lifestyle intervention driven by live coaching has been shown to have a positive impact on metabolic risk factors. Individuals’ path to improved (decreased) metabolic risk factors are driven by personal motivation and personalized messages delivered by coaches and augmented by technology. Using interactions captured between 400 individuals and 3 coaches over a program period of 500 days, a preliminary model was designed. A novel real time event tracking system was created to track and classify clients based on their genetic profile, baseline questionnaires and usage of a mobile application with live coaching sessions. Classification of clients and coaches was done using a support vector machines application build on Apache Spark, Stanford Natural Language Processing Library (SNLPL) and decision-modeling.

Keywords: guided lifestyle intervention, metabolic risk factors, personal coaching, support vector machines application, Apache Spark, natural language processing

Procedia PDF Downloads 433
12091 Convolutional Neural Networks-Optimized Text Recognition with Binary Embeddings for Arabic Expiry Date Recognition

Authors: Mohamed Lotfy, Ghada Soliman

Abstract:

Recognizing Arabic dot-matrix digits is a challenging problem due to the unique characteristics of dot-matrix fonts, such as irregular dot spacing and varying dot sizes. This paper presents an approach for recognizing Arabic digits printed in dot matrix format. The proposed model is based on Convolutional Neural Networks (CNN) that take the dot matrix as input and generate embeddings that are rounded to generate binary representations of the digits. The binary embeddings are then used to perform Optical Character Recognition (OCR) on the digit images. To overcome the challenge of the limited availability of dotted Arabic expiration date images, we developed a True Type Font (TTF) for generating synthetic images of Arabic dot-matrix characters. The model was trained on a synthetic dataset of 3287 images and 658 synthetic images for testing, representing realistic expiration dates from 2019 to 2027 in the format of yyyy/mm/dd. Our model achieved an accuracy of 98.94% on the expiry date recognition with Arabic dot matrix format using fewer parameters and less computational resources than traditional CNN-based models. By investigating and presenting our findings comprehensively, we aim to contribute substantially to the field of OCR and pave the way for advancements in Arabic dot-matrix character recognition. Our proposed approach is not limited to Arabic dot matrix digit recognition but can also be extended to text recognition tasks, such as text classification and sentiment analysis.

Keywords: computer vision, pattern recognition, optical character recognition, deep learning

Procedia PDF Downloads 95
12090 AI-Based Techniques for Online Social Media Network Sentiment Analysis: A Methodical Review

Authors: A. M. John-Otumu, M. M. Rahman, O. C. Nwokonkwo, M. C. Onuoha

Abstract:

Online social media networks have long served as a primary arena for group conversations, gossip, text-based information sharing and distribution. The use of natural language processing techniques for text classification and unbiased decision-making has not been far-fetched. Proper classification of this textual information in a given context has also been very difficult. As a result, we decided to conduct a systematic review of previous literature on sentiment classification and AI-based techniques that have been used in order to gain a better understanding of the process of designing and developing a robust and more accurate sentiment classifier that can correctly classify social media textual information of a given context between hate speech and inverted compliments with a high level of accuracy by assessing different artificial intelligence techniques. We evaluated over 250 articles from digital sources like ScienceDirect, ACM, Google Scholar, and IEEE Xplore and whittled down the number of research to 31. Findings revealed that Deep learning approaches such as CNN, RNN, BERT, and LSTM outperformed various machine learning techniques in terms of performance accuracy. A large dataset is also necessary for developing a robust sentiment classifier and can be obtained from places like Twitter, movie reviews, Kaggle, SST, and SemEval Task4. Hybrid Deep Learning techniques like CNN+LSTM, CNN+GRU, CNN+BERT outperformed single Deep Learning techniques and machine learning techniques. Python programming language outperformed Java programming language in terms of sentiment analyzer development due to its simplicity and AI-based library functionalities. Based on some of the important findings from this study, we made a recommendation for future research.

Keywords: artificial intelligence, natural language processing, sentiment analysis, social network, text

Procedia PDF Downloads 115
12089 Algerian Case Study of Age Effect and Cross Linguistic Influence in Third Language Phonology Acquisition

Authors: Zouleykha Belabbes

Abstract:

Learning foreign languages is sine qua non in the era of globalization, mobility, and communications, which grants access and connectedness to the world. This urgent need is highlighted in monolingual settings, however, in multilingual contexts the case is, to some extent, complicated. In effect, research on bilingualism and multilingualism lead to the issue of Cross Linguistic Influence (CLI) which seeks to explain how and under which conditions prior linguistic knowledge of first language (L1) and / or second language (L2) influences the production, comprehension and development of a third language (L3) or additional language (Ln). Moreover, the issue of age is also one of the persistent topics in the field of language acquisition. This paper aims to scrutinize the effect of age and two previously known languages: Arabic (L1) and French (L2) in acquiring English (L3) phonology in Algerian context. The study consisted of 20 participants of different age range who were presented with recorded samples of English (L3). The findings confirm the results of some previous studies on the issue of Critical Period Hypothesis (CPH) and demonstrate a tendency for the L2 phonological transfer in L3 production at the initial stages of acquisition within young and later learners that for some circumstances diminished as L3 proficiency develop.

Keywords: acquisition, age effect, cross linguistic influence, L3 phonology

Procedia PDF Downloads 236
12088 One-Shot Text Classification with Multilingual-BERT

Authors: Hsin-Yang Wang, K. M. A. Salam, Ying-Jia Lin, Daniel Tan, Tzu-Hsuan Chou, Hung-Yu Kao

Abstract:

Detecting user intent from natural language expression has a wide variety of use cases in different natural language processing applications. Recently few-shot training has a spike of usage on commercial domains. Due to the lack of significant sample features, the downstream task performance has been limited or leads to an unstable result across different domains. As a state-of-the-art method, the pre-trained BERT model gathering the sentence-level information from a large text corpus shows improvement on several NLP benchmarks. In this research, we are proposing a method to change multi-class classification tasks into binary classification tasks, then use the confidence score to rank the results. As a language model, BERT performs well on sequence data. In our experiment, we change the objective from predicting labels into finding the relations between words in sequence data. Our proposed method achieved 71.0% accuracy in the internal intent detection dataset and 63.9% accuracy in the HuffPost dataset. Acknowledgment: This work was supported by NCKU-B109-K003, which is the collaboration between National Cheng Kung University, Taiwan, and SoftBank Corp., Tokyo.

Keywords: OSML, BERT, text classification, one shot

Procedia PDF Downloads 101
12087 Language Activation Theory: Unlocking Bilingual Language Processing

Authors: Leorisyl D. Siarot

Abstract:

It is conventional to see and hear Filipinos, in general, speak two or more languages. This phenomenon brings us to a closer look on how our minds process the input and produce an output with a specific chosen language. This study aimed to generate a theoretical model which explained the interaction of the first and the second languages in the human mind. After a careful analysis of the gathered data, a theoretical prototype called Language Activation Model was generated. For every string, there are three specialized banks: lexico-semantics, morphono-syntax, and pragmatics. These banks are interrelated to other banks of other language strings. As the bilingual learns more languages, a new string is replicated and is filled up with the information of the new language learned. The principles of the first and second languages' interaction are drawn; these are expressed in laws, namely: law of dominance, law of availability, law of usuality and law of preference. Furthermore, difficulties encountered in the learning of second languages were also determined.

Keywords: bilingualism, psycholinguistics, second language learning, languages

Procedia PDF Downloads 513
12086 Automated User Story Driven Approach for Web-Based Functional Testing

Authors: Mahawish Masud, Muhammad Iqbal, M. U. Khan, Farooque Azam

Abstract:

Manual writing of test cases from functional requirements is a time-consuming task. Such test cases are not only difficult to write but are also challenging to maintain. Test cases can be drawn from the functional requirements that are expressed in natural language. However, manual test case generation is inefficient and subject to errors.  In this paper, we have presented a systematic procedure that could automatically derive test cases from user stories. The user stories are specified in a restricted natural language using a well-defined template.  We have also presented a detailed methodology for writing our test ready user stories. Our tool “Test-o-Matic” automatically generates the test cases by processing the restricted user stories. The generated test cases are executed by using open source Selenium IDE.  We evaluate our approach on a case study, which is an open source web based application. Effectiveness of our approach is evaluated by seeding faults in the open source case study using known mutation operators.  Results show that the test case generation from restricted user stories is a viable approach for automated testing of web applications.

Keywords: automated testing, natural language, restricted user story modeling, software engineering, software testing, test case specification, transformation and automation, user story, web application testing

Procedia PDF Downloads 387
12085 Hate Speech Detection in Tunisian Dialect

Authors: Helmi Baazaoui, Mounir Zrigui

Abstract:

This study addresses the challenge of hate speech detection in Tunisian Arabic text, a critical issue for online safety and moderation. Leveraging the strengths of the AraBERT model, we fine-tuned and evaluated its performance against the Bi-LSTM model across four distinct datasets: T-HSAB, TNHS, TUNIZI-Dataset, and a newly compiled dataset with diverse labels such as Offensive Language, Racism, and Religious Intolerance. Our experimental results demonstrate that AraBERT significantly outperforms Bi-LSTM in terms of Recall, Precision, F1-Score, and Accuracy across all datasets. The findings underline the robustness of AraBERT in capturing the nuanced features of Tunisian Arabic and its superior capability in classification tasks. This research not only advances the technology for hate speech detection but also provides practical implications for social media moderation and policy-making in Tunisia. Future work will focus on expanding the datasets and exploring more sophisticated architectures to further enhance detection accuracy, thus promoting safer online interactions.

Keywords: hate speech detection, Tunisian Arabic, AraBERT, Bi-LSTM, Gemini annotation tool, social media moderation

Procedia PDF Downloads 13
12084 Recognition of Voice Commands of Mentor Robot in Noisy Environment Using Hidden Markov Model

Authors: Khenfer Koummich Fatma, Hendel Fatiha, Mesbahi Larbi

Abstract:

This paper presents an approach based on Hidden Markov Models (HMM: Hidden Markov Model) using HTK tools. The goal is to create a human-machine interface with a voice recognition system that allows the operator to teleoperate a mentor robot to execute specific tasks as rotate, raise, close, etc. This system should take into account different levels of environmental noise. This approach has been applied to isolated words representing the robot commands pronounced in two languages: French and Arabic. The obtained recognition rate is the same in both speeches, Arabic and French in the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equals 30 dB, in this case; the Arabic speech recognition rate is 69%, and the French speech recognition rate is 80%. This can be explained by the ability of phonetic context of each speech when the noise is added.

Keywords: Arabic speech recognition, Hidden Markov Model (HMM), HTK, noise, TIMIT, voice command

Procedia PDF Downloads 388
12083 On the Weightlessness of Vowel Lengthening: Insights from Arabic Dialect of Yemen and Contribution to Psychoneurolinguistics

Authors: Sadeq Al Yaari, Muhammad Alkhunayn, Montaha Al Yaari, Ayman Al Yaari, Aayah Al Yaari, Adham Al Yaari, Sajedah Al Yaari, Fatehi Eissa

Abstract:

Introduction: It is well established that lengthening (longer duration) is considered one of the correlates of lexical and phrasal prominence. However, it is unexplored whether the scope of vowel lengthening in the Arabic dialect of Yemen (ADY) is differently affected by educated and/or uneducated speakers from different dialectal backgrounds. Specifically, the research aims to examine whether or not linguistic background acquired through different educational channels makes a difference in the speech of the speaker and how that is reflected in related psychoneurolinguistic impairments. Methods: For the above mentioned purpose, we conducted an articulatory experiment wherein a set of words from ADY were examined in the dialectal speech of thousand and seven hundred Yemeni educated and uneducated speakers aged 19-61 years growing up in five regions of the country: Northern, southern, eastern, western and central and were, accordingly, assigned into five dialectal groups. A seven-minute video clip was shown to the participants, who have been asked to spontaneously describe the scene they had just watched before the researchers linguistically and statistically analyzed recordings to weigh vowel lengthening in the speech of the participants. Results: The results show that vowels (monophthongs and diphthongs) are lengthened by all participants. Unexpectedly, educated and uneducated speakers from northern and central dialects lengthen vowels. Compared with uneducated speakers from the same dialect, educated speakers lengthen fewer vowels in their dialectal speech. Conclusions: These findings support the notion that extensive exposure to dialects on account of standard language can cause changes to the patterns of dialects themselves, and this can be seen in the speech of educated and uneducated speakers of these dialects. Further research is needed to clarify the phonemic distinctive features and frequency of lengthening in other open class systems (i.e., nouns, adjectives, and adverbs). Phonetic and phonological report measures are needed as well as validation of existing measures for assessing phonemic vowel length in the Arabic population in general and Arabic individuals with voice, speech, and language impairments in particular.

Keywords: vowel lengthening, Arabic dialect of Yemen, phonetics, phonology, impairment, distinctive features

Procedia PDF Downloads 42
12082 Acoustic Characteristics of Ḫijaiyaḫ Letters Pronunciation by Indonesian Native Speaker

Authors: Romi Hardiyansyah, Raden Sugeng Joko Sarwono, Agus Samsi

Abstract:

Indonesian people have a mother language but not Arabic. Meanwhile, they must be able to pronounce the Arabic because Islam is the biggest religion in Indonesia. Arabic is composed by ḫijaiyaḫ letters which has its own pronunciation. Sound production process in humans can be divided into three physiological processes, namely: the formation of airflow from the lungs, the change in airflow from the lungs into the sound, and articulation (the modulation/sound setting into a specific sound). Ḫijaiyaḫ letters has its own articulation, some of which seem strange for most people in Indonesia. Those letters come out from the middle and upper throat so that the letters has its own acoustic characteristics. Acoustic characteristics of voice can be observed by source-filter approach that has parameters: pitch, formant, and formant bandwidth. Pitch is the basic tone in every human being. Formant is the resonance frequency of the human voice. Formant bandwidth is the time-width of a formant. After recording the sound from 21 subjects, data is processed by software Praat version 5.3.39. The analysis showed that each pronunciation, syakal (vowel changer), and the place of discharge letters has the same timbre which are determined by third and fourth formant.

Keywords: ḫijaiyaḫ, articulation, pitch, formant, formant bandwidth, timbre

Procedia PDF Downloads 396
12081 Modeling False Statements in Texts

Authors: Francielle A. Vargas, Thiago A. S. Pardo

Abstract:

According to the standard philosophical definition, lying is saying something that you believe to be false with the intent to deceive. For deception detection, the FBI trains its agents in a technique named statement analysis, which attempts to detect deception based on parts of speech (i.e., linguistics style). This method is employed in interrogations, where the suspects are first asked to make a written statement. In this poster, we model false statements using linguistics style. In order to achieve this, we methodically analyze linguistic features in a corpus of fake news in the Portuguese language. The results show that they present substantial lexical, syntactic and semantic variations, as well as punctuation and emotion distinctions.

Keywords: deception detection, linguistics style, computational linguistics, natural language processing

Procedia PDF Downloads 218
12080 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services

Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme

Abstract:

Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.

Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing

Procedia PDF Downloads 113
12079 Eco-Friendly Polymeric Corrosion Inhibitor for Sour Oilfield Environment

Authors: Alireza Rahimi, Abdolreza Farhadian, Arash Tajik, Elaheh Sadeh, Avni Berisha, Esmaeil Akbari Nezhad

Abstract:

Although natural polymers have been shown to have some inhibitory properties on sour corrosion, they are not considered very effective green corrosion inhibitors. Accordingly, effective corrosion inhibitors should be developed based on natural resources to mitigate sour corrosion in the oil and gas industry. Here, Arabic gum was employed as an eco-friendly precursor for the synthesis of innovative polyurethanes designed as highly efficient corrosion inhibitors for sour oilfield solutions. A comprehensive assessment, combining experimental and computational analyses, was conducted to evaluate the inhibitory performance of the inhibitor. Electrochemical measurements demonstrated that a concentration of 200 mM of the inhibitor offered substantial protection to mild steel against sour corrosion, yielding inhibition efficiencies of 98% and 95% at 25 ºC and 60 ºC, respectively. Additionally, the presence of the inhibitor led to a smoother steel surface, indicating the adsorption of polyurethane molecules onto the metal surface. X-ray photoelectron spectroscopy results further validated the chemical adsorption of the inhibitor on mild steel surfaces. Scanning Kelvin probe microscopy revealed a shift in the potential distribution of the steel surface towards negative values, indicating inhibitor adsorption and corrosion process inhibition. Molecular dynamic simulation indicated high adsorption energy values for the inhibitor, suggesting its spontaneous adsorption onto the Fe (110) surface. These findings underscore the potential of Arabic gum as a viable resource for the development of polyurethanes under mild conditions, serving as effective corrosion inhibitors for sour solutions.

Keywords: environmental effect, Arabic gum, corrosion inhibitor, sour corrosion, molecular dynamics simulation

Procedia PDF Downloads 62
12078 Arabic Literature of Nigerian Authorship and the Spread of Values and Morality in Society: A Study from Isa Abukar Alabi’s “Ar-Riyaadh”

Authors: Tajudeen Yusuf

Abstract:

Arabic Literature of Nigerian Authorship, like others, has contributed widely to the spread of morality and values in human Society. There is no doubt that the relationship between literature and society has been widely conceived, for it reflects society and serves as a means of social control. Indeed, the influence of literature on attitude and human behaviors cannot be underestimated. Focused on some selected themes and verses in a literary work of Isa Abubakar Alabi known as (Ar-Ryaadh), the paper aims to reveal the contributions of the Arabic literary icon of Nigerian origin in spreading values and morality in society through his literary works. The study employs a descriptive method. Isa Abubakar Alabi, a Nigerian Arabic scholar, is known as a wise and famous poet not only in Nigeria but throughout West Africa and Arab countries at large. He has produced a sort of poetry that is distinguished with superiority in spreading peace, harmony, societal values and morality. Indeed, his literary works address humanism, kindness, honesty, law, justice, truthfulness, and patriotism, which may positively influence humans.

Keywords: Arabic, literature, moral, Nigeria, values

Procedia PDF Downloads 83
12077 Interaction between Cognitive Control and Language Processing in Non-Fluent Aphasia

Authors: Izabella Szollosi, Klara Marton

Abstract:

Aphasia can be defined as a weakness in accessing linguistic information. Accessing linguistic information is strongly related to information processing, which in turn is associated with the cognitive control system. According to the literature, a deficit in the cognitive control system interferes with language processing and contributes to non-fluent speech performance. The aim of our study was to explore this hypothesis by investigating how cognitive control interacts with language performance in participants with non-fluent aphasia. Cognitive control is a complex construct that includes working memory (WM) and the ability to resist proactive interference (PI). Based on previous research, we hypothesized that impairments in domain-general (DG) cognitive control abilities have negative effects on language processing. In contrast, better DG cognitive control functioning supports goal-directed behavior in language-related processes as well. Since stroke itself might slow down information processing, it is important to examine its negative effects on both cognitive control and language processing. Participants (N=52) in our study were individuals with non-fluent Broca’s aphasia (N = 13), with transcortical motor aphasia (N=13), individuals with stroke damage without aphasia (N=13), and unimpaired speakers (N = 13). All participants performed various computer-based tasks targeting cognitive control functions such as WM and resistance to PI in both linguistic and non-linguistic domains. Non-linguistic tasks targeted primarily DG functions, while linguistic tasks targeted more domain specific (DS) processes. The results showed that participants with Broca’s aphasia differed from the other three groups in the non-linguistic tasks. They performed significantly worse even in the baseline conditions. In contrast, we found a different performance profile in the linguistic domain, where the control group differed from all three stroke-related groups. The three groups with impairment performed more poorly than the controls but similar to each other in the verbal baseline condition. In the more complex verbal PI condition, however, participants with Broca’s aphasia performed significantly worse than all the other groups. Participants with Broca’s aphasia demonstrated the most severe language impairment and the highest vulnerability in tasks measuring DG cognitive control functions. Results support the notion that the more severe the cognitive control impairment, the more severe the aphasia. Thus, our findings suggest a strong interaction between cognitive control and language. Individuals with the most severe and most general cognitive control deficit - participants with Broca’s aphasia - showed the most severe language impairment. Individuals with better DG cognitive control functions demonstrated better language performance. While all participants with stroke damage showed impaired cognitive control functions in the linguistic domain, participants with better language skills performed also better in tasks that measured non-linguistic cognitive control functions. The overall results indicate that the level of cognitive control deficit interacts with the language functions in individuals along with the language spectrum (from severe to no impairment). However, future research is needed to determine any directionality.

Keywords: cognitive control, information processing, language performance, non-fluent aphasia

Procedia PDF Downloads 123
12076 English Syllabus in the Iranian Education System

Authors: Shaghayegh Mirshekari, Atiyeh Ghorbani

Abstract:

EThe Iranian system of education has been politically influenced by the thoughts of the governing religious party. It has brought many religious books into the educational system from grade one up to graduation from high school, and therefore, teaching English as a non-Islamic language has been put aside the system, focusing on the Islamic language of Arabic. Teaching English has been widely talked about in international academia, but the Iranian educational system has not brought in any of its outcomes due to the general policy of keeping people away from international Western thoughts. Because of the increasing interest among Iranians in learning English, this language is being taught and studied in public and private schools, commercial and adult schools, language institutes, colleges, universities, and numerous homes throughout the country. Methods and techniques of teaching English, the attitude of the teachers and learners towards the language, and the availability of textbooks and other language materials are quite different in any one of the different institutions. This paper has evaluated the outcome of the Iranian educational system in teaching English in terms of their methods of teaching, as well as the policies regarding the educational system. The results show that not only has there been no progress in the system in terms of teaching English, rather there is backwardness in this regard due to the political policy of preventing people from learning English. Therefore, we see the majority of the youth not speaking English properly at the age where they need to enter the international arena.

Keywords: English, public school, language, Iran, teaching

Procedia PDF Downloads 66