Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 7041

Search results for: language processing

7041 Resource Creation Using Natural Language Processing Techniques for Malay Translated Qur'an

Authors: Nor Diana Ahmad, Eric Atwell, Brandon Bennett

Abstract:

Text processing techniques for English have been developed over several decades, but text processing methods for the Malay language still lag far behind. Moreover, few resources and tools for computational linguistic analysis are available for Malay. This research therefore applies natural language processing (NLP) to Malay translated Qur’an text. As a result, a new language resource for the Malay translated Qur’an was created. This resource will help other researchers build the necessary processing tools for the Malay language. This research also develops a simple question-answer prototype, written in Python, to demonstrate the use of the Malay Qur’an resource for text processing. The prototype pre-processes the Malay Qur’an and an input query using a stemming algorithm and then searches for occurrences of the query word stem. The results show an improved likelihood of matching a user query to its answer. A POS-tagging algorithm has also been produced. The stemming and tagging algorithms can be used as tools for research on other Malay texts and can support applications such as information retrieval, question answering systems, ontology-based search and other text analysis tasks.

Keywords: language resource, Malay translated Qur'an, natural language processing (NLP), text processing

Procedia PDF Downloads 318
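
The stem-then-search step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: the affix lists, the `toy_stem` and `search` names, the `min_len` threshold, and the sample sentences are all assumptions made for illustration, and a usable Malay stemmer would need a root-word dictionary and far more rules.

```python
import re

# Hypothetical affix lists for illustration only; a real Malay stemmer
# needs a root-word dictionary and many more rules.
PREFIXES = ("ber", "ter", "di", "ke")
SUFFIXES = ("kan", "nya", "lah", "an")


def toy_stem(word, min_len=4):
    """Strip at most one listed prefix and one listed suffix.

    An affix is only removed if at least min_len characters remain.
    """
    word = word.lower()
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def search(verses, query):
    """Return indices of verses sharing at least one stem with the query."""
    query_stems = {toy_stem(w) for w in re.findall(r"\w+", query)}
    hits = []
    for i, verse in enumerate(verses):
        verse_stems = {toy_stem(w) for w in re.findall(r"\w+", verse)}
        if query_stems & verse_stems:
            hits.append(i)
    return hits


# Made-up sample sentences, not text from the resource described above.
sample = ["Makanan itu halal", "Mereka berjalan jauh"]
matches = search(sample, "dimakan")
```

Because both the query and the text are reduced to stems before comparison, a query word such as "dimakan" can match "makanan" even though the surface forms differ, which is the kind of improved matching the abstract reports.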

7040 Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities

Authors: Khaled M. Alhawiti

Abstract:

This paper aims to analyze the role of natural language processing (NLP). The paper will discuss the role in the context of automated data retrieval, automated question answer, and text structuring. NLP techniques are gaining wider acceptance in real life applications and industrial concerns. There are various complexities involved in processing the text of natural language that could satisfy the need of decision makers. This paper begins with the description of the qualities of NLP practices. The paper then focuses on the challenges in natural language processing. The paper also discusses major techniques of NLP. The last section describes opportunities and challenges for future research.

Keywords: data retrieval, information retrieval, natural language processing, text structuring

Procedia PDF Downloads 340

7039 Impact of Natural Language Processing in Educational Setting: An Effective Approach towards Improved Learning

Authors: Khaled M. Alhawiti

Abstract:

Natural Language Processing (NLP) is an effective approach for bringing improvement in educational setting. This involves initiating the process of learning through the natural acquisition in the educational systems. It is based on following effective approaches for providing the solution for various problems and issues in education. Natural Language Processing provides solution in a variety of different fields associated with the social and cultural context of language learning. It is based on involving various tools and techniques such as grammar, syntax, and structure of text. It is effective approach for teachers, students, authors, and educators for providing assistance for writing, analysis, and assessment procedure. Natural Language Processing is widely integrated in the large number of educational contexts such as research, science, linguistics, e-learning, evaluations system, and various other educational settings such as schools, higher education system, and universities. Natural Language Processing is based on applying scientific approach in the educational settings. In the educational settings, NLP is an effective approach to ensure that students can learn easily in the same way as they acquired language in the natural settings.

Keywords: natural language processing, education, application, e-learning, scientific studies, educational system

Procedia PDF Downloads 503

7038 A Review of Research on Pre-training Technology for Natural Language Processing

Authors: Moquan Gong

Abstract:

In recent years, with the rapid development of deep learning, pre-training technology for natural language processing has made great progress. Early natural language processing long relied on word vector methods such as Word2Vec to encode text. These word vector methods can also be regarded as static pre-training techniques. However, this context-free text representation brings very limited improvement to subsequent natural language processing tasks and cannot solve the problem of word polysemy. ELMo proposes a context-sensitive text representation method that can effectively handle polysemy. Since then, pre-trained language models such as GPT and BERT have been proposed one after another. Among them, the BERT model has significantly improved performance on many typical downstream tasks, greatly promoting technological development in the field, and natural language processing has since entered the era of dynamic pre-training technology. A large number of pre-trained language models based on BERT and XLNet have continued to emerge, and pre-training has become an indispensable mainstream technology in the field of natural language processing. This article first gives an overview of pre-training technology and its development history and introduces in detail the classic pre-training technologies in the field of natural language processing, including early static pre-training and classic dynamic pre-training; it then briefly surveys a series of influential pre-training technologies, including improved models based on BERT and XLNet; on this basis, it analyzes the problems faced by current pre-training research; finally, it looks ahead to the future development of pre-training technology.

Keywords: natural language processing, pre-training, language model, word vectors

Procedia PDF Downloads 57

7037 Natural Language Processing; the Future of Clinical Record Management

Authors: Khaled M. Alhawiti

Abstract:

This paper investigates the future of medicine and the use of natural language processing (NLP). Having correct clinical information available online is highly important; patient care could be improved at affordable cost by using automated applications that draw on online clinical information. The major challenge in retrieving such vital information is having it appropriately coded. The majority of online patient reports are not coded and are not accessible because they are recorded in natural language text. NLP offers a feasible solution by retrieving and organizing the clinical information available in text and transforming it into clinical data that is available for use. NLP systems are rather complex to construct, as they entail considerable knowledge; however, significant progress has been made. Newly developed NLP systems have been tested and have shown promising performance, and they are considered practical clinical applications.

Keywords: clinical information, information retrieval, natural language processing, automated applications

Procedia PDF Downloads 404

7036 The Output Fallacy: An Investigation into Input, Noticing, and Learners’ Mechanisms

Authors: Samantha Rix

Abstract:

The purpose of this research paper is to investigate the cognitive processing of learners who receive input but produce very little or no output, and who, when they do produce output, exhibit language proficiency similar to that of learners who produce output more regularly in the language classroom. Previous studies have investigated the benefits of output (with somewhat differing results); therefore, the presentation will begin with an investigation of what may underlie gains in proficiency without output. A pilot study was then designed and conducted to gain insight into the cognitive processing of low-output language learners, looking, for example, at the quantity and quality of noticing. The study was carried out within the paradigm of classroom action research, observing and interviewing low-output language learners in an intensive English program at a small Midwest university. The results of the pilot study indicated that autonomy in language learning, specifically the use of strategies such as self-monitoring, self-talk, and thinking 'out-loud', was crucial in the development of language proficiency for academic-level performance. The presentation concludes with an examination of pedagogical implications for classroom use in order to aid students in their language development.

Keywords: cognitive processing, language learners, language proficiency, learning strategies

Procedia PDF Downloads 475

7035 Understanding the Heart of the Matter: A Pedagogical Framework for Apprehending Successful Second Language Development

Authors: Cinthya Olivares Garita

Abstract:

Untangling language processing in second language development has been either a taken-for-granted and overlooked task for some English language teaching (ELT) instructors or a considerable feat for others. From the most traditional language instruction to the most communicative methodologies, how to assist L2 learners in processing language in the classroom has become a challenging matter in second language teaching. Amidst an ample array of methods, strategies, and techniques to teach a target language, finding a suitable model to lead learners to process, interpret, and negotiate meaning to communicate in a second language has imposed a great responsibility on language teachers; committed teachers are those who are aware of their role in equipping learners with the appropriate tools to communicate in the target language in a 21st-century society. Unfortunately, one might find some English language teachers convinced that their job is only to lecture students; others are advocates of textbook-based instruction that might hinder second language processing, and just a few might courageously struggle to facilitate second language learning effectively. Grounded on the most representative empirical studies on comprehensible input, processing instruction, and focus on form, this analysis aims to facilitate the understanding of how second language learners process and automatize input and to propose a pedagogical framework for the successful development of a second language. In light of this, this paper is structured to tackle noticing and attention and structured input as the heart of processing instruction, comprehensible input as the missing link in second language learning, and form-meaning connections as opposed to traditional grammar approaches to language teaching. The author finishes by suggesting a pedagogical framework involving noticing-attention-comprehensible-input-form (NACIF, based on its acronym) to support ELT instructors, teachers, and scholars in the challenging task of facilitating the understanding of effective second language development.

Keywords: second language development, pedagogical framework, noticing, attention, comprehensible input, form

Procedia PDF Downloads 28

7034 Prediction, Production, and Comprehension: Exploring the Influence of Salience in Language Processing

Authors: Andy H. Clark

Abstract:

This research looks into the relationship between language comprehension and production, with a specific focus on the role of salience in shaping these processes. Salience, our most immediate perception of what is most probable out of all possible situations and outcomes, strongly affects our perception and action in language production and comprehension. This study investigates the impact of geographic and emotional attachments to the target language on the differences in the learners’ comprehension and production abilities. Using quantitative research methods (Qualtrics, SPSS), this study examines the preferential choices of two groups of Japanese English language learners: those residing in the United States and those in Japan. By comparing and contrasting these two groups, we hope to gain a better understanding of how the salience of linguistic cues influences language processing.

Keywords: intercultural pragmatics, salience, production, comprehension, pragmatics, action, perception, cognition

Procedia PDF Downloads 73

7033 How Western Donors Allocate Official Development Assistance: New Evidence From a Natural Language Processing Approach

Authors: Daniel Benson, Yundan Gong, Hannah Kirk

Abstract:

Advancement in natural language processing techniques has led to increased data processing speeds and reduced the need for the cumbersome, manual processing that is often required when preparing data from multilateral organizations for specific purposes. As such, using named entity recognition (NER) modeling and the Organisation for Economic Co-operation and Development (OECD) Creditor Reporting System database, we present the first geotagged dataset of OECD donor Official Development Assistance (ODA) projects on a global, subnational basis. Our resulting data contains 52,086 ODA projects geocoded to subnational locations across 115 countries, worth a combined $87.9bn. This represents the first global, OECD donor ODA project database with geocoded projects. We use this new data to revisit old questions of how ‘well’ donors allocate ODA to the developing world. This understanding is imperative for policymakers seeking to improve ODA effectiveness.

Keywords: international aid, geocoding, subnational data, natural language processing, machine learning

Procedia PDF Downloads 78

7032 An Event-Related Potentials Study on the Processing of English Subjunctive Mood by Chinese ESL Learners

Authors: Yan Huang

Abstract:

Event-related potentials (ERPs) technique helps researchers to make continuous measures on the whole process of language comprehension, with an excellent temporal resolution at the level of milliseconds. The research on sentence processing has developed from the behavioral level to the neuropsychological level, which brings about a variety of sentence processing theories and models. However, the applicability of these models to L2 learners is still under debate. Therefore, the present study aims to investigate the neural mechanisms underlying English subjunctive mood processing by Chinese ESL learners. To this end, English subject clauses with subjunctive moods are used as the stimuli, all of which follow the same syntactic structure, “It is + adjective + that … + (should) do + …” Besides, in order to examine the role that language proficiency plays on L2 processing, this research deals with two groups of Chinese ESL learners (18 males and 22 females, mean age=21.68), namely, high proficiency group (Group H) and low proficiency group (Group L). Finally, the behavioral and neurophysiological data analysis reveals the following findings: 1) Syntax and semantics interact with each other on the SECOND phase (300-500ms) of sentence processing, which is partially in line with the Three-phase Sentence Model; 2) Language proficiency does affect L2 processing. Specifically, for Group H, it is the syntactic processing that plays the dominant role in sentence processing while for Group L, semantic processing also affects the syntactic parsing during the THIRD phase of sentence processing (500-700ms). Besides, Group H, compared to Group L, demonstrates a richer native-like ERPs pattern, which further demonstrates the role of language proficiency in L2 processing. Based on the research findings, this paper also provides some enlightenment for the L2 pedagogy as well as the L2 proficiency assessment.

Keywords: Chinese ESL learners, English subjunctive mood, ERPs, L2 processing

Procedia PDF Downloads 131

7031 Resume Ranking Using Custom Word2vec and Rule-Based Natural Language Processing Techniques

Authors: Subodh Chandra Shakya, Rajendra Sapkota, Aakash Tamang, Shushant Pudasaini, Sujan Adhikari, Sajjan Adhikari

Abstract:

Considerable effort has gone into measuring the semantic similarity between texts in documents, and techniques have evolved to measure the similarity of two documents. One such state-of-the-art technique in the field of Natural Language Processing (NLP) is the word-to-vector model, which converts words into word embeddings and measures the similarity between the vectors. We found this to be quite useful for the task of resume ranking. This paper therefore implements the word2vec model along with other Natural Language Processing techniques to rank resumes against a particular job description so as to automate the process of hiring. The paper presents the proposed system and the findings made while building it.

Keywords: chunking, document similarity, information extraction, natural language processing, word2vec, word embedding

Procedia PDF Downloads 158
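
The embedding-based ranking idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors in `EMBEDDINGS` are invented rather than trained, averaging word vectors into a document vector is one common choice and not necessarily the authors', and the function names are hypothetical. The rule-based steps the paper mentions (such as chunking) are not shown.

```python
import math

# Toy 3-dimensional vectors invented for illustration; a real system would
# load vectors from a word2vec model trained on resume and job-posting text.
EMBEDDINGS = {
    "python": [0.9, 0.1, 0.0],
    "java": [0.8, 0.2, 0.1],
    "developer": [0.7, 0.3, 0.0],
    "nurse": [0.0, 0.1, 0.9],
    "hospital": [0.1, 0.0, 0.8],
}


def doc_vector(text):
    """Average the vectors of known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_resumes(job_description, resumes):
    """Return (name, score) pairs sorted from most to least similar."""
    job_vec = doc_vector(job_description)
    scored = []
    for name, text in resumes.items():
        vec = doc_vector(text)
        score = cosine(job_vec, vec) if job_vec and vec else 0.0
        scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_resumes(
    "python developer",
    {"candidate_a": "nurse hospital", "candidate_b": "java developer"},
)
```

In this toy setup `candidate_b` ranks first because its averaged vector points in nearly the same direction as the job description's, even though the two texts share only one word.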

7030 Language Processing of Seniors with Alzheimer’s Disease: From the Perspective of Temporal Parameters

Authors: Lai Yi-Hsiu

Abstract:

The present paper aims to examine the language processing of Chinese-speaking seniors with Alzheimer’s disease (AD) from the perspective of temporal cues. Twenty healthy adults, 17 healthy seniors, and 13 seniors with AD in Taiwan participated in this study to tell stories based on two sets of pictures. Nine temporal cues were fetched and analyzed. Oral productions in Mandarin Chinese were compared and discussed to examine to what extent and in what way these three groups of participants performed with significant differences. Results indicated that the age effects were significant in filled pauses. The dementia effects were significant in mean duration of pauses, empty pauses, filled pauses, lexical pauses, normalized mean duration of filled pauses and lexical pauses. The findings reported in the current paper help characterize the nature of language processing in seniors with or without AD, and contribute to the interactions between the AD neural mechanism and their temporal parameters.

Keywords: language processing, Alzheimer’s disease, Mandarin Chinese, temporal cues

Procedia PDF Downloads 446

7029 Application of Natural Language Processing in Education

Authors: Khaled M. Alhawiti

Abstract:

Reading proficiency is a major component of language competency. However, finding topical texts at an appropriate level for foreign and second language learners is a challenge for educators. We address this problem using natural language processing technology to assess reading level and simplify text. In the context of foreign and second-language learning, existing measures of reading level are not well suited to this task. Related work has demonstrated the benefit of using statistical language processing techniques; we build on these ideas and incorporate other potential features to measure readability. In the first part of this study, we combine features from statistical language models, traditional reading level measures and other language processing tools to produce a better method for detecting reading level. We discuss the performance of human annotators and evaluate the results of our detectors against human ratings. A key contribution is that our detectors are trainable; with training and test data from the same domain, our detectors outperform more general reading level tools (Flesch-Kincaid and Lexile). Trainability will allow performance to be tuned to address the needs of particular groups or students.

Keywords: natural language processing, trainability, syntactic simplification tools, education

Procedia PDF Downloads 490
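
For context on the Flesch-Kincaid baseline named in the abstract above, a minimal sketch of the standard Flesch-Kincaid grade-level formula follows. It shows the general-purpose measure only, not the authors' trainable detector. The syllable counter is a rough vowel-run heuristic assumed for illustration, and the function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


easy = flesch_kincaid_grade("The cat sat. The dog ran.")
hard = flesch_kincaid_grade(
    "Computational linguistics investigates morphological generalization."
)
```

The formula depends only on sentence length and syllable counts, which suggests why a detector trained on in-domain data with richer features could outperform it for specific learner groups.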

7028 Gender Bias in Natural Language Processing: Machines Reflect Misogyny in Society

Authors: Irene Yi

Abstract:

Machine learning, natural language processing, and neural network models of language are becoming more and more prevalent in the fields of technology and linguistics today. Training data for machines are, at best, large corpora of human literature and, at worst, a reflection of the ugliness in society. Machines have been trained on millions of human books, only to find that, over the course of human history, derogatory and sexist adjectives are used significantly more frequently when describing females in history and literature than when describing males. This is extremely problematic, both as training data and as the outcome of natural language processing. As machines start to handle more responsibilities, it is crucial to ensure that they do not take with them historical sexist and misogynistic notions. This paper gathers data and algorithms from neural network models of language dealing with syntax, semantics, sociolinguistics, and text classification. Results are significant in showing the existing intentional and unintentional misogynistic notions used to train machines, as well as in developing better technologies that take into account the semantics and syntax of text to be more mindful and reflect gender equality. Further, this paper deals with the idea of non-binary gender pronouns and how machines can process these pronouns correctly, given their semantic and syntactic context. This paper also delves into the implications of gendered grammar and its effect, cross-linguistically, on natural language processing. Languages such as French or Spanish not only have rigid gendered grammar rules but also come from historically patriarchal societies. The progression of society comes hand in hand with not only its language but also how machines process those natural languages. These ideas are all vital to the development of natural language models in technology, and they must be taken into account immediately.

Keywords: gendered grammar, misogynistic language, natural language processing, neural networks

Procedia PDF Downloads 120

7027 Literacy in First and Second Language: Implication for Language Education

Authors: Inuwa Danladi Bawa

Abstract:

One of the challenges African states have faced in developing education, past and present, is the problem of literacy. Literacy in the first language is seen as a strong base for the development of the second language, which is mostly the language of education. Language development is an offshoot of language planning, so the need to develop literacy in both the first and second language affects language education and predicts the extent of achievement of the entire education sector. It is paramount to balance literacy acquisition in the first language so that it properly conditions the acquisition of the second language. Likely constraints include non-standardized, underdeveloped, and undeveloped first languages, among many others. Solutions to some of these include the development of materials and the use of the stages and levels of literacy acquisition. This rests on the belief that a child writes well in a second language if the child is literate in the first language.

Keywords: first language, second language, literacy, English language, linguistics

Procedia PDF Downloads 452

7026 Social-Cognitive Aspects of Interpretation: Didactic Approaches in Language Processing and English as a Second Language Difficulties in Dyslexia

Authors: Schnell Zsuzsanna

Abstract:

Background: The interpretation of written texts, language processing in the visual domain, in other words, atypical reading abilities, also known as dyslexia, is an ever-growing phenomenon in today’s societies and educational communities. The much-researched problem affects cognitive abilities and, coupled with normal intelligence normally manifests difficulties in the differentiation of sounds and orthography and in the holistic processing of written words. The factors of susceptibility are varied: social, cognitive psychological, and linguistic factors interact with each other. Methods: The research will explain the psycholinguistics of dyslexia on the basis of several empirical experiments and demonstrate how domain-general abilities of inhibition, retrieval from the mental lexicon, priming, phonological processing, and visual modality transfer affect successful language processing and interpretation. Interpretation of visual stimuli is hindered, and the problem seems to be embedded in a sociocultural, psycholinguistic, and cognitive background. This makes the picture even more complex, suggesting that the understanding and resolving of the issues of dyslexia has to be interdisciplinary, aided by several disciplines in the field of humanities and social sciences, and should be researched from an empirical approach, where the practical, educational corollaries can be analyzed on an applied basis. Aim and applicability: The lecture sheds light on the applied, cognitive aspects of interpretation, social cognitive traits of language processing, the mental underpinnings of cognitive interpretation strategies in different languages (namely, Hungarian and English), offering solutions with a few applied techniques for success in foreign language learning that can be useful advice for the developers of testing methodologies and measures across ESL teaching and testing platforms.

Keywords: dyslexia, social cognition, transparency, modalities

Procedia PDF Downloads 84

7025 Benchmarking Bert-Based Low-Resource Language: Case Uzbek NLP Models

Authors: Jamshid Qodirov, Sirojiddin Komolov, Ravilov Mirahmad, Olimjon Mirzayev

Abstract:

Nowadays, natural language processing tools play a crucial role in our daily lives, including various text processing techniques. Very advanced models exist for widely used languages such as English and Russian. For some languages, such as Uzbek, however, NLP models have been developed only recently, so only a few NLP models exist for the Uzbek language. Moreover, no existing work shows how the Uzbek NLP models behave in different situations or when to use each of them. This work tries to close this gap by comparing the Uzbek NLP models that existed at the time this article was written. The authors compare the models in two scenarios, sentiment analysis and sentence similarity, which are implementations of the two most common problems in industry: classification and similarity. Another outcome of this work is two datasets for classification and sentence similarity in the Uzbek language, which we generated ourselves and which can be useful in both industry and academia.

Keywords: NLP, benchmark, BERT, vectorization

Procedia PDF Downloads 54

7024 Grounding Chinese Language Vocabulary Teaching and Assessment in the Working Memory Research

Authors: Chan Kwong Tung

Abstract:

Since Baddeley and Hitch’s seminal research in 1974 on working memory (WM), this topic has been of great interest to language educators. Although there are some variations in the definitions of WM, recent findings in WM have contributed vastly to our understanding of language learning, especially its effects on second language acquisition (SLA). For example, the phonological component of WM (PWM) and the executive component of WM (EWM) have been found to be positively correlated with language learning. This paper discusses two general, yet highly relevant WM findings that could directly affect the effectiveness of Chinese Language (CL) vocabulary teaching and learning, as well as the quality of its assessment. First, PWM is found to be critical for the long-term learning of phonological forms of new words. Second, EWM is heavily involved in interpreting the semantic characteristics of new words, which consequently affects the quality of learners’ reading comprehension. These two ideas are hardly discussed in the Chinese literature, both conceptual and empirical. While past vocabulary acquisition studies have mainly focused on the cognitive-processing approach, active processing, ‘elaborate processing’ (or lexical elaboration) and other effective learning tasks and strategies, it is high time to balance the spotlight to the WM (particularly PWM and EWM) to ensure an optimum control on the teaching and learning effectiveness of such approaches, as well as the validity of this language assessment. Given the unique phonological, orthographical and morphological properties of the CL, this discussion will shed some light on the vocabulary acquisition of this Sino-Tibetan language family member. Together, these two WM concepts could have crucial implications for the design, development, and planning of vocabularies and ultimately reading comprehension teaching and assessment in language education. 
Hopefully, this will raise an awareness and trigger a dialogue about the meaning of these findings for future language teaching, learning, and assessment.

Keywords: Chinese Language, working memory, vocabulary assessment, vocabulary teaching

Procedia PDF Downloads 344

7023 Transportation Language Register as One of Language Community

Authors: Diyah Atiek Mustikawati

Abstract:

Language register refers to a variety of a language used for a particular purpose or in a particular social setting. It also denotes the practice of adapting one’s use of language to conform to the standards or traditions of a given professional or social situation. This descriptive study discusses the forms of language register used in transportation, the factors behind them, and the functions they serve. Language register in transportation mostly uses short sentences in an informal register. The factors that shape the register are the speaker, word choice, and language background. The functions of language register in transportation are to make communication among crew members easier and to maintain safety in adverse conditions. Transportation language register developed naturally as one variety of language use.

Keywords: language register, language variety, communication, transportation

Procedia PDF Downloads 487

7022 Language Activation Theory: Unlocking Bilingual Language Processing

Authors: Leorisyl D. Siarot

Abstract:

It is conventional to see and hear Filipinos, in general, speak two or more languages. This phenomenon brings us to a closer look on how our minds process the input and produce an output with a specific chosen language. This study aimed to generate a theoretical model which explained the interaction of the first and the second languages in the human mind. After a careful analysis of the gathered data, a theoretical prototype called Language Activation Model was generated. For every string, there are three specialized banks: lexico-semantics, morphono-syntax, and pragmatics. These banks are interrelated to other banks of other language strings. As the bilingual learns more languages, a new string is replicated and is filled up with the information of the new language learned. The principles of the first and second languages' interaction are drawn; these are expressed in laws, namely: law of dominance, law of availability, law of usuality and law of preference. Furthermore, difficulties encountered in the learning of second languages were also determined.

Keywords: bilingualism, psycholinguistics, second language learning, languages

Procedia PDF Downloads 512

7021 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides

Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney

Abstract:

Morphological evaluation of Indian languages is one of the burgeoning fields in the area of Natural Language Processing (NLP). The evaluation of a language is an eminent task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study coalesces morphological evaluation and sentiment analysis for the task of classifying farmer suicide cases reported in the Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens, followed by training the proposed model using deep learning classification on Punjabi language text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45%, respectively. The overall accuracy of sentiment classification obtained using the proposed framework on 275 Punjabi text documents is 90.29%.

Keywords: deep neural network, farmer suicides, morphological processing, punjabi text, sentiment analysis

Procedia PDF Downloads 326

7020 Genomic Sequence Representation Learning: An Analysis of K-Mer Vector Embedding Dimensionality

Authors: James Jr. Mashiyane, Risuna Nkolele, Stephanie J. Müller, Gciniwe S. Dlamini, Rebone L. Meraba, Darlington S. Mapiye

Abstract:

When performing language tasks in natural language processing (NLP), the dimensionality of word embeddings is either chosen ad hoc or calculated by optimizing the Pairwise Inner Product (PIP) loss. The PIP loss is a metric that measures the dissimilarity between word embeddings, and it is obtained through matrix perturbation theory by utilizing the unitary invariance of word embeddings. In genomics, especially in genome sequence processing, unlike in natural language processing, there is no notion of a “word”; rather, there are sequence substrings of length k called k-mers. K-mer sizes matter, and they vary depending on the goal of the task at hand. The dimensionality of word embeddings in NLP has been studied using matrix perturbation theory and the PIP loss. In this paper, the sufficiency and reliability of applying word-embedding algorithms to various genomic sequence datasets are investigated to understand the relationship between the k-mer size and the embedding dimension. This is done by studying the scaling capability of three embedding algorithms, namely Latent Semantic Analysis (LSA), Word2Vec, and Global Vectors (GloVe), with respect to the k-mer size. Utilising the PIP loss as a metric to train embeddings on different datasets, we also show that Word2Vec outperforms LSA and GloVe in accurately computing embeddings as both the k-mer size and vocabulary increase. Finally, the shortcomings of natural language processing embedding algorithms in performing genomic tasks are discussed.
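To make the notion of a k-mer concrete, the following is a minimal sketch of how a sequence might be split into overlapping k-mers and how a k-mer vocabulary could be counted before any embedding algorithm is applied. The function names `kmers` and `kmer_vocabulary`, the stride of one, and the example sequences are assumptions made for illustration; the abstract does not describe the authors' preprocessing.

```python
from collections import Counter


def kmers(sequence, k):
    """Return all overlapping substrings of length k (stride 1, an assumption)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_vocabulary(sequences, k):
    """Count how often each k-mer occurs across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers(seq, k))
    return counts


# Illustrative toy sequences, not data from the paper.
tokens = kmers("ATGCGA", 3)
vocab = kmer_vocabulary(["ATGCGA", "GCGATG"], 3)
print(tokens)
print(len(vocab), "distinct 3-mers")
```

For a four-letter nucleotide alphabet the vocabulary can contain at most 4^k distinct k-mers, which is one reason the vocabulary grows quickly with the k-mer size.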

Keywords: word embeddings, k-mer embedding, dimensionality reduction

Procedia PDF Downloads 137

7019 Interaction between Cognitive Control and Language Processing in Non-Fluent Aphasia

Authors: Izabella Szollosi, Klara Marton

Abstract:

Aphasia can be defined as a weakness in accessing linguistic information. Accessing linguistic information is strongly related to information processing, which in turn is associated with the cognitive control system. According to the literature, a deficit in the cognitive control system interferes with language processing and contributes to non-fluent speech performance. The aim of our study was to explore this hypothesis by investigating how cognitive control interacts with language performance in participants with non-fluent aphasia. Cognitive control is a complex construct that includes working memory (WM) and the ability to resist proactive interference (PI). Based on previous research, we hypothesized that impairments in domain-general (DG) cognitive control abilities have negative effects on language processing. In contrast, better DG cognitive control functioning supports goal-directed behavior in language-related processes as well. Since stroke itself might slow down information processing, it is important to examine its negative effects on both cognitive control and language processing. Participants (N = 52) in our study were individuals with non-fluent Broca’s aphasia (N = 13), with transcortical motor aphasia (N = 13), individuals with stroke damage without aphasia (N = 13), and unimpaired speakers (N = 13). All participants performed various computer-based tasks targeting cognitive control functions such as WM and resistance to PI in both linguistic and non-linguistic domains. Non-linguistic tasks targeted primarily DG functions, while linguistic tasks targeted more domain-specific (DS) processes. The results showed that participants with Broca’s aphasia differed from the other three groups in the non-linguistic tasks. They performed significantly worse even in the baseline conditions. In contrast, we found a different performance profile in the linguistic domain, where the control group differed from all three stroke-related groups.
The three groups with impairment performed more poorly than the controls but similarly to each other in the verbal baseline condition. In the more complex verbal PI condition, however, participants with Broca’s aphasia performed significantly worse than all the other groups. Participants with Broca’s aphasia demonstrated the most severe language impairment and the highest vulnerability in tasks measuring DG cognitive control functions. Results support the notion that the more severe the cognitive control impairment, the more severe the aphasia. Thus, our findings suggest a strong interaction between cognitive control and language. Individuals with the most severe and most general cognitive control deficit, namely participants with Broca’s aphasia, showed the most severe language impairment. Individuals with better DG cognitive control functions demonstrated better language performance. While all participants with stroke damage showed impaired cognitive control functions in the linguistic domain, participants with better language skills also performed better in tasks that measured non-linguistic cognitive control functions. The overall results indicate that the level of cognitive control deficit interacts with language functions in individuals along the language spectrum (from severe to no impairment). However, future research is needed to determine any directionality.

Keywords: cognitive control, information processing, language performance, non-fluent aphasia

Procedia PDF Downloads 122

7018 Research on the Risks of Railroad Receiving and Dispatching Trains Operators: Natural Language Processing Risk Text Mining

Authors: Yangze Lan, Ruihua Xv, Feng Zhou, Yijia Shan, Longhao Zhang, Qinghui Xv

Abstract:

Receiving and dispatching trains is an important part of railroad organization, yet the risk evaluation of operating personnel is still reflected only by scores, without further mining of wrong answers and operating accidents. With natural language processing (NLP) technology, this study extracts the keywords and key phrases of 40 relevant risk events about receiving and dispatching trains and reclassifies the risk events into 8 categories, such as train approach and signal risks, dispatching command risks, and so on. Based on the historical risk data of personnel, the K-Means clustering method is used to classify the risk level of personnel. The result indicates that high-risk operating personnel need to strengthen their training in train receiving and dispatching operations for essential trains and abnormal situations.
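The clustering step can be illustrated with a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. Everything specific here is an assumption made for illustration: the two features per operator (a count of wrong answers and a count of incidents), the toy values, the choice of three clusters, the deterministic initialisation, and the mapping of clusters to "low", "medium", and "high" by centroid size. The abstract does not state which features or settings the authors used.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm on tuples of numbers; returns (centroids, labels)."""
    # Deterministic initialisation (an assumption): k points spread over the sorted data.
    ordered = sorted(points)
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return centroids, labels


# Hypothetical (wrong_answers, incidents) records, one tuple per operator.
operators = [(1, 0), (2, 0), (1, 1), (6, 2), (7, 3), (6, 3), (12, 6), (13, 7)]
centroids, labels = kmeans(operators, k=3)

# Name the clusters by ranking centroids on the sum of their coordinates.
order = sorted(range(3), key=lambda c: sum(centroids[c]))
level_of_cluster = {c: name for c, name in zip(order, ["low", "medium", "high"])}
risk_levels = [level_of_cluster[lab] for lab in labels]
print(risk_levels)
```

K-Means only groups similar records; deciding which cluster counts as "high risk" is a separate interpretive step, done here by ranking centroids.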

Keywords: receiving and dispatching trains, natural language processing, risk evaluation, K-means clustering

Procedia PDF Downloads 91

7017 Exploring Language Attrition Through Processing: The Case of Mising Language in Assam

Authors: Chumki Payun, Bidisha Som

Abstract:

The Mising language, spoken by the Mising community in Assam, belongs to the Tibeto-Burman family of languages. This is one of the smaller languages of the region and is facing endangerment due to the dominance of the larger languages, like Assamese. The language is spoken in close in-group scenarios and is gradually losing ground to the dominant languages, partly also due to the education setup where schools use only dominant languages. While there are a number of factors behind the current status of the language, and those can be studied using sociolinguistic tools, the current work aims to contribute to the understanding of language attrition through language processing in order to establish whether the effect of second language dominance is more than mere ‘usage’ patterns and has an impact on cognitive strategies. When bilingualism spreads widely in society and results in a language shift, speakers often perform better in their second language (L2) than in their first language (L1) across a variety of task settings, in both comprehension and production tasks. This phenomenon was investigated in the case of Mising-Assamese bilinguals, using a picture naming task, in the two districts of Jorhat and Tinsukia in Assam, where the relative dominance of L2 is slightly different. This explorative study aimed to investigate whether the L2 dominance is visible in their performance and also whether the pattern differs between the two places, thus pointing to the degree of language loss in this case. The findings would have implications for native language education, as education in one’s mother tongue can help reverse the effect of language attrition, helping preserve the traditional knowledge system. The hypothesis was that, due to the dominance of the L2, subjects’ performance in the task would be better in Assamese than in Mising.
The experiment: Mising-Assamese bilingual participants (age range 21-31; N = 20 from each district) had to perform a picture naming task in which they were shown pictures of familiar objects and asked to name them in four scenarios: (a) only in Mising; (b) only in Assamese; (c) a cued mix block, in which an auditory cue determines the language in which to name the object; and (d) a non-cued mix block, in which participants are not given any specific language cues but are instructed to name the pictures in whichever language they feel most comfortable. The experiment was designed and executed using E-Prime 3.0; responses were collected using a Chronos response box and recorded with a recorder. Preliminary analysis reveals the presence of dominance of L2 over L1. The paper will present a comparison of the response latency, error analysis, and switch cost in L1 and L2 and explain the same from the perspective of language attrition.
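Switch cost in a mixed-language block is commonly computed as the mean latency on trials where the naming language changes from the previous trial minus the mean latency on trials where it repeats. The sketch below assumes that definition; the abstract does not give the authors' exact formula. The function name `switch_costs` and the latencies are hypothetical and are not results from the study.

```python
from statistics import mean


def switch_costs(trials):
    """Per-language switch cost in ms.

    trials: list of (language, latency_ms) in presentation order,
    assumed to contain correct responses only.
    """
    latencies = {}  # (language, "switch" | "repeat") -> list of latencies
    # The first trial has no predecessor, so it is neither switch nor repeat.
    for (prev_lang, _), (lang, rt) in zip(trials, trials[1:]):
        kind = "switch" if lang != prev_lang else "repeat"
        latencies.setdefault((lang, kind), []).append(rt)
    costs = {}
    for lang in {lang for lang, _ in latencies}:
        if (lang, "switch") in latencies and (lang, "repeat") in latencies:
            costs[lang] = mean(latencies[(lang, "switch")]) - mean(latencies[(lang, "repeat")])
    return costs


# Hypothetical latencies for one participant in a mixed block.
block = [("L1", 900), ("L1", 820), ("L2", 800), ("L2", 700),
         ("L1", 1000), ("L1", 840), ("L2", 780), ("L2", 720)]
costs = switch_costs(block)
print(costs)
```

Computing the cost separately for each target language makes it possible to compare the cost of switching into L1 with the cost of switching into L2.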

Keywords: bilingualism, language attrition, language processing, Mising language

Procedia PDF Downloads 23

7016 Language Processing in Arabic: Writing Competence Across L1 (Arabic) and L2 (English)

Authors: Abdullah Khuwaileh

Abstract:

The central aim of this paper is to investigate writing skills in the two languages involved, English and Arabic, and to see whether poor writing is associated across languages. That is to say, it is thought that learners might be excellent in their L1 (Language 1: Arabic) but not in their L2 (Language 2: English). However, our experimental findings revealed an interesting association between L1 and L2. Data were collected from 150 students (chosen randomly) who wrote about the same topic in English and Arabic. Topics needed no preparation as they were common and well-known. Scripts were assessed by ELT (English Language Teaching) and Arabic specialists, respectively. The study confirms that poor writing in English correlates with similar deficiencies in the mother tongue (Arabic). Thus, the common assumption in ELT that all learners are fully competent in their first language skills is unfounded. Therefore, the criticism of ELT programs for speakers of Arabic, based on poor writing skills in English and good writing in Arabic, is not justified. The findings of this paper can be extended to other learners of English who speak Arabic as a first language and English as a foreign and/or second language. The study concludes with several research and practical recommendations.

Keywords: language, writing, culture, l1

Procedia PDF Downloads 89

7015 Intelligent Chatbot Generating Dynamic Responses Through Natural Language Processing

Authors: Aarnav Singh, Jatin Moolchandani

Abstract:

The proposed research work aims to build a query-based AI chatbot that can answer any question related to any topic. A chatbot is software that converses with users via text messages. In the proposed system, we aim to build a chatbot that generates a response based on the user’s query. For this, we use natural language processing to analyze the query and a set of texts to form a concise answer. The texts are obtained through web scraping and by filtering for credible sources from a web search. The objective of this project is to provide a chatbot that is able to provide simple and accurate answers without the user having to read through a large number of articles and websites. Creating an AI chatbot that can answer a variety of user questions on a variety of topics is the goal of the proposed research project. This chatbot uses natural language processing to comprehend user inquiries and provides succinct responses by examining a collection of writings that were scraped from the internet. The texts are carefully selected from reliable websites that are found via internet searches. This project aims to provide users with a chatbot that provides clear and precise responses, removing the need to go through several articles and web pages in great detail. In addition to exploring the reasons for their broad acceptance and their usefulness across many industries, this article offers an overview of the interest in chatbots throughout the world.

Keywords: Chatbot, Artificial Intelligence, natural language processing, web scraping

Procedia PDF Downloads 66

7014 Detecting Paraphrases in Arabic Text

Authors: Amal Alshahrani, Allan Ramsay

Abstract:

Paraphrasing, i.e., expressing the same concept using different words or phrases, is one of the important tasks in natural language processing. Paraphrases can be used in many natural language applications, such as Information Retrieval, Machine Translation, Question Answering, Text Summarization, or Information Extraction. To obtain pairs of sentences that are paraphrases, we create a system that automatically extracts paraphrases from a corpus built from different sources of news articles, since these are likely to contain paraphrases when they report the same event on the same day. There are existing simple standard approaches (e.g. TF-IDF vector space, cosine similarity) and alignment techniques (e.g. Dynamic Time Warping (DTW)) for extracting paraphrases, which have been applied to English. However, the performance of these approaches could be affected when they are applied to another language, for instance Arabic, due to the presence of phenomena that are not present in English, such as free word order, zero copula, and pro-dropping. Thus, if we can analyze how the existing algorithms for English fail for Arabic, then we can find a solution for Arabic. The results are promising.
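The TF-IDF vector space and cosine similarity baseline mentioned above can be sketched as follows. This is a minimal illustration and not the authors' system: the smoothed IDF variant, whitespace tokenisation, and the function names `tfidf_vectors` and `cosine` are assumptions, and the toy sentences are in English purely for readability.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenised document."""
    n = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF (an assumed variant) avoids zero weights and division by zero.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Toy sentences: the first two report the same event, the third does not.
sentences = [
    "the minister visited the city on monday".split(),
    "on monday the minister travelled to the city".split(),
    "heavy rain flooded the northern villages".split(),
]
vecs = tfidf_vectors(sentences)
sim_paraphrase = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(round(sim_paraphrase, 3), round(sim_unrelated, 3))
```

Because this representation ignores word order and relies on surface word forms, it is the kind of baseline whose behaviour may change in a language with free word order and rich morphology.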

Keywords: natural language processing, TF-IDF, cosine similarity, dynamic time warping (DTW)

Procedia PDF Downloads 386

7013 A Controlled Natural Language Assisted Approach for the Design and Automated Processing of Service Level Agreements

Authors: Christopher Schwarz, Katrin Riegler, Erwin Zinser

Abstract:

The management of outsourcing relationships between IT service providers and their customers proves to be a critical issue that has to be stipulated by means of Service Level Agreements (SLAs). Since service requirements differ from customer to customer, SLA content and language structures vary widely; as a result, standardized SLA templates may not be used and automated processing of SLA content is not possible. Hence, SLA management is usually a time-consuming and inefficient manual process. To overcome these challenges, this paper presents an innovative and ITIL V3-compliant approach for automated SLA design and management using controlled natural language in enterprise collaboration portals. The proposed novel concept is based on a self-developed controlled natural language that follows a subject-predicate-object approach to specify well-defined SLA content structures that act as templates for customized contracts and support automated SLA processing. The derived results eventually enable IT service providers to automate several SLA request, approval and negotiation processes by means of workflows and business rules within an enterprise collaboration portal. The illustrated prototypical realization gives evidence of the practical relevance in service-oriented scenarios as well as the high flexibility and adaptability of the presented model. Thus, the prototype enables the automated creation of well-defined, customized SLA documents, providing a knowledge representation that is both human understandable and machine processable.
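A subject-predicate-object controlled language can be pictured with a small sketch like the one below. It is not the authors' language: the predicate vocabulary, the `Statement` type, the `parse_statement` function, and the example sentence are all hypothetical, chosen only to show how restricting sentences to a fixed pattern makes them machine processable while remaining readable.

```python
from typing import NamedTuple

# Hypothetical controlled vocabulary of allowed predicates.
PREDICATES = ("guarantees", "provides", "reports")


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_statement(sentence):
    """Split a sentence into (subject, predicate, object) at the first allowed predicate."""
    words = sentence.rstrip(".").split()
    for i, word in enumerate(words):
        if word in PREDICATES:
            # A valid statement needs text on both sides of the predicate.
            if i == 0 or i == len(words) - 1:
                break
            return Statement(" ".join(words[:i]), word, " ".join(words[i + 1:]))
    raise ValueError(f"not a valid controlled-language statement: {sentence!r}")


clause = parse_statement("The provider guarantees an availability of 99.5 percent.")
print(clause)
```

Sentences that do not match the pattern are rejected rather than guessed at, which is the property that allows downstream workflows and business rules to rely on the extracted structure.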

Keywords: automated processing, controlled natural language, knowledge representation, information technology outsourcing, service level management

Procedia PDF Downloads 432

7012 Leveraging Large Language Models to Build a Cutting-Edge French Word Sense Disambiguation Corpus

Authors: Mouheb Mehdoui, Amel Fraisse, Mounir Zrigui

Abstract:

With the increasing amount of data circulating over the Web, there is a growing need to develop and deploy tools aimed at unraveling semantic nuances within text or sentences. The challenges in extracting precise meanings arise from the complexity of natural language, where words usually have multiple interpretations depending on the context. Precisely interpreting words within a given context is the challenge that the task of Word Sense Disambiguation (WSD) addresses. It is a long-standing domain within Natural Language Processing that aims to determine the meaning a word carries in a particular context, thereby increasing the correctness of applications that process language. Numerous linguistic resources are accessible online, including WordNet, thesauri, and dictionaries, enabling exploration of diverse contextual meanings. However, several limitations persist. These include the scarcity of resources for certain languages, a limited number of examples within corpora, and the challenge of accurately detecting the topic or context covered by text, which significantly impacts word sense disambiguation. This paper will discuss the different approaches to WSD and review corpora available for this task. We will contrast these approaches, highlighting their limitations, which will allow us to build a corpus in French targeted at WSD.

Keywords: semantic enrichment, disambiguation, context fusion, natural language processing, multilingual applications

Procedia PDF Downloads 7