Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 12112

Search results for: natural language processing

12022 Short Answer Grading Using Multi-Context Features

Authors: S. Sharan Sundar, Nithish B. Moudhgalya, Nidhi Bhandari, Vineeth Vijayaraghavan

Abstract:

Automatic Short Answer Grading is one of the prime applications of artificial intelligence in education. Several approaches involving the utilization of selective handcrafted features, graphical matching techniques, concept identification and mapping, complex deep frameworks, sentence embeddings, etc. have been explored over the years. However, keeping in mind the real-world application of the task, these solutions present a slight overhead in terms of computations and resources in achieving high performances. In this work, a simple and effective solution making use of elemental features based on statistical, linguistic properties, and word-based similarity measures in conjunction with tree-based classifiers and regressors is proposed. The results for classification tasks show improvements ranging from 1%-30%, while the regression task shows a stark improvement of 35%. The authors attribute these improvements to the addition of multiple similarity scores to provide ensemble of scoring criteria to the models. The authors also believe the work could reinstate that classical natural language processing techniques and simple machine learning models can be used to achieve high results for short answer grading.

Keywords: artificial intelligence, intelligent systems, natural language processing, text mining

Procedia PDF Downloads 132

12021 Improving Subjective Bias Detection Using Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory

Authors: Ebipatei Victoria Tunyan, T. A. Cao, Cheol Young Ock

Abstract:

Detecting subjectively biased statements is a vital task. This is because this kind of bias, when present in the text or other forms of information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflicts amongst consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks like sentiment analysis, opinion identification, and bias neutralization. Having a system that can adequately detect subjectivity in text will boost research in the above-mentioned areas significantly. It can also come in handy for platforms like Wikipedia, where the use of neutral language is of importance. The goal of this work is to identify the subjectively biased language in text on a sentence level. With machine learning, we can solve complex AI problems, making it a good fit for the problem of subjective bias detection. A key step in this approach is to train a classifier based on BERT (Bidirectional Encoder Representations from Transformers) as upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network incorporated with attention mechanism. This approach produces a deeper and better classifier. We evaluate the effectiveness of our model using the Wiki Neutrality Corpus (WNC), which was compiled from Wikipedia edits that removed various biased instances from sentences as a benchmark dataset, with which we also compare our model to existing approaches. Experimental analysis indicates an improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.

Keywords: subjective bias detection, machine learning, BERT–BiLSTM–Attention, text classification, natural language processing

Procedia PDF Downloads 129

12020 Creating Energy Sustainability in an Enterprise

Authors: John Lamb, Robert Epstein, Vasundhara L. Bhupathi, Sanjeev Kumar Marimekala

Abstract:

As we enter the new era of Artificial Intelligence (AI) and Cloud Computing, we mostly rely on the Machine and Natural Language Processing capabilities of AI, and Energy Efficient Hardware and Software Devices in almost every industry sector. In these industry sectors, much emphasis is on developing new and innovative methods for producing and conserving energy and sustaining the depletion of natural resources. The core pillars of sustainability are economic, environmental, and social, which is also informally referred to as the 3 P's (People, Planet and Profits). The 3 P's play a vital role in creating a core Sustainability Model in the Enterprise. Natural resources are continually being depleted, so there is more focus and growing demand for renewable energy. With this growing demand, there is also a growing concern in many industries on how to reduce carbon emissions and conserve natural resources while adopting sustainability in corporate business models and policies. In our paper, we would like to discuss the driving forces such as Climate changes, Natural Disasters, Pandemic, Disruptive Technologies, Corporate Policies, Scaled Business Models and Emerging social media and AI platforms that influence the 3 main pillars of Sustainability (3P’s). Through this paper, we would like to bring an overall perspective on enterprise strategies and the primary focus on bringing cultural shifts in adapting energy-efficient operational models. Overall, many industries across the globe are incorporating core sustainability principles such as reducing energy costs, reducing greenhouse gas (GHG) emissions, reducing waste and increasing recycling, adopting advanced monitoring and metering infrastructure, reducing server footprint and compute resources (Shared IT services, Cloud computing, and Application Modernization) with the vision for a sustainable environment.

Keywords: climate change, pandemic, disruptive technology, government policies, business model, machine learning and natural language processing, AI, social media platform, cloud computing, advanced monitoring, metering infrastructure

Procedia PDF Downloads 111

12019 ALEF: An Enhanced Approach to Arabic-English Bilingual Translation

Authors: Abdul Muqsit Abbasi, Ibrahim Chhipa, Asad Anwer, Saad Farooq, Hassan Berry, Sonu Kumar, Sundar Ali, Muhammad Owais Mahmood, Areeb Ur Rehman, Bahram Baloch

Abstract:

Accurate translation between structurally diverse languages, such as Arabic and English, presents a critical challenge in natural language processing due to significant linguistic and cultural differences. This paper investigates the effectiveness of Facebook’s mBART model, fine-tuned specifically for sequence-tosequence (seq2seq) translation tasks between Arabic and English, and enhanced through advanced refinement techniques. Our approach leverages the Alef Dataset, a meticulously curated parallel corpus spanning various domains to capture the linguistic richness, nuances, and contextual accuracy essential for high-quality translation. We further refine the model’s output using advanced language models such as GPT-3.5 and GPT-4, which improve fluency, coherence, and correct grammatical errors in translated texts. The fine-tuned model demonstrates substantial improvements, achieving a BLEU score of 38.97, METEOR score of 58.11, and TER score of 56.33, surpassing widely used systems such as Google Translate. These results underscore the potential of mBART, combined with refinement strategies, to bridge the translation gap between Arabic and English, providing a reliable, context-aware machine translation solution that is robust across diverse linguistic contexts.

Keywords: natural language processing, machine translation, fine-tuning, Arabic-English translation, transformer models, seq2seq translation, translation evaluation metrics, cross-linguistic communication

Procedia PDF Downloads 7

12018 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical or crucial information needed for tagging a word come from its internal structure rather from its neighboring words. The internal structure of a word comprises of its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for Kannada language. This proposed work uses hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from EMILLE corpus. Experimental result shows that the performance of the proposed system is above 90%.

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 296

12017 Revolutionizing Healthcare Communication: The Transformative Role of Natural Language Processing and Artificial Intelligence

Authors: Halimat M. Ajose-Adeogun, Zaynab A. Bello

Abstract:

Artificial Intelligence (AI) and Natural Language Processing (NLP) have transformed computer language comprehension, allowing computers to comprehend spoken and written language with human-like cognition. NLP, a multidisciplinary area that combines rule-based linguistics, machine learning, and deep learning, enables computers to analyze and comprehend human language. NLP applications in medicine range from tackling issues in electronic health records (EHR) and psychiatry to improving diagnostic precision in orthopedic surgery and optimizing clinical procedures with novel technologies like chatbots. The technology shows promise in a variety of medical sectors, including quicker access to medical records, faster decision-making for healthcare personnel, diagnosing dysplasia in Barrett's esophagus, boosting radiology report quality, and so on. However, successful adoption requires training for healthcare workers, fostering a deep understanding of NLP components, and highlighting the significance of validation before actual application. Despite prevailing challenges, continuous multidisciplinary research and collaboration are critical for overcoming restrictions and paving the way for the revolutionary integration of NLP into medical practice. This integration has the potential to improve patient care, research outcomes, and administrative efficiency. The research methodology includes using NLP techniques for Sentiment Analysis and Emotion Recognition, such as evaluating text or audio data to determine the sentiment and emotional nuances communicated by users, which is essential for designing a responsive and sympathetic chatbot. Furthermore, the project includes the adoption of a Personalized Intervention strategy, in which chatbots are designed to personalize responses by merging NLP algorithms with specific user profiles, treatment history, and emotional states. The synergy between NLP and personalized medicine principles is critical for tailoring chatbot interactions to each user's demands and conditions, hence increasing the efficacy of mental health care. A detailed survey corroborated this synergy, revealing a remarkable 20% increase in patient satisfaction levels and a 30% reduction in workloads for healthcare practitioners. The poll, which focused on health outcomes and was administered to both patients and healthcare professionals, highlights the improved efficiency and favorable influence on the broader healthcare ecosystem.

Keywords: natural language processing, artificial intelligence, healthcare communication, electronic health records, patient care

Procedia PDF Downloads 75

12016 Transportation Language Register as One of Language Community

Authors: Diyah Atiek Mustikawati

Abstract:

Language register refers to a variety of a language used for particular purpose or in a particular social setting. Language register also means as a concept of adapting one’s use of language to conform to standards or tradition in a given professional or social situation. This descriptive study tends to discuss about the form of language register in transportation aspect, factors, also the function of use it. Mostly, language register in transportation aspect uses short sentences in form of informal register. The factor caused language register used are speaker, word choice, background of language. The functions of language register in transportations aspect are to make communication between crew easily, also to keep safety when they were in bad condition. Transportation language register developed naturally as one of variety of language used.

Keywords: language register, language variety, communication, transportation

Procedia PDF Downloads 484

12015 Mask-Prompt-Rerank: An Unsupervised Method for Text Sentiment Transfer

Authors: Yufen Qin

Abstract:

Text sentiment transfer is an important branch of text style transfer. The goal is to generate text with another sentiment attribute based on a text with a specific sentiment attribute while maintaining the content and semantic information unrelated to sentiment unchanged in the process. There are currently two main challenges in this field: no parallel corpus and text attribute entanglement. In response to the above problems, this paper proposed a novel solution: Mask-Prompt-Rerank. Use the method of masking the sentiment words and then using prompt regeneration to transfer the sentence sentiment. Experiments on two sentiment benchmark datasets and one formality transfer benchmark dataset show that this approach makes the performance of small pre-trained language models comparable to that of the most advanced large models, while consuming two orders of magnitude less computing and memory.

Keywords: language model, natural language processing, prompt, text sentiment transfer

Procedia PDF Downloads 79

12014 User Guidance for Effective Query Interpretation in Natural Language Interfaces to Ontologies

Authors: Aliyu Isah Agaie, Masrah Azrifah Azmi Murad, Nurfadhlina Mohd Sharef, Aida Mustapha

Abstract:

Natural Language Interfaces typically support a restricted language and also have scopes and limitations that naïve users are unaware of, resulting in errors when the users attempt to retrieve information from ontologies. To overcome this challenge, an auto-suggest feature is introduced into the querying process where users are guided through the querying process using interactive query construction system. Guiding users to formulate their queries, while providing them with an unconstrained (or almost unconstrained) way to query the ontology results in better interpretation of the query and ultimately lead to an effective search. The approach described in this paper is unobtrusive and subtly guides the users, so that they have a choice of either selecting from the suggestion list or typing in full. The user is not coerced into accepting system suggestions and can express himself using fragments or full sentences.

Keywords: auto-suggest, expressiveness, habitability, natural language interface, query interpretation, user guidance

Procedia PDF Downloads 473

12013 A Word-to-Vector Formulation for Word Representation

Authors: Sandra Rizkallah, Amir F. Atiya

Abstract:

This work presents a novel word to vector representation that is based on embedding the words into a sphere, whereby the dot product of the corresponding vectors represents the similarity between any two words. Embedding the vectors into a sphere enabled us to take into consideration the antonymity between words, not only the synonymity, because of the suitability to handle the polarity nature of words. For example, a word and its antonym can be represented as a vector and its negative. Moreover, we have managed to extract an adequate vocabulary. The obtained results show that the proposed approach can capture the essence of the language, and can be generalized to estimate a correct similarity of any new pair of words.

Keywords: natural language processing, word to vector, text similarity, text mining

Procedia PDF Downloads 273

12012 Tracking and Classifying Client Interactions with Personal Coaches

Authors: Kartik Thakore, Anna-Roza Tamas, Adam Cole

Abstract:

The world health organization (WHO) reports that by 2030 more than 23.7 million deaths annually will be caused by Cardiovascular Diseases (CVDs); with a 2008 economic impact of $3.76 T. Metabolic syndrome is a disorder of multiple metabolic risk factors strongly indicated in the development of cardiovascular diseases. Guided lifestyle intervention driven by live coaching has been shown to have a positive impact on metabolic risk factors. Individuals’ path to improved (decreased) metabolic risk factors are driven by personal motivation and personalized messages delivered by coaches and augmented by technology. Using interactions captured between 400 individuals and 3 coaches over a program period of 500 days, a preliminary model was designed. A novel real time event tracking system was created to track and classify clients based on their genetic profile, baseline questionnaires and usage of a mobile application with live coaching sessions. Classification of clients and coaches was done using a support vector machines application build on Apache Spark, Stanford Natural Language Processing Library (SNLPL) and decision-modeling.

Keywords: guided lifestyle intervention, metabolic risk factors, personal coaching, support vector machines application, Apache Spark, natural language processing

Procedia PDF Downloads 432

12011 AI-Based Techniques for Online Social Media Network Sentiment Analysis: A Methodical Review

Authors: A. M. John-Otumu, M. M. Rahman, O. C. Nwokonkwo, M. C. Onuoha

Abstract:

Online social media networks have long served as a primary arena for group conversations, gossip, text-based information sharing and distribution. The use of natural language processing techniques for text classification and unbiased decision-making has not been far-fetched. Proper classification of this textual information in a given context has also been very difficult. As a result, we decided to conduct a systematic review of previous literature on sentiment classification and AI-based techniques that have been used in order to gain a better understanding of the process of designing and developing a robust and more accurate sentiment classifier that can correctly classify social media textual information of a given context between hate speech and inverted compliments with a high level of accuracy by assessing different artificial intelligence techniques. We evaluated over 250 articles from digital sources like ScienceDirect, ACM, Google Scholar, and IEEE Xplore and whittled down the number of research to 31. Findings revealed that Deep learning approaches such as CNN, RNN, BERT, and LSTM outperformed various machine learning techniques in terms of performance accuracy. A large dataset is also necessary for developing a robust sentiment classifier and can be obtained from places like Twitter, movie reviews, Kaggle, SST, and SemEval Task4. Hybrid Deep Learning techniques like CNN+LSTM, CNN+GRU, CNN+BERT outperformed single Deep Learning techniques and machine learning techniques. Python programming language outperformed Java programming language in terms of sentiment analyzer development due to its simplicity and AI-based library functionalities. Based on some of the important findings from this study, we made a recommendation for future research.

Keywords: artificial intelligence, natural language processing, sentiment analysis, social network, text

Procedia PDF Downloads 115

12010 One-Shot Text Classification with Multilingual-BERT

Authors: Hsin-Yang Wang, K. M. A. Salam, Ying-Jia Lin, Daniel Tan, Tzu-Hsuan Chou, Hung-Yu Kao

Abstract:

Detecting user intent from natural language expression has a wide variety of use cases in different natural language processing applications. Recently few-shot training has a spike of usage on commercial domains. Due to the lack of significant sample features, the downstream task performance has been limited or leads to an unstable result across different domains. As a state-of-the-art method, the pre-trained BERT model gathering the sentence-level information from a large text corpus shows improvement on several NLP benchmarks. In this research, we are proposing a method to change multi-class classification tasks into binary classification tasks, then use the confidence score to rank the results. As a language model, BERT performs well on sequence data. In our experiment, we change the objective from predicting labels into finding the relations between words in sequence data. Our proposed method achieved 71.0% accuracy in the internal intent detection dataset and 63.9% accuracy in the HuffPost dataset. Acknowledgment: This work was supported by NCKU-B109-K003, which is the collaboration between National Cheng Kung University, Taiwan, and SoftBank Corp., Tokyo.

Keywords: OSML, BERT, text classification, one shot

Procedia PDF Downloads 100

12009 Language Activation Theory: Unlocking Bilingual Language Processing

Authors: Leorisyl D. Siarot

Abstract:

It is conventional to see and hear Filipinos, in general, speak two or more languages. This phenomenon brings us to a closer look on how our minds process the input and produce an output with a specific chosen language. This study aimed to generate a theoretical model which explained the interaction of the first and the second languages in the human mind. After a careful analysis of the gathered data, a theoretical prototype called Language Activation Model was generated. For every string, there are three specialized banks: lexico-semantics, morphono-syntax, and pragmatics. These banks are interrelated to other banks of other language strings. As the bilingual learns more languages, a new string is replicated and is filled up with the information of the new language learned. The principles of the first and second languages' interaction are drawn; these are expressed in laws, namely: law of dominance, law of availability, law of usuality and law of preference. Furthermore, difficulties encountered in the learning of second languages were also determined.

Keywords: bilingualism, psycholinguistics, second language learning, languages

Procedia PDF Downloads 510

12008 Automated User Story Driven Approach for Web-Based Functional Testing

Authors: Mahawish Masud, Muhammad Iqbal, M. U. Khan, Farooque Azam

Abstract:

Manual writing of test cases from functional requirements is a time-consuming task. Such test cases are not only difficult to write but are also challenging to maintain. Test cases can be drawn from the functional requirements that are expressed in natural language. However, manual test case generation is inefficient and subject to errors. In this paper, we have presented a systematic procedure that could automatically derive test cases from user stories. The user stories are specified in a restricted natural language using a well-defined template. We have also presented a detailed methodology for writing our test ready user stories. Our tool “Test-o-Matic” automatically generates the test cases by processing the restricted user stories. The generated test cases are executed by using open source Selenium IDE. We evaluate our approach on a case study, which is an open source web based application. Effectiveness of our approach is evaluated by seeding faults in the open source case study using known mutation operators. Results show that the test case generation from restricted user stories is a viable approach for automated testing of web applications.

Keywords: automated testing, natural language, restricted user story modeling, software engineering, software testing, test case specification, transformation and automation, user story, web application testing

Procedia PDF Downloads 387

12007 Modeling False Statements in Texts

Authors: Francielle A. Vargas, Thiago A. S. Pardo

Abstract:

According to the standard philosophical definition, lying is saying something that you believe to be false with the intent to deceive. For deception detection, the FBI trains its agents in a technique named statement analysis, which attempts to detect deception based on parts of speech (i.e., linguistics style). This method is employed in interrogations, where the suspects are first asked to make a written statement. In this poster, we model false statements using linguistics style. In order to achieve this, we methodically analyze linguistic features in a corpus of fake news in the Portuguese language. The results show that they present substantial lexical, syntactic and semantic variations, as well as punctuation and emotion distinctions.

Keywords: deception detection, linguistics style, computational linguistics, natural language processing

Procedia PDF Downloads 218

12006 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services

Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme

Abstract:

Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.

Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing

Procedia PDF Downloads 110

12005 Interaction between Cognitive Control and Language Processing in Non-Fluent Aphasia

Authors: Izabella Szollosi, Klara Marton

Abstract:

Aphasia can be defined as a weakness in accessing linguistic information. Accessing linguistic information is strongly related to information processing, which in turn is associated with the cognitive control system. According to the literature, a deficit in the cognitive control system interferes with language processing and contributes to non-fluent speech performance. The aim of our study was to explore this hypothesis by investigating how cognitive control interacts with language performance in participants with non-fluent aphasia. Cognitive control is a complex construct that includes working memory (WM) and the ability to resist proactive interference (PI). Based on previous research, we hypothesized that impairments in domain-general (DG) cognitive control abilities have negative effects on language processing. In contrast, better DG cognitive control functioning supports goal-directed behavior in language-related processes as well. Since stroke itself might slow down information processing, it is important to examine its negative effects on both cognitive control and language processing. Participants (N=52) in our study were individuals with non-fluent Broca’s aphasia (N = 13), with transcortical motor aphasia (N=13), individuals with stroke damage without aphasia (N=13), and unimpaired speakers (N = 13). All participants performed various computer-based tasks targeting cognitive control functions such as WM and resistance to PI in both linguistic and non-linguistic domains. Non-linguistic tasks targeted primarily DG functions, while linguistic tasks targeted more domain specific (DS) processes. The results showed that participants with Broca’s aphasia differed from the other three groups in the non-linguistic tasks. They performed significantly worse even in the baseline conditions. In contrast, we found a different performance profile in the linguistic domain, where the control group differed from all three stroke-related groups. The three groups with impairment performed more poorly than the controls but similar to each other in the verbal baseline condition. In the more complex verbal PI condition, however, participants with Broca’s aphasia performed significantly worse than all the other groups. Participants with Broca’s aphasia demonstrated the most severe language impairment and the highest vulnerability in tasks measuring DG cognitive control functions. Results support the notion that the more severe the cognitive control impairment, the more severe the aphasia. Thus, our findings suggest a strong interaction between cognitive control and language. Individuals with the most severe and most general cognitive control deficit - participants with Broca’s aphasia - showed the most severe language impairment. Individuals with better DG cognitive control functions demonstrated better language performance. While all participants with stroke damage showed impaired cognitive control functions in the linguistic domain, participants with better language skills performed also better in tasks that measured non-linguistic cognitive control functions. The overall results indicate that the level of cognitive control deficit interacts with the language functions in individuals along with the language spectrum (from severe to no impairment). However, future research is needed to determine any directionality.

Keywords: cognitive control, information processing, language performance, non-fluent aphasia

Procedia PDF Downloads 120

12004 AI Tutor: A Computer Science Domain Knowledge Graph-Based QA System on JADE platform

Authors: Yingqi Cui, Changran Huang, Raymond Lee

Abstract:

In this paper, we proposed an AI Tutor using ontology and natural language process techniques to generate a computer science domain knowledge graph and answer users’ questions based on the knowledge graph. We define eight types of relation to extract relationships between entities according to the computer science domain text. The AI tutor is separated into two agents: learning agent and Question-Answer (QA) agent and developed on JADE (a multi-agent system) platform. The learning agent is responsible for reading text to extract information and generate a corresponding knowledge graph by defined patterns. The QA agent can understand the users’ questions and answer humans’ questions based on the knowledge graph generated by the learning agent.

Keywords: artificial intelligence, natural Language processing, knowledge graph, intelligent agents, QA system

Procedia PDF Downloads 186

12003 Exploring Language Attrition Through Processing: The Case of Mising Language in Assam

Authors: Chumki Payun, Bidisha Som

Abstract:

The Mising language, spoken by the Mising community in Assam, belongs to the Tibeto-Burman family of languages. This is one of the smaller languages of the region and is facing endangerment due to the dominance of the larger languages, like Assamese. The language is spoken in close in-group scenarios and is gradually losing ground to the dominant languages, partly also due to the education setup where schools use only dominant languages. While there are a number of factors for the current contemporary status of the language, and those can be studied using sociolinguistic tools, the current work aims to contribute to the understanding of language attrition through language processing in order to establish if the effect of second language dominance is more than mere ‘usage’ patterns and has an impact on cognitive strategies. When bilingualism spreads widely in society and results in a language shift, speakers perform people often do better in their second language (L2) than in their first language (L1) across a variety of task settings, in both comprehension and production tasks. This phenomenon was investigated in the case of Mising-Assamese bilinguals, using a picture naming task, in two districts of Jorhat and Tinsukia in Assam, where the relative dominance of L2 is slightly different. This explorative study aimed to investigate if the L2 dominance is visible in their performance and also if the pattern is different in the two different places, thus pointing to the degree of language loss in this case. The findings would have implications for native language education, as education in one’s mother tongue can help reverse the effect of language attrition helping preserve the traditional knowledge system. The hypothesis was that due to the dominance of the L2, subjects’ performance in the task would be better in Assamese than that of Missing. The experiment: Mising-Assamese bilingual participants (age ranges 21-31; N= 20 each from both districts) had to perform a picture naming task in which participants were shown pictures of familiar objects and asked to name them in four scenarios: (a) only in Mising; (b) only in Assamese; (c) a cued mix block: an auditory cue determines the language in which to name the object, and (d) non-cued mix block: participants are not given any specific language cues, but instructed to name the pictures in whichever language they feel most comfortable. The experiment was designed and executed using E-prime 3.0 and was conducted responses were recorded using the help of a Chronos response box and was recorded with the help of a recorder. Preliminary analysis reveals the presence of dominance of L2 over L1. The paper will present a comparison of the response latency, error analysis, and switch cost in L1 and L2 and explain the same from the perspective of language attrition.

Keywords: bilingualism, language attrition, language processing, Mising language.

Procedia PDF Downloads 21

12002 Enhancing Plant Throughput in Mineral Processing Through Multimodal Artificial Intelligence

Authors: Muhammad Bilal Shaikh

Abstract:

Mineral processing plants play a pivotal role in extracting valuable minerals from raw ores, contributing significantly to various industries. However, the optimization of plant throughput remains a complex challenge, necessitating innovative approaches for increased efficiency and productivity. This research paper investigates the application of Multimodal Artificial Intelligence (MAI) techniques to address this challenge, aiming to improve overall plant throughput in mineral processing operations. The integration of multimodal AI leverages a combination of diverse data sources, including sensor data, images, and textual information, to provide a holistic understanding of the complex processes involved in mineral extraction. The paper explores the synergies between various AI modalities, such as machine learning, computer vision, and natural language processing, to create a comprehensive and adaptive system for optimizing mineral processing plants. The primary focus of the research is on developing advanced predictive models that can accurately forecast various parameters affecting plant throughput. Utilizing historical process data, machine learning algorithms are trained to identify patterns, correlations, and dependencies within the intricate network of mineral processing operations. This enables real-time decision-making and process optimization, ultimately leading to enhanced plant throughput. Incorporating computer vision into the multimodal AI framework allows for the analysis of visual data from sensors and cameras positioned throughout the plant. This visual input aids in monitoring equipment conditions, identifying anomalies, and optimizing the flow of raw materials. The combination of machine learning and computer vision enables the creation of predictive maintenance strategies, reducing downtime and improving the overall reliability of mineral processing plants. Furthermore, the integration of natural language processing facilitates the extraction of valuable insights from unstructured textual data, such as maintenance logs, research papers, and operator reports. By understanding and analyzing this textual information, the multimodal AI system can identify trends, potential bottlenecks, and areas for improvement in plant operations. This comprehensive approach enables a more nuanced understanding of the factors influencing throughput and allows for targeted interventions. The research also explores the challenges associated with implementing multimodal AI in mineral processing plants, including data integration, model interpretability, and scalability. Addressing these challenges is crucial for the successful deployment of AI solutions in real-world industrial settings. To validate the effectiveness of the proposed multimodal AI framework, the research conducts case studies in collaboration with mineral processing plants. The results demonstrate tangible improvements in plant throughput, efficiency, and cost-effectiveness. The paper concludes with insights into the broader implications of implementing multimodal AI in mineral processing and its potential to revolutionize the industry by providing a robust, adaptive, and data-driven approach to optimizing plant operations. In summary, this research contributes to the evolving field of mineral processing by showcasing the transformative potential of multimodal artificial intelligence in enhancing plant throughput. The proposed framework offers a holistic solution that integrates machine learning, computer vision, and natural language processing to address the intricacies of mineral extraction processes, paving the way for a more efficient and sustainable future in the mineral processing industry.

Keywords: multimodal AI, computer vision, NLP, mineral processing, mining

Procedia PDF Downloads 65

12001 Profiling Risky Code Using Machine Learning

Authors: Zunaira Zaman, David Bohannon

Abstract:

This study explores the application of machine learning (ML) for detecting security vulnerabilities in source code. The research aims to assist organizations with large application portfolios and limited security testing capabilities in prioritizing security activities. ML-based approaches offer benefits such as increased confidence scores, false positives and negatives tuning, and automated feedback. The initial approach using natural language processing techniques to extract features achieved 86% accuracy during the training phase but suffered from overfitting and performed poorly on unseen datasets during testing. To address these issues, the study proposes using the abstract syntax tree (AST) for Java and C++ codebases to capture code semantics and structure and generate path-context representations for each function. The Code2Vec model architecture is used to learn distributed representations of source code snippets for training a machine-learning classifier for vulnerability prediction. The study evaluates the performance of the proposed methodology using two datasets and compares the results with existing approaches. The Devign dataset yielded 60% accuracy in predicting vulnerable code snippets and helped resist overfitting, while the Juliet Test Suite predicted specific vulnerabilities such as OS-Command Injection, Cryptographic, and Cross-Site Scripting vulnerabilities. The Code2Vec model achieved 75% accuracy and a 98% recall rate in predicting OS-Command Injection vulnerabilities. The study concludes that even partial AST representations of source code can be useful for vulnerability prediction. The approach has the potential for automated intelligent analysis of source code, including vulnerability prediction on unseen source code. State-of-the-art models using natural language processing techniques and CNN models with ensemble modelling techniques did not generalize well on unseen data and faced overfitting issues. However, predicting vulnerabilities in source code using machine learning poses challenges such as high dimensionality and complexity of source code, imbalanced datasets, and identifying specific types of vulnerabilities. Future work will address these challenges and expand the scope of the research.

Keywords: code embeddings, neural networks, natural language processing, OS command injection, software security, code properties

Procedia PDF Downloads 105

12000 Language Processing in Arabic: Writing Competence Across L1 (Arabic) and L2 (English)

Authors: Abdullah Khuwaileh

Abstract:

The central aim of this paper is to investigate writing skills in the two languages involved, English and Arabic, and to see whether there is an association between poor writing across languages. That is to say, and it is thought that learners might be excellent in their L1 (Language 1: Arabic) but not in L2 (language 2: English). However, our experimental research findings resulted in an interesting association between L1 and L2. Data were collected from 150 students (chosen randomly) who wrote about the same topic in English and Arabic. Topics needed no preparation as they were common and well-known. Scripts were assessed respectively by ELT (English Language Teaching) and Arabic specialists. The study confirms that poor writing in English correlates with similar deficiencies in the mother tongue (Arabic). Thus, the common assumption in ELT that all learners are fully competent in their first language skills is unfounded. Therefore, the criticism of ELT programs for speakers of Arabic, based on poor writing skills in English and good writing in Arabic is not justified. The findings of this paper can be extended to other learners of English who speak Arabic as a first language and English as a foreign and/or second language. The study is concluded with several research and practical recommendations

Keywords: language, writing, culture, l1

Procedia PDF Downloads 88

11999 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries

Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman

Abstract:

There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets represents a new challenge for researchers. The task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but for researchers looking to find comparisons across multiple datasets or specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English language and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although there are some unique challenges posed by working with Earth data. One is the sheer size of the databases – it would be infeasible to replicate or analyze all the data stored by an organization like The National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from metadata available for datasets in NASA’s Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results from this method are then compared to a baseline of elastic search applied to the metadata. This comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. But perhaps more importantly, it establishes the foundation for a platform that can enable common English to access knowledge that previously required considerable effort and experience. By making this public data accessible to the full public, this work has the potential to transform environmental understanding, engagement, and action.

Keywords: earth metadata, knowledge graphs, natural language processing, question-answer systems

Procedia PDF Downloads 146

11998 Automatic Tagging and Accuracy in Assamese Text Data

Authors: Chayanika Hazarika Bordoloi

Abstract:

This paper is an attempt to work on a highly inflectional language called Assamese. This is also one of the national languages of India and very little has been achieved in terms of computational research. Building a language processing tool for a natural language is not very smooth as the standard and language representation change at various levels. This paper presents inflectional suffixes of Assamese verbs and how the statistical tools, along with linguistic features, can improve the tagging accuracy. Conditional random fields (CRF tool) was used to automatically tag and train the text data; however, accuracy was improved after linguistic featured were fed into the training data. Assamese is a highly inflectional language; hence, it is challenging to standardizing its morphology. Inflectional suffixes are used as a feature of the text data. In order to analyze the inflections of Assamese word forms, a list of suffixes is prepared. This list comprises suffixes, comprising of all possible suffixes that various categories can take is prepared. Assamese words can be classified into inflected classes (noun, pronoun, adjective and verb) and un-inflected classes (adverb and particle). The corpus used for this morphological analysis has huge tokens. The corpus is a mixed corpus and it has given satisfactory accuracy. The accuracy rate of the tagger has gradually improved with the modified training data.

Keywords: CRF, morphology, tagging, tagset

Procedia PDF Downloads 193

11997 Evaluation of Modern Natural Language Processing Techniques via Measuring a Company's Public Perception

Authors: Burak Oksuzoglu, Savas Yildirim, Ferhat Kutlu

Abstract:

Opinion mining (OM) is one of the natural language processing (NLP) problems to determine the polarity of opinions, mostly represented on a positive-neutral-negative axis. The data for OM is usually collected from various social media platforms. In an era where social media has considerable control over companies’ futures, it’s worth understanding social media and taking actions accordingly. OM comes to the fore here as the scale of the discussion about companies increases, and it becomes unfeasible to gauge opinion on individual levels. Thus, the companies opt to automize this process by applying machine learning (ML) approaches to their data. For the last two decades, OM or sentiment analysis (SA) has been mainly performed by applying ML classification algorithms such as support vector machines (SVM) and Naïve Bayes to a bag of n-gram representations of textual data. With the advent of deep learning and its apparent success in NLP, traditional methods have become obsolete. Transfer learning paradigm that has been commonly used in computer vision (CV) problems started to shape NLP approaches and language models (LM) lately. This gave a sudden rise to the usage of the pretrained language model (PTM), which contains language representations that are obtained by training it on the large datasets using self-supervised learning objectives. The PTMs are further fine-tuned by a specialized downstream task dataset to produce efficient models for various NLP tasks such as OM, NER (Named-Entity Recognition), Question Answering (QA), and so forth. In this study, the traditional and modern NLP approaches have been evaluated for OM by using a sizable corpus belonging to a large private company containing about 76,000 comments in Turkish: SVM with a bag of n-grams, and two chosen pre-trained models, multilingual universal sentence encoder (MUSE) and bidirectional encoder representations from transformers (BERT). The MUSE model is a multilingual model that supports 16 languages, including Turkish, and it is based on convolutional neural networks. The BERT is a monolingual model in our case and transformers-based neural networks. It uses a masked language model and next sentence prediction tasks that allow the bidirectional training of the transformers. During the training phase of the architecture, pre-processing operations such as morphological parsing, stemming, and spelling correction was not used since the experiments showed that their contribution to the model performance was found insignificant even though Turkish is a highly agglutinative and inflective language. The results show that usage of deep learning methods with pre-trained models and fine-tuning achieve about 11% improvement over SVM for OM. The BERT model achieved around 94% prediction accuracy while the MUSE model achieved around 88% and SVM did around 83%. The MUSE multilingual model shows better results than SVM, but it still performs worse than the monolingual BERT model.

Keywords: BERT, MUSE, opinion mining, pretrained language model, SVM, Turkish

Procedia PDF Downloads 146

11996 Generating Insights from Data Using a Hybrid Approach

Authors: Allmin Susaiyah, Aki Härmä, Milan Petković

Abstract:

Automatic generation of insights from data using insight mining systems (IMS) is useful in many applications, such as personal health tracking, patient monitoring, and business process management. Existing IMS face challenges in controlling insight extraction, scaling to large databases, and generalising to unseen domains. In this work, we propose a hybrid approach consisting of rule-based and neural components for generating insights from data while overcoming the aforementioned challenges. Firstly, a rule-based data 2CNL component is used to extract statistically significant insights from data and represent them in a controlled natural language (CNL). Secondly, a BERTSum-based CNL2NL component is used to convert these CNLs into natural language texts. We improve the model using task-specific and domain-specific fine-tuning. Our approach has been evaluated using statistical techniques and standard evaluation metrics. We overcame the aforementioned challenges and observed significant improvement with domain-specific fine-tuning.

Keywords: data mining, insight mining, natural language generation, pre-trained language models

Procedia PDF Downloads 117

11995 From the “Movement Language” to Communication Language

Authors: Mahmudjon Kuchkarov, Marufjon Kuchkarov

Abstract:

The origin of ‘Human Language’ is still a secret and the most interesting subject of historical linguistics. The core element is the nature of labeling or coding the things or processes with symbols and sounds. In this paper, we investigate human’s involuntary Paired Sounds and Shape Production (PSSP) and its contribution to the development of early human communication. Aimed at twenty-six volunteers who provided many physical movements with various difficulties, the research team investigated the natural, repeatable, and paired sounds and shape productions during human activities. The paper claims the involvement of Paired Sounds and Shape Production (PSSP) in the phonetic origin of some modern words and the existence of similarities between elements of PSSP with characters of the classic Latin alphabet. The results may be used not only as a supporting idea for existing theories but to create a closer look at some fundamental nature of the origin of the languages as well.

Keywords: body shape, body language, coding, Latin alphabet, merging method, movement language, movement sound, natural sound, origin of language, pairing, phonetics, sound and shape production, word origin, word semantic

Procedia PDF Downloads 248

11994 Avoiding Gas Hydrate Problems in Qatar Oil and Gas Industry: Environmentally Friendly Solvents for Gas Hydrate Inhibition

Authors: Nabila Mohamed, Santiago Aparicio, Bahman Tohidi, Mert Atilhan

Abstract:

Qatar's one of the biggest problem in processing its natural resource, which is natural gas, is the often occurring blockage in the pipelines caused due to uncontrolled gas hydrate formation in the pipelines. Several millions of dollars are being spent at the process site to dehydrate the blockage safely by using chemical inhibitors. We aim to establish national database, which addresses the physical conditions that promotes Qatari natural gas to form gas hydrates in the pipelines. Moreover, we aim to design and test novel hydrate inhibitors that are suitable for Qatari natural gas and its processing facilities. From these perspectives we are aiming to provide more effective and sustainable reservoir utilization and processing of Qatari natural gas. In this work, we present the initial findings of a QNRF funded project, which deals with the natural gas hydrate formation characteristics of Qatari type gas in both experimental (PVTx) and computational (molecular simulations) methods. We present the data from the two fully automated apparatus: a gas hydrate autoclave and a rocking cell. Hydrate equilibrium curves including growth/dissociation conditions for multi-component systems for several gas mixtures that represent Qatari type natural gas with and without the presence of well known kinetic and thermodynamic hydrate inhibitors. Ionic liquids were designed and used for testing their inhibition performance and their DFT and molecular modeling simulation results were also obtained and compared with the experimental results. Results showed significant performance of ionic liquids with up to 0.5 % in volume with up to 2 to 4 0C inhibition at high pressures.

Keywords: gas hydrates, natural gas, ionic liquids, inhibition, thermodynamic inhibitors, kinetic inhibitors

Procedia PDF Downloads 1319

11993 The Mother Tongue and Related Issues in Algeria

Authors: Farouk A.N. Bouhadiba

Abstract:

Based on Fishman’s Theoretical Paradigm (1991), we shall first discuss his three value positions for the case of the so called minority native languages in Algeria and how they may be included into a global language teaching program in Algeria. We shall then move on to his scale on language loss, language maintenance and language renewal with illustrating examples taken from the Algerian context. The second part of our talk relates to pedagogical issues on how to proceed for a smooth transition from mother tongue to school tongue, what methods or approaches suit best the teaching of mother tongue and school tongue (Immersion Programs, The Natural Approach, Applied Literacy Programs, The Berlitz Method, etc.). We shall end up our talk on how one may reshuffle the current issues on the “Arabic-only” movement and the abrupt transition from mother tongue to school tongue in use today by opting for teaching programs that involve pre-school language acquisition and in-school language acquisition grammars, and thus pave the way to effective language teaching programs and living curricula and pedagogies such as language nests, intergenerational continuity, communication and identity teaching programs, which result in better language teaching models that make language policies become a reality.

Keywords: native languages, language maintenance, mother tongue, school tongue, education, Algeria

Procedia PDF Downloads 29