Search results for: Arabic natural language processing
12083 AI Tutor: A Computer Science Domain Knowledge Graph-Based QA System on JADE platform
Authors: Yingqi Cui, Changran Huang, Raymond Lee
Abstract:
In this paper, we propose an AI Tutor that uses ontology and natural language processing techniques to generate a computer science domain knowledge graph and answer users’ questions based on that graph. We define eight types of relation to extract relationships between entities from computer science domain text. The AI Tutor is separated into two agents, a learning agent and a Question-Answer (QA) agent, and is developed on the JADE (a multi-agent system) platform. The learning agent is responsible for reading text to extract information and generate a corresponding knowledge graph using defined patterns. The QA agent interprets users’ questions and answers them based on the knowledge graph generated by the learning agent.
Keywords: artificial intelligence, natural language processing, knowledge graph, intelligent agents, QA system
Procedia PDF Downloads 188
12082 Exploring Language Attrition Through Processing: The Case of Mising Language in Assam
Authors: Chumki Payun, Bidisha Som
Abstract:
The Mising language, spoken by the Mising community in Assam, belongs to the Tibeto-Burman family of languages. It is one of the smaller languages of the region and is endangered by the dominance of larger languages such as Assamese. The language is spoken in close in-group settings and is gradually losing ground to the dominant languages, partly because of an education system in which schools use only dominant languages. While a number of factors contribute to the current status of the language, and those can be studied using sociolinguistic tools, the current work aims to contribute to the understanding of language attrition through language processing, in order to establish whether the effect of second-language dominance goes beyond mere ‘usage’ patterns and has an impact on cognitive strategies. When bilingualism spreads widely in a society and results in a language shift, speakers often perform better in their second language (L2) than in their first language (L1) across a variety of task settings, in both comprehension and production tasks. This phenomenon was investigated in Mising-Assamese bilinguals using a picture naming task in two districts of Assam, Jorhat and Tinsukia, where the relative dominance of L2 differs slightly. This exploratory study aimed to investigate whether L2 dominance is visible in participants’ performance and whether the pattern differs between the two places, thus pointing to the degree of language loss in this case. The findings would have implications for native language education, as education in one’s mother tongue can help reverse the effect of language attrition and so help preserve the traditional knowledge system. The hypothesis was that, due to the dominance of the L2, subjects’ performance in the task would be better in Assamese than in Mising.
The experiment: Mising-Assamese bilingual participants (age range 21-31; N = 20 from each district) performed a picture naming task in which they were shown pictures of familiar objects and asked to name them in four scenarios: (a) only in Mising; (b) only in Assamese; (c) a cued mixed block, in which an auditory cue determines the language in which to name the object; and (d) a non-cued mixed block, in which participants are given no specific language cue but are instructed to name the pictures in whichever language they feel most comfortable. The experiment was designed and executed using E-Prime 3.0; responses were collected with a Chronos response box and also recorded with a recorder. Preliminary analysis reveals dominance of L2 over L1. The paper will present a comparison of response latency, error analysis, and switch cost in L1 and L2 and explain them from the perspective of language attrition.
Keywords: bilingualism, language attrition, language processing, Mising language.
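As an illustration of the switch-cost measure mentioned in the abstract above, the following minimal sketch computes it for one mixed block under the common definition: mean naming latency on trials where the language changes from the previous trial, minus mean latency on trials where it repeats. The trial format, the function name `switch_cost`, and the latencies are hypothetical and are not taken from the study.

```python
from statistics import mean

def switch_cost(trials):
    """Return mean latency on switch trials minus mean latency on repeat trials.

    `trials` is a list of (language, latency_ms) tuples in presentation order
    for one mixed block. The first trial has no predecessor and is skipped.
    """
    switch, repeat = [], []
    for previous, current in zip(trials, trials[1:]):
        # A trial counts as a switch when its language differs from the previous one.
        if current[0] != previous[0]:
            switch.append(current[1])
        else:
            repeat.append(current[1])
    return mean(switch) - mean(repeat)

# Hypothetical latencies in milliseconds, for illustration only.
example_block = [("L1", 900), ("L1", 850), ("L2", 1000), ("L2", 800), ("L1", 1100)]
example_cost = switch_cost(example_block)
```

A positive value means naming was slower immediately after a language switch; computing the measure separately for switches into L1 and into L2 would allow the kind of comparison the abstract describes.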
Procedia PDF Downloads 26
12081 Enhancing Plant Throughput in Mineral Processing Through Multimodal Artificial Intelligence
Authors: Muhammad Bilal Shaikh
Abstract:
Mineral processing plants play a pivotal role in extracting valuable minerals from raw ores, contributing significantly to various industries. However, the optimization of plant throughput remains a complex challenge, necessitating innovative approaches for increased efficiency and productivity. This research paper investigates the application of Multimodal Artificial Intelligence (MAI) techniques to address this challenge, aiming to improve overall plant throughput in mineral processing operations. The integration of multimodal AI leverages a combination of diverse data sources, including sensor data, images, and textual information, to provide a holistic understanding of the complex processes involved in mineral extraction. The paper explores the synergies between various AI modalities, such as machine learning, computer vision, and natural language processing, to create a comprehensive and adaptive system for optimizing mineral processing plants. The primary focus of the research is on developing advanced predictive models that can accurately forecast various parameters affecting plant throughput. Utilizing historical process data, machine learning algorithms are trained to identify patterns, correlations, and dependencies within the intricate network of mineral processing operations. This enables real-time decision-making and process optimization, ultimately leading to enhanced plant throughput. Incorporating computer vision into the multimodal AI framework allows for the analysis of visual data from sensors and cameras positioned throughout the plant. This visual input aids in monitoring equipment conditions, identifying anomalies, and optimizing the flow of raw materials. The combination of machine learning and computer vision enables the creation of predictive maintenance strategies, reducing downtime and improving the overall reliability of mineral processing plants. 
Furthermore, the integration of natural language processing facilitates the extraction of valuable insights from unstructured textual data, such as maintenance logs, research papers, and operator reports. By understanding and analyzing this textual information, the multimodal AI system can identify trends, potential bottlenecks, and areas for improvement in plant operations. This comprehensive approach enables a more nuanced understanding of the factors influencing throughput and allows for targeted interventions. The research also explores the challenges associated with implementing multimodal AI in mineral processing plants, including data integration, model interpretability, and scalability. Addressing these challenges is crucial for the successful deployment of AI solutions in real-world industrial settings. To validate the effectiveness of the proposed multimodal AI framework, the research conducts case studies in collaboration with mineral processing plants. The results demonstrate tangible improvements in plant throughput, efficiency, and cost-effectiveness. The paper concludes with insights into the broader implications of implementing multimodal AI in mineral processing and its potential to revolutionize the industry by providing a robust, adaptive, and data-driven approach to optimizing plant operations. In summary, this research contributes to the evolving field of mineral processing by showcasing the transformative potential of multimodal artificial intelligence in enhancing plant throughput. The proposed framework offers a holistic solution that integrates machine learning, computer vision, and natural language processing to address the intricacies of mineral extraction processes, paving the way for a more efficient and sustainable future in the mineral processing industry.
Keywords: multimodal AI, computer vision, NLP, mineral processing, mining
Procedia PDF Downloads 68
12080 Profiling Risky Code Using Machine Learning
Authors: Zunaira Zaman, David Bohannon
Abstract:
This study explores the application of machine learning (ML) for detecting security vulnerabilities in source code. The research aims to assist organizations with large application portfolios and limited security testing capabilities in prioritizing security activities. ML-based approaches offer benefits such as increased confidence scores, false positives and negatives tuning, and automated feedback. The initial approach using natural language processing techniques to extract features achieved 86% accuracy during the training phase but suffered from overfitting and performed poorly on unseen datasets during testing. To address these issues, the study proposes using the abstract syntax tree (AST) for Java and C++ codebases to capture code semantics and structure and generate path-context representations for each function. The Code2Vec model architecture is used to learn distributed representations of source code snippets for training a machine-learning classifier for vulnerability prediction. The study evaluates the performance of the proposed methodology using two datasets and compares the results with existing approaches. The Devign dataset yielded 60% accuracy in predicting vulnerable code snippets and helped resist overfitting, while the Juliet Test Suite predicted specific vulnerabilities such as OS-Command Injection, Cryptographic, and Cross-Site Scripting vulnerabilities. The Code2Vec model achieved 75% accuracy and a 98% recall rate in predicting OS-Command Injection vulnerabilities. The study concludes that even partial AST representations of source code can be useful for vulnerability prediction. The approach has the potential for automated intelligent analysis of source code, including vulnerability prediction on unseen source code. State-of-the-art models using natural language processing techniques and CNN models with ensemble modelling techniques did not generalize well on unseen data and faced overfitting issues. 
However, predicting vulnerabilities in source code using machine learning poses challenges such as high dimensionality and complexity of source code, imbalanced datasets, and identifying specific types of vulnerabilities. Future work will address these challenges and expand the scope of the research.
Keywords: code embeddings, neural networks, natural language processing, OS command injection, software security, code properties
Procedia PDF Downloads 108
12079 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries
Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman
Abstract:
There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets represents a new challenge for researchers. The task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but for researchers looking to find comparisons across multiple datasets or specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English language and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although there are some unique challenges posed by working with Earth data. One is the sheer size of the databases – it would be infeasible to replicate or analyze all the data stored by an organization like The National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from metadata available for datasets in NASA’s Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results from this method are then compared to a baseline of elastic search applied to the metadata. 
This comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. But perhaps more importantly, it establishes the foundation for a platform that can enable common English to access knowledge that previously required considerable effort and experience. By making this public data accessible to the full public, this work has the potential to transform environmental understanding, engagement, and action.
Keywords: earth metadata, knowledge graphs, natural language processing, question-answer systems
Procedia PDF Downloads 150
12078 Automatic Tagging and Accuracy in Assamese Text Data
Authors: Chayanika Hazarika Bordoloi
Abstract:
This paper is an attempt to work on Assamese, a highly inflectional language. It is also one of the national languages of India, yet very little has been achieved for it in terms of computational research. Building a language processing tool for a natural language is not straightforward, as the standard and the language representation change at various levels. This paper presents the inflectional suffixes of Assamese verbs and shows how statistical tools, along with linguistic features, can improve tagging accuracy. Conditional random fields (the CRF tool) were used to train on and automatically tag the text data; accuracy improved after linguistic features were fed into the training data. Because Assamese is highly inflectional, standardizing its morphology is challenging. Inflectional suffixes are used as a feature of the text data. In order to analyze the inflections of Assamese word forms, a list comprising all possible suffixes that the various categories can take was prepared. Assamese words can be classified into inflected classes (noun, pronoun, adjective and verb) and uninflected classes (adverb and particle). The corpus used for this morphological analysis has a very large number of tokens. It is a mixed corpus, and it has given satisfactory accuracy. The accuracy of the tagger has gradually improved with the modified training data.
Keywords: CRF, morphology, tagging, tagset
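The idea of feeding suffix information to a tagger as a feature can be sketched as follows. This is only an illustration under stated assumptions: the function name `suffix_features`, the feature keys, and the suffix list are hypothetical, and the English suffixes stand in for the Assamese suffix list described in the abstract, which is not reproduced here.

```python
def suffix_features(word, suffixes):
    """Build a small feature dictionary for one token.

    The longest suffix from `suffixes` that the word ends with is reported,
    along with the remaining stem. A suffix must leave a non-empty stem.
    """
    matches = [s for s in suffixes if word.endswith(s) and len(word) > len(s)]
    best = max(matches, key=len) if matches else ""
    stem = word[: len(word) - len(best)] if best else word
    return {
        "word": word,
        "suffix": best if best else "NONE",
        "stem": stem,
        "has_suffix": bool(best),
    }

# Placeholder English suffixes, used only to keep the example readable.
PLACEHOLDER_SUFFIXES = ["s", "ed", "ing", "ings"]

features = [suffix_features(w, PLACEHOLDER_SUFFIXES) for w in ["walking", "meetings", "go"]]
```

Feature dictionaries of this shape are the kind of per-token input that CRF toolkits typically accept, so a suffix list can be added to a statistical tagger without changing the learning algorithm itself.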
Procedia PDF Downloads 195
12077 Evaluation of Modern Natural Language Processing Techniques via Measuring a Company's Public Perception
Authors: Burak Oksuzoglu, Savas Yildirim, Ferhat Kutlu
Abstract:
Opinion mining (OM) is a natural language processing (NLP) problem that aims to determine the polarity of opinions, mostly represented on a positive-neutral-negative axis. The data for OM is usually collected from various social media platforms. In an era where social media has considerable control over companies’ futures, it is worth understanding social media and acting accordingly. OM comes to the fore here because, as the scale of the discussion about companies increases, it becomes infeasible to gauge opinion at the individual level. Companies therefore opt to automate this process by applying machine learning (ML) approaches to their data. For the last two decades, OM or sentiment analysis (SA) has mainly been performed by applying ML classification algorithms such as support vector machines (SVM) and Naïve Bayes to bag-of-n-grams representations of textual data. With the advent of deep learning and its apparent success in NLP, traditional methods have become obsolete. The transfer learning paradigm commonly used in computer vision (CV) problems has lately started to shape NLP approaches and language models (LM). This gave a sudden rise to the use of pretrained language models (PTM), which contain language representations obtained by training on large datasets with self-supervised learning objectives. PTMs are further fine-tuned on a specialized downstream task dataset to produce efficient models for various NLP tasks such as OM, NER (Named-Entity Recognition), Question Answering (QA), and so forth. In this study, traditional and modern NLP approaches are evaluated for OM using a sizable corpus belonging to a large private company and containing about 76,000 comments in Turkish: SVM with a bag of n-grams, and two chosen pre-trained models, multilingual universal sentence encoder (MUSE) and bidirectional encoder representations from transformers (BERT).
The MUSE model is a multilingual model that supports 16 languages, including Turkish, and it is based on convolutional neural networks. The BERT model, monolingual in our case, is based on transformer neural networks. It uses masked language modeling and next sentence prediction tasks that allow bidirectional training of the transformers. During the training phase of the architecture, pre-processing operations such as morphological parsing, stemming, and spelling correction were not used, since the experiments showed that their contribution to model performance was insignificant even though Turkish is a highly agglutinative and inflective language. The results show that deep learning methods with pre-trained models and fine-tuning achieve about 11% improvement over SVM for OM. The BERT model achieved around 94% prediction accuracy, while the MUSE model achieved around 88% and SVM around 83%. The MUSE multilingual model shows better results than SVM, but it still performs worse than the monolingual BERT model.
Keywords: BERT, MUSE, opinion mining, pretrained language model, SVM, Turkish
Procedia PDF Downloads 148
12076 Anti-Western Sentiment amongst Arabs and How It Drives Support for Russia against Ukraine
Authors: Soran Tarkhani
Abstract:
A glance at social media shows that Russia's invasion of Ukraine receives considerable support among Arabs. This support is puzzling, since most Arab leaders openly condemned the Russian invasion through the UN ES‑11/4 Resolution, and Arabs are among those who have experienced the devastating consequences of war firsthand. This article tries to explain the puzzle by using multiple regression to analyze the online content of Arab responses to Russia's invasion of Ukraine on seven major news networks: CNN Arabic, BBC Arabic, Sky News Arabic, France24 Arabic, DW, Aljazeera, and Al-Arabiya. The article argues that the underlying reason for this Arab support is a reaction rooted in the anti-Western sentiment common among Arabs. The empirical results from the regression analysis support the central argument and uncover the motivations behind many Arabs' endorsement of the Russian invasion and their opposition to Ukraine.
Keywords: Ukraine, Russia, Arabs, Ukrainians, Russians, Putin, invasion, Europe, war
Procedia PDF Downloads 76
12075 Generating Insights from Data Using a Hybrid Approach
Authors: Allmin Susaiyah, Aki Härmä, Milan Petković
Abstract:
Automatic generation of insights from data using insight mining systems (IMS) is useful in many applications, such as personal health tracking, patient monitoring, and business process management. Existing IMS face challenges in controlling insight extraction, scaling to large databases, and generalising to unseen domains. In this work, we propose a hybrid approach consisting of rule-based and neural components for generating insights from data while overcoming the aforementioned challenges. Firstly, a rule-based data2CNL component is used to extract statistically significant insights from data and represent them in a controlled natural language (CNL). Secondly, a BERTSum-based CNL2NL component is used to convert these CNLs into natural language texts. We improve the model using task-specific and domain-specific fine-tuning. Our approach has been evaluated using statistical techniques and standard evaluation metrics. We overcame the aforementioned challenges and observed significant improvement with domain-specific fine-tuning.
Keywords: data mining, insight mining, natural language generation, pre-trained language models
Procedia PDF Downloads 122
12074 From the “Movement Language” to Communication Language
Authors: Mahmudjon Kuchkarov, Marufjon Kuchkarov
Abstract:
The origin of ‘Human Language’ is still a mystery and the most interesting subject of historical linguistics. The core element is the nature of labeling or coding things or processes with symbols and sounds. In this paper, we investigate humans’ involuntary Paired Sounds and Shape Production (PSSP) and its contribution to the development of early human communication. Working with twenty-six volunteers who performed many physical movements of varying difficulty, the research team investigated the natural, repeatable, and paired sound and shape productions that occur during human activities. The paper claims that PSSP is involved in the phonetic origin of some modern words and that there are similarities between elements of PSSP and characters of the classical Latin alphabet. The results may be used not only to support existing theories but also to take a closer look at the fundamental nature of the origin of languages.
Keywords: body shape, body language, coding, Latin alphabet, merging method, movement language, movement sound, natural sound, origin of language, pairing, phonetics, sound and shape production, word origin, word semantic
Procedia PDF Downloads 253
12073 Avoiding Gas Hydrate Problems in Qatar Oil and Gas Industry: Environmentally Friendly Solvents for Gas Hydrate Inhibition
Authors: Nabila Mohamed, Santiago Aparicio, Bahman Tohidi, Mert Atilhan
Abstract:
One of Qatar's biggest problems in processing its natural gas resource is the frequent blockage of pipelines caused by uncontrolled gas hydrate formation. Several million dollars are spent at process sites to remove these blockages safely using chemical inhibitors. We aim to establish a national database that addresses the physical conditions under which Qatari natural gas forms gas hydrates in pipelines. Moreover, we aim to design and test novel hydrate inhibitors that are suitable for Qatari natural gas and its processing facilities. From these perspectives, we aim to provide more effective and sustainable reservoir utilization and processing of Qatari natural gas. In this work, we present the initial findings of a QNRF-funded project that deals with the natural gas hydrate formation characteristics of Qatari-type gas using both experimental (PVTx) and computational (molecular simulation) methods. We present data from two fully automated apparatuses: a gas hydrate autoclave and a rocking cell. Hydrate equilibrium curves, including growth/dissociation conditions for multi-component systems, are reported for several gas mixtures that represent Qatari-type natural gas, with and without well-known kinetic and thermodynamic hydrate inhibitors. Ionic liquids were designed and tested for their inhibition performance, and DFT and molecular modeling simulation results were obtained and compared with the experimental results. The results showed significant performance of ionic liquids at concentrations of up to 0.5% by volume, with inhibition of up to 2 to 4 °C at high pressures.
Keywords: gas hydrates, natural gas, ionic liquids, inhibition, thermodynamic inhibitors, kinetic inhibitors
Procedia PDF Downloads 1323
12072 Evaluation Methods for Question Decomposition Formalism
Authors: Aviv Yaniv, Ron Ben Arosh, Nadav Gasner, Michael Konviser, Arbel Yaniv
Abstract:
This paper introduces two methods for evaluating Question Decomposition Meaning Representation (QDMR) as predicted by a sequence-to-sequence model and a COPYNET parser for natural language question processing. The work is motivated by the fact that previous evaluation metrics used for this task do not take into account some characteristics of the representation, such as its partial ordering structure. To this end, several heuristics for extracting such partial dependencies are formulated, followed by the proposed evaluation methods, Proportional Graph Matcher (PGM) and Conversion to Normal String Representation (Nor-Str), which are designed to better capture the accuracy of QDMR predictions. Experiments are conducted to demonstrate the efficacy of the proposed evaluation methods and to show the added value of one of them, Nor-Str, in distinguishing between high- and low-quality QDMR predicted by models such as COPYNET. This work represents an important step forward in the development of better evaluation methods for QDMR predictions, which will be critical for improving the accuracy and reliability of natural language question-answering systems.
Keywords: NLP, question answering, question decomposition meaning representation, QDMR evaluation metrics
Procedia PDF Downloads 78
12071 A Novel Machine Learning Approach to Aid Agrammatism in Non-fluent Aphasia
Authors: Rohan Bhasin
Abstract:
Agrammatism in non-fluent aphasia can be defined as a language disorder in which a patient can use only content words (nouns, verbs and adjectives) for communication, and their speech is devoid of function words such as conjunctions and articles, resulting in speech with extremely rudimentary grammar. Past approaches involve speech therapy of some kind, with conversation analysis used to analyse pre-therapy speech patterns and qualitative changes in conversational behaviour after therapy. We describe a novel method that generates function words (prepositions, articles, etc.) around content words (nouns, verbs and adjectives) using a combination of natural language processing and deep learning algorithms. The approach the paper investigates is LSTM or Seq2Seq: a sequence-to-sequence (seq2seq) or LSTM model takes in a sequence of inputs and outputs a sequence. This approach needs a significant amount of training data, with each training example consisting of a pair (content words, complete sentence). We generate such data by starting with complete sentences from a text source and removing the function words to leave just the content words. However, this approach requires a lot of training data to produce coherent output. The assumption of this approach is that the content words received as input, and their order, are preserved, i.e., they are not altered after the functional grammar is slotted in. This is a potential limitation in cases of severe agrammatism, where such order might not be inherently correct. The approach can be used to assist communication in mild agrammatism in non-fluent aphasia. By generating these function words around the content words, we can provide meaningful sentence options to the patient for articulate conversations.
Our project thus translates the use case of generating sentences from content-specific words into an assistive technology for non-fluent aphasia patients.
Keywords: aphasia, expressive aphasia, assistive algorithms, neurology, machine learning, natural language processing, language disorder, behaviour disorder, sequence to sequence, LSTM
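The data-generation step described in the abstract above (start from complete sentences, remove function words, keep the pair) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function-word list is a small hypothetical sample, tokenization is plain whitespace splitting, and the names `strip_function_words` and `make_training_pairs` are assumptions.

```python
from typing import List, Tuple

# Hypothetical, deliberately small function-word list; a real system would
# need a much fuller inventory (or a part-of-speech tagger).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but",
    "of", "in", "on", "at", "to", "with", "for",
}

def strip_function_words(sentence: str) -> List[str]:
    """Return the lower-cased tokens of `sentence` that are not function words."""
    return [tok for tok in sentence.lower().split() if tok not in FUNCTION_WORDS]

def make_training_pairs(sentences: List[str]) -> List[Tuple[List[str], str]]:
    """Pair each complete sentence with its content-word-only version.

    Each pair is (content words, complete sentence): the first element is the
    model input and the second is the target the model should reconstruct.
    """
    return [(strip_function_words(s), s) for s in sentences]

pairs = make_training_pairs(["The boy walked to the park with a dog"])
```

Because the content words keep their original order in each pair, data built this way encodes the same order-preservation assumption that the abstract identifies as a limitation for severe agrammatism.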
Procedia PDF Downloads 164
12070 Exploratory Analysis of A Review of Nonexistence Polarity in Native Speech
Authors: Deawan Rakin Ahamed Remal, Sinthia Chowdhury, Sharun Akter Khushbu, Sheak Rashed Haider Noori
Abstract:
Native speech-to-text synthesis has its own value for mankind. Speaking in different accents is common, but communication between people with different accents is quite difficult, and this difficulty gives rise to misperception of meaning. Many automatic speech recognition systems have therefore been developed to detect text. This paper presents a review of NSTTR (Native Speech Text to Text Recognition) synthesis compared with text-to-text recognition. The review shows that many text-to-text recognition systems are at a very early stage of supporting native speech recognition. Much of the discussion concerns the progression of chatbots and linguistic theory; another strand concerns rule-based approaches. In recent years, deep learning has become a dominant approach to text-to-text learning for detecting the nature of language. To the best of our knowledge, a huge number of people in the subcontinent speak Bangla, but with different accents in different regions; this study therefore discusses the conflicting achievements of existing works and identifies future needs for Bangla acoustic accents.
Keywords: TTR, NSTTR, text to text recognition, deep learning, natural language processing
Procedia PDF Downloads 133
12069 Natural Language News Generation from Big Data
Authors: Bastian Haarmann, Likas Sikorski
Abstract:
In this paper, we introduce an NLG application for the automatic creation of ready-to-publish texts from big data. The fully automatically generated stories closely resemble the style in which a human writer would draw up a news story. Topics may include soccer games, stock exchange market reports, weather forecasts and many more. The generation of the texts follows human language production. Each generated text is unique. Ready-to-publish stories written by a computer application can help humans quickly grasp the outcomes of big data analyses, save journalists time-consuming pre-formulations, and cater to rather small audiences by offering stories that would otherwise not exist.
Keywords: big data, natural language generation, publishing, robotic journalism
Procedia PDF Downloads 431
12068 Integrating Natural Language Processing (NLP) and Machine Learning in Lung Cancer Diagnosis
Authors: Mehrnaz Mostafavi
Abstract:
The assessment and categorization of incidental lung nodules present a considerable challenge in healthcare, often necessitating resource-intensive multiple computed tomography (CT) scans for growth confirmation. This research addresses this issue by introducing a distinct computational approach leveraging radiomics and deep-learning methods. However, understanding local services is essential before implementing these advancements. With diverse tracking methods in place, there is a need for efficient and accurate identification approaches, especially in the context of managing lung nodules alongside pre-existing cancer scenarios. This study explores the integration of text-based algorithms in medical data curation, indicating their efficacy in conjunction with machine learning and deep-learning models for identifying lung nodules. Combining medical images with text data has demonstrated superior data retrieval compared to using each modality independently. While deep learning and text analysis show potential in detecting previously missed nodules, challenges persist, such as increased false positives. The presented research introduces a Structured-Query-Language (SQL) algorithm designed for identifying pulmonary nodules in a tertiary cancer center, externally validated at another hospital. Leveraging natural language processing (NLP) and machine learning, the algorithm categorizes lung nodule reports based on sentence features, aiming to facilitate research and assess clinical pathways. The hypothesis posits that the algorithm can accurately identify lung nodule CT scans and predict concerning nodule features using machine-learning classifiers. Through a retrospective observational study spanning a decade, CT scan reports were collected, and an algorithm was developed to extract and classify data. Results underscore the complexity of lung nodule cohorts in cancer centers, emphasizing the importance of careful evaluation before assuming a metastatic origin. 
The SQL and NLP algorithms demonstrated high accuracy in identifying lung nodule sentences, indicating potential for local service evaluation and research dataset creation. Machine-learning models exhibited strong accuracy in predicting concerning changes in lung nodule scan reports. While limitations include variability in disease group attribution, the potential for correlation rather than causality in clinical findings, and the need for further external validation, the algorithm's accuracy and potential to support clinical decision-making and healthcare automation represent a significant stride in lung nodule management and research.
Keywords: lung cancer diagnosis, structured-query-language (SQL), natural language processing (NLP), machine learning, CT scans
Procedia PDF Downloads 103
12067 A Neural Approach for the Offline Recognition of the Arabic Handwritten Words of the Algerian Departments
Authors: Salim Ouchtati, Jean Sequeira, Mouldi Bedda
Abstract:
In this work, we present an offline system for the recognition of the Arabic handwritten words of the Algerian departments. The study is based mainly on evaluating the performance of neural networks trained with the gradient backpropagation algorithm. The parameters used to form the input vector of the neural network are extracted from the binary images of the handwritten word by several methods: the distribution parameters, the centered moments of the different projections, and the Barr features. These methods are applied to the segments obtained by dividing the binary image of the word into six segments. The classification is achieved by a multilayer perceptron. Detailed experiments are carried out and satisfactory recognition results are reported.
Keywords: handwritten word recognition, neural networks, image processing, pattern recognition, features extraction
Procedia PDF Downloads 514
12066 Natural Language Processing for the Classification of Social Media Posts in Post-Disaster Management
Authors: Ezgi Şendil
Abstract:
Information extracted from social media has received great attention since it has become an effective alternative for collecting people’s opinions and emotions based on specific experiences in a faster and easier way. The paper aims to organize data in a meaningful way in order to analyze users’ posts and draw conclusions about the experiences and opinions of users during and after natural disasters. The posts collected from Reddit are classified into nine different categories, including injured/dead people, infrastructure and utility damage, missing/found people, donation needs/offers, caution/advice, and emotional support, identified by using labelled Twitter data and four different machine learning (ML) classifiers.
Keywords: disaster, NLP, postdisaster management, sentiment analysis
Procedia PDF Downloads 75
12065 Using Artificial Intelligence Technology to Build the User-Oriented Platform for Integrated Archival Service
Authors: Lai Wenfang
Abstract:
This study describes how to use artificial intelligence (AI) technology to build a user-oriented platform for integrated archival service. The platform will be launched in 2020 by the National Archives Administration (NAA) in Taiwan. With the progress of information and communication technology (ICT), the NAA has built many systems to provide archival services. To cope with new challenges, such as new ICT, artificial intelligence, and blockchain, the NAA will use natural language processing (NLP) and machine learning (ML) techniques to train a model and propose suggestions based on the data sent to the platform. The NAA expects that the platform will not only automatically inform staff at the sending agencies which records catalogues violate the transfer or destruction rules, but also use the model to find details hidden in the catalogues and advise NAA staff on whether or not the records should be transferred or destroyed, thereby shortening the auditing time. The platform keeps all users’ browsing trails so that it can predict which kinds of archives users may be interested in, recommend search terms through visualization, and inform them of newly arrived archives. In addition, according to the Archives Act, NAA staff must spend a lot of time marking or removing personal data, classified data, etc. before archives are provided. To improve the archives access service process, the platform will use text recognition patterns to black out such data automatically; staff then only need to correct the errors and upload the corrected version, and the accuracy will increase as the platform learns. In short, the purpose of the platform is to drive the government’s digital transformation and implement the vision of a service-oriented smart government.
Keywords: artificial intelligence, natural language processing, machine learning, visualization
Procedia PDF Downloads 175
12064 Motion Effects of Arabic Typography on Screen-Based Media
Authors: Ibrahim Hassan
Abstract:
Motion typography is one of the most important types of display-based visual communication. Through digital display media, we can control text properties (size, direction, thickness, color, etc.). The use of motion typography in visual communication has given it several forms. We need to settle the terminology and clarify the differences between these forms, because the term motion typography, considered a general term, is not enough to distinguish the different communicative functions of moving text. In this paper, we discuss the different effects of motion typography on Arabic writing and how harmony can be achieved between the movement and the letterform, and in the course of our experiments we present a new type of text movement.
Keywords: Arabic typography, motion typography, kinetic typography, fluid typography, temporal typography
Procedia PDF Downloads 161
12063 The Effect of Self and Peer Assessment Activities in Second Language Writing: A Washback Effect Study on the Writing Growth during the Revision Phase in the Writing Process: Learners’ Perspective
Authors: Musbah Abdussayed
Abstract:
The washback effect refers to the influence of assessment on teaching and learning, and this washback effect can either be positive or negative. This study implemented, sequentially, self-assessment (SA) and peer assessment (PA) and examined the washback effect of self and peer assessment (SPA) activities on the writing growth during the revision phase in the writing process. Twenty advanced learners of Arabic as a second language from a private school in the USA participated in the study. The participants composed and then revised a short Arabic story as a part of a midterm grade. Qualitative data were collected, analyzed, and synthesized from ten interviews with the learners and from the twenty learners’ post-reflective journals. The findings indicate positive washback effects on the learners’ writing growth. The PA activity enhanced descriptions and meaning, promoted creativity, and improved textual coherence, whereas the SA activity led to detecting editing issues. Furthermore, both SPA activities had washback effects in common, including helping the learners meet the writing genre conventions and developing metacognitive awareness. However, the findings also demonstrate negative washback effects on the learners’ attitudes during the revision phase in the writing process, including bias toward self-evaluation during the SA activity and reluctance to rate peers’ writing performance during the PA activity. The findings suggest that self- and peer assessment activities are essential teaching and learning tools that can be utilized sequentially to help learners tackle multiple writing areas during the revision phase in the writing process.
Keywords: self assessment, peer assessment, washback effect, second language writing, writing process
Procedia PDF Downloads 69
12062 Linguistic Analysis of Borderline Personality Disorder: Using Language to Predict Maladaptive Thoughts and Behaviours
Authors: Charlotte Entwistle, Ryan Boyd
Abstract:
Recent developments in information retrieval techniques and natural language processing have allowed for greater exploration of psychological and social processes. Linguistic analysis methods for understanding behaviour have provided useful insights within the field of mental health. One area within mental health that has received little attention, though, is borderline personality disorder (BPD). BPD is a common mental health disorder characterised by instability of interpersonal relationships, self-image and affect. It also manifests through maladaptive behaviours, such as impulsivity and self-harm. Examination of language patterns associated with BPD could allow for a greater understanding of the disorder and its links to maladaptive thoughts and behaviours. Language analysis methods could also be used in a predictive way, such as by identifying indicators of BPD or predicting maladaptive thoughts, emotions and behaviours. Additionally, associations that are uncovered between language and maladaptive thoughts and behaviours could then be applied at a more general level. This study explores linguistic characteristics of BPD, and their links to maladaptive thoughts and behaviours, through the analysis of social media data. Data were collected from a large corpus of posts from the publicly available social media platform Reddit, namely, from the ‘r/BPD’ subreddit, where people identify as having BPD. Data were collected using the Python Reddit API Wrapper and included all users who had posted within the BPD subreddit. All posts were manually inspected to ensure that they were not posted by someone who clearly did not have BPD, such as people posting about a loved one with BPD. These users were then tracked across all other subreddits in which they had posted, and data from these subreddits were also collected. Additionally, data were collected from a random control group of Reddit users.
Disorder-relevant behaviours, such as self-harming or aggression-related behaviours, outlined within Reddit posts were coded by expert raters. All posts and comments were aggregated by user and split by subreddit. Language data were then analysed using the Linguistic Inquiry and Word Count (LIWC) 2015 software. LIWC is a text analysis program that identifies and categorises words based on linguistic and paralinguistic dimensions, psychological constructs and personal concern categories. Statistical analyses of linguistic features could then be conducted. Findings revealed distinct linguistic features associated with BPD, based on Reddit posts, which differentiated these users from a control group. Language patterns were also found to be associated with the occurrence of maladaptive thoughts and behaviours. Thus, this study demonstrates that there are indeed linguistic markers of BPD present on social media. It also implies that language could be predictive of maladaptive thoughts and behaviours associated with BPD. These findings are of importance as they suggest potential for clinical interventions to be provided based on the language of people with BPD to try to reduce the likelihood of maladaptive thoughts and behaviours occurring, for example, through social media tracking or by engaging people with BPD in expressive writing therapy. Overall, this study has provided a greater understanding of the disorder and how it manifests through language and behaviour.
Keywords: behaviour analysis, borderline personality disorder, natural language processing, social media data
Procedia PDF Downloads 353
12061 Multi-Sensory Coding as Intervention Therapy for ESL Spellers with Auditory Processing Delays: A South African Case-Study
Authors: A. Van Staden, N. Purcell
Abstract:
Spelling development is complex and multifaceted and relies on several cognitive-linguistic processes. This paper explores the spelling difficulties of English second language learners with auditory processing delays. This empirical study aims to address these issues by means of an intervention design. Specifically, the objectives are: (a) to develop and implement a multi-sensory spelling program for second language learners with auditory processing difficulties (APD) for a period of 6 months; (b) to assess the efficacy of the multi-sensory spelling program and whether this intervention could significantly improve experimental learners' spelling, phonological awareness and processing (PA), rapid automatized naming (RAN), working memory (WM), word reading and reading comprehension; and (c) to determine the relationship (or interplay) between these cognitive and linguistic skills (mentioned above), and how they influence spelling development. Forty-four English second language learners with APD were sampled from one primary school in the Free State province. The learners were randomly assigned to either an experimental (n = 22) or control group (n = 22). During the implementation of the spelling program, several visual, tactile and kinesthetic exercises, including the utilization of fingerspelling, were introduced to support the experimental learners’ (n = 22) spelling development. Post-test results showed the efficacy of the multi-sensory spelling program, with the experimental group, who were trained in utilising multi-sensory coding and fingerspelling, outperforming learners from the control group on the cognitive-linguistic, spelling and reading measures.
The results and efficacy of this multi-sensory spelling program and the utilisation of fingerspelling for hearing second language learners with APD open up innovative perspectives for the prevention and targeted remediation of spelling difficulties.
Keywords: English second language spellers, auditory processing delays, spelling difficulties, multi-sensory intervention program
Procedia PDF Downloads 137
12060 Deep-Learning to Generation of Weights for Image Captioning Using Part-of-Speech Approach
Authors: Tiago do Carmo Nogueira, Cássio Dener Noronha Vinhal, Gélson da Cruz Júnior, Matheus Rudolfo Diedrich Ullmann
Abstract:
Generating automatic image descriptions through natural language is a challenging task. Image captioning is a task that consistently describes an image by combining computer vision and natural language processing techniques. To accomplish this task, cutting-edge models use encoder-decoder structures. Thus, Convolutional Neural Networks (CNN) are used to extract the characteristics of the images, and Recurrent Neural Networks (RNN) generate the descriptive sentences of the images. However, cutting-edge approaches still suffer from problems of generating incorrect captions and accumulating errors in the decoders. To solve this problem, we propose a model based on the encoder-decoder structure, introducing a module that generates the weights according to the importance of the word to form the sentence, using the part-of-speech (PoS). Thus, the results demonstrate that our model surpasses state-of-the-art models.
Keywords: gated recurrent units, caption generation, convolutional neural network, part-of-speech
Procedia PDF Downloads 103
12059 The Effect of the Pronunciation of Emphatic Sounds on Perceived Masculinity/Femininity
Authors: M. Sayyour, M. Abdulkareem, O. Osman, S. Salmeh
Abstract:
Emphatic sounds in Arabic are /tˤ/, /sˤ/, /dˤ/, and /ðˤ/. They involve a secondary articulation in the pharynx area as opposed to their counterparts: /t/, /s/, /d/, and /ð/. Although they are present in most Arabic dialects, some dialects have lost this class as a historical development, such as Maltese Arabic. Previous studies have found a difference in the pronunciation of these emphatic sounds between the two genders, with males tending to produce more evident emphasis than females. This study builds on these studies by investigating whether listeners perceive fully emphatic sounds as more masculine and less emphatic sounds as more feminine. Furthermore, the study aims to find out which is more important in this perception process: the emphatic consonant itself or the vowel following it. To test this, natural and manipulated tokens of two male and two female speakers were used. The natural tokens include words that have an emphatic consonant and an emphatic vowel and tokens that have a plain consonant and a plain vowel. The manipulated tokens include words that have an emphatic consonant but a central vowel and a plain consonant followed by the same central vowel. These manipulated tokens allow us to see whether the consonant will still affect the perception even if the vowel is controlled. Another group of words that contained no emphatic sounds was used as a control group. The total number of tokens (natural, manipulated, and control) is 160. After that, 60 university students (30 males and 30 females) listened to these tokens and responded by choosing a specific character that they think is likely to produce each token. The characters’ descriptions are carefully written with two degrees of femininity and two degrees of masculinity. The preliminary results for the femininity level showed that the highest degree of femininity was for tokens that contain a plain consonant and a plain vowel.
The lowest level of femininity was given to tokens that have a fully emphatic consonant and vowel. For the manipulated tokens that contained a plain consonant and a central vowel, the femininity degree was high, which indicates that the consonant is more important than the vowel, while for the manipulated tokens that contain an emphatic consonant and a central vowel, the femininity level was higher than that for the tokens that have an emphatic consonant and an emphatic vowel, which indicates that the vowel is more important for the perception of emphatic consonants. These results are interpreted in light of feminist linguistic theories, linguistic expectations, performed gender and linguistic change theories.
Keywords: Emphatic sounds, gender studies, perception, sociophonetics
Procedia PDF Downloads 385
12058 Adaptation in Translation of 'Christmas Every Day' Short Story by William Dean Howells
Authors: Mohsine Khazrouni
Abstract:
The present study is an attempt to highlight the importance of adaptation in translation. To convey the message, the translator needs to take into account not only the text but also extra-linguistic factors such as the target audience. The present paper claims that adaptation is an unavoidable translation strategy when dealing with texts that are heavy with religious and cultural themes. The translation task becomes even more challenging when dealing with children’s literature, as the audience are children whose comprehension, experience and world knowledge are limited. The study uses the Arabic translation of the short story ‘Christmas Every Day’ as a case study. The short story will be translated, and the pragmatic problems involved will be discussed. The focus will be on the issue of adaptation, i.e., the source text should be adapted to the target language audience’s social and cultural environment.
Keywords: pragmatic adaptation, Arabic translation, children's literature, equivalence
Procedia PDF Downloads 216
12057 Syntactic Analyzer for Tamil Language
Authors: Franklin Thambi Jose.S
Abstract:
Computational linguistics is a branch of linguistics that deals with the computational and linguistic levels. It is also described as a branch of language studies that applies computer techniques to the field of linguistics. In computational linguistics, natural language processing plays an important role; it came into existence with the advent of information technology. In computational syntax, the syntactic analyser breaks a sentence into phrases and clauses and annotates the sentence with syntactic information. Tamil is one of the major Dravidian languages and has a very long written history of more than 2000 years. It is mainly spoken in Tamilnadu (in India), Srilanka, Malaysia and Singapore. It is an official language in Tamilnadu (in India), Srilanka, Malaysia and Singapore. In Malaysia, Tamil-speaking people are considered an ethnic group. For this research, Tamil sentences are classified into four types: 1. Main Sentence 2. Interrogative Sentence 3. Equational Sentence 4. Elliptical Sentence. In computational syntax, the first step is to provide the required information regarding the head and its constituents for each sentence. This information is incorporated into the system using programming languages. The system can then analyse a given sentence using the criteria or mechanisms given to it. The major objective of this paper is to provide the computer with the necessary criteria or mechanisms to identify the basic types of sentences using a syntactic parser for the Tamil language.
Keywords: tamil, syntax, criteria, sentences, parser
Procedia PDF Downloads 517
12056 Feasibility and Efficacy of Matrix Model in Arabic Countries
Authors: Yasin Ibrahim, Hisham Almohandes, Chia Hsu, Regina Baronia, Jesse Worsham, Sara Abdelgawad, Mansour Shawky, Mohammed Abdelfattah, Nesif Alhemiary
Abstract:
Background: The matrix model (MM) is an evidence-based program for treating substance use disorders. Since it was first translated into Arabic in 2010, the MM has been gaining popularity in Arabic countries. However, there are no published data on its efficacy and feasibility in Arabic communities. Here we aimed to explore providers’ perspectives on its feasibility and efficacy. Methods: Eight addiction treatment centers from four Arabic countries, namely Egypt, Kingdom of Saudi Arabia, the United Arab Emirates, and Iraq, were contacted via email. They were asked to fill in a 21-item questionnaire. Results: The matrix model continues to be utilized in 6 out of the 8 contacted programs. One center in Egypt has discontinued the MM as the providers felt it was not suitable for substance disorders other than stimulants, which are not common in Egypt. Baghdad University Medical Center has substituted MM with Colombo Program as there have been more training opportunities available for it. Data showed wide variability in the number of clients treated with the MM (from 300 to 2500). The Arabic version was utilized for training providers in 5 out of the 8 centers, while the providers of the other 3 have been trained in the United States. All providers reported that MM made their job significantly easier, and seven providers believed that MM has favorably affected the relapse rate. In all of the six centers, MM is being utilized for many substance use disorders in addition to stimulant use disorders. Reported challenges included the acceptability of patients and their families, difficulty understanding some concepts, and high dropout rates in some centers. Conclusion: The matrix model seems to be a valuable modality for the treatment of substance use disorders in Arabic countries. It has its own challenges and limitations that call for more culturally adapted versions.
Keywords: addiction, Arabic countries, developing countries, matrix model
Procedia PDF Downloads 156
12055 The Importance of Visual Communication in Artificial Intelligence
Authors: Manjitsingh Rajput
Abstract:
Visual communication plays an important role in artificial intelligence (AI) because it enables machines to understand and interpret visual information, similar to how humans do. This abstract explores the importance of visual communication in AI and highlights various applications such as computer vision, object recognition, image classification and autonomous systems. It goes deeper into the deep learning techniques and neural networks that shape visual understanding. In addition to AI programming, the abstract discusses challenges facing visual interfaces for AI, such as data scarcity, domain optimization, and interpretability. The integration of visual communication with other modalities, such as natural language processing and speech recognition, is also explored. Overall, this abstract highlights the critical role that visual communication plays in advancing AI capabilities and enabling machines to perceive and understand the world around them. This methodology explores the importance of visual communication in AI development and implementation, highlighting its potential to enhance the effectiveness and accessibility of AI systems. It provides a comprehensive approach to integrating visual elements into AI systems, making them more user-friendly and efficient. In conclusion, visual communication is crucial in AI systems for object recognition, facial analysis, and augmented reality, but challenges like data quality, interpretability, and ethics must be addressed. Visual communication enhances user experience, decision-making, accessibility, and collaboration.
Developers can integrate visual elements for efficient and accessible AI systems.
Keywords: visual communication AI, computer vision, visual aid in communication, essence of visual communication
Procedia PDF Downloads 97
12054 Exploring SL Writing and SL Sensitivity during Writing Tasks: Poor and Advanced Writing in a Context of Second Language other than English
Authors: Sandra Figueiredo, Margarida Alves Martins, Carlos Silva, Cristina Simões
Abstract:
This study is part of a larger empirical research project that examines second language (SL) learners’ profiles and valid procedures for performing complete and diagnostic assessment in schools. 102 learners of Portuguese as a SL, aged between 7 and 17 years and speakers of distinct home languages, were assessed in several linguistic tasks. In this article, we focus on writing performance in the specific task of narrative essay composition. The written outputs were measured using the score in six components adapted from an English SL assessment context (Alberta Education): linguistic vocabulary, grammar, syntax, strategy, socio-linguistic, and discourse. The writing processes and strategies in Portuguese used by different immigrant students were analysed to determine the features and diversity of deficits in authentic texts produced by SL writers. Differentiated performance was based on the diversity of the following variables: grades, previous schooling, home language, instruction in first language, and exposure to Portuguese as Second Language. Speakers of Indo-Aryan languages showed low writing scores compared to their peers, and the type of language and respective cognitive mapping (such as Mandarin and Arabic) was the predictor, not linguistic distance. Home language instruction should also be prominently considered in further research to understand the specificities of the cognitive academic profile in a Romance language learning context. Additionally, this study examined the teachers’ representations, which are addressed here to understand the educational implications of second language teaching in the psychological distress of different minorities in schools of specific host countries.
Keywords: home language, immigrant students, Portuguese language, second language, writing assessment
Procedia PDF Downloads 464