Search results for: NLP
Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities
Authors: Khaled M. Alhawiti
Abstract:
This paper aims to analyze the role of natural language processing (NLP). It discusses this role in the context of automated data retrieval, automated question answering, and text structuring. NLP techniques are gaining wider acceptance in real-life applications and industrial settings. Processing natural language text in a way that satisfies the needs of decision makers involves various complexities. This paper begins by describing the qualities of NLP practices, then focuses on the challenges in natural language processing and discusses its major techniques. The last section describes opportunities and challenges for future research.
Keywords: Data Retrieval, Information retrieval, Natural Language Processing, Text Structuring.
A Low-Latency Quadratic Extended Domain Modular Multiplier for Bilinear Pairing Based on NLP Multiplication
Authors: Yulong Jia, Xiang Zhang, Ziyuan Wu, Shiji Hu
Abstract:
The calculation of bilinear pairings is the core of the SM9 algorithm, which relies on the underlying prime-field and quadratic extension field algorithms. Among these field algorithms, modular multiplication is the most time-consuming part; therefore, the underlying modular multiplication algorithm is optimized to maximize the speed of bilinear pairing computation. This paper uses a modular multiplication method based on the non-least positive (NLP) form, combined with Karatsuba and schoolbook multiplication, to improve the Montgomery algorithm. In addition, exploiting the characteristics of multiplication in the quadratic extension field, a quadratic extension field Fp2-NLP modular multiplication algorithm for bilinear pairings is proposed, which effectively reduces the time of modular multiplication in the quadratic extension field. The multiplication unit in the quadratic extension field is implemented in the SMIC 55 nm process, and two different implementation architectures are designed for different application scenarios. Compared with the existing literature, the output latency of this design can reach a minimum of 15 cycles. The shortest time for computing the (AB+CD)·r⁻¹ mod M form is 37.5 ns, and the overall area-time product (AT) is 11400. The final R-ate pairing hardware accelerator consumes 2670k equivalent logic gates and 1.8 ms of computing time in the 55 nm process.
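For software-minded readers, the following is a minimal Python sketch of generic Montgomery modular multiplication with a Karatsuba sub-multiplier, the two building blocks named above; the non-least-positive form, the Fp2 extension-field variant and the hardware architectures are not reproduced, and all constants are illustrative.

```python
# Hedged sketch: generic Montgomery modular multiplication with a Karatsuba
# sub-multiplier; illustrative only, not the paper's NLP/Fp2 hardware design.

def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    """Recursive Karatsuba multiplication; falls back to schoolbook below the cutoff."""
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y                          # schoolbook (native) multiply
    n = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> n, x & ((1 << n) - 1)
    yh, yl = y >> n, y & ((1 << n) - 1)
    hi = karatsuba(xh, yh, cutoff_bits)
    lo = karatsuba(xl, yl, cutoff_bits)
    mid = karatsuba(xh + xl, yh + yl, cutoff_bits) - hi - lo
    return (hi << (2 * n)) + (mid << n) + lo

def montgomery_multiply(a: int, b: int, M: int) -> int:
    """Return a*b*R^-1 mod M for an odd modulus M, with R = 2^k from the modulus size."""
    k = M.bit_length()
    R = 1 << k
    M_inv = (-pow(M, -1, R)) % R              # M' such that M*M' ≡ -1 (mod R)
    t = karatsuba(a, b)
    m = (t * M_inv) & (R - 1)                 # (t * M') mod R, cheap power-of-two reduction
    u = (t + m * M) >> k                      # exact division by R
    return u - M if u >= M else u

# Example: verify against a direct computation for a toy odd modulus.
M = 0xFFFFFFFB
a, b = 0x12345678, 0x9ABCDEF1
R = 1 << M.bit_length()
assert montgomery_multiply(a, b, M) == (a * b * pow(R, -1, M)) % M
```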
Keywords: SM9, hardware, NLP, Montgomery.
A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques
Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart
Abstract:
Automatic text classification applies mostly natural language processing (NLP) and other artificial intelligence (AI)-guided techniques to classify text automatically in a faster and more accurate manner. This paper addresses the use of predictive maintenance to manage incident tickets within a company. It focuses on proposing a tool that processes and analyses the comments and notes written by administrators after resolving an incident ticket, with the goal of increasing the quality of these comments. The tool is based on NLP and machine learning techniques for the textual analytics of the extracted data. The approach was tested using real data from the French National Railways (SNCF) and yielded high-quality results.
Keywords: Machine learning, text classification, NLP techniques, semantic representation.
Investigating Solar Cycles and Media Sentiment Through Advanced NLP Techniques
Authors: Aghamusa Azizov
Abstract:
This study investigates the correlation between solar activity and sentiment in news media coverage, using a large-scale dataset of solar activity since 1750 and over 15 million articles from "The New York Times" dating from 1851 onwards. Employing Pearson's correlation coefficient and multiple Natural Language Processing (NLP) tools—TextBlob, VADER, and DistilBERT—the research examines the extent to which fluctuations in solar phenomena are reflected in the sentiment of historical news narratives. The findings reveal that the correlation between solar activity and media sentiment is generally negligible, suggesting a weak influence of solar patterns on the portrayal of events in news media. Notably, a moderate positive correlation was observed between the sentiments derived from TextBlob and VADER, indicating consistency across NLP tools. The analysis provides insights into the historical impact of solar activity on human affairs and highlights the importance of using multiple analytical methods to understand complex relationships in large datasets. The study contributes to the broader understanding of how extraterrestrial factors may intersect with media-reported events and underlines the intricate nature of interdisciplinary research in the data science and historical domains.
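As a hedged illustration of the pipeline described (not the authors' code), the sketch below scores a few invented article snippets with TextBlob and VADER, averages them by year, and computes Pearson correlations against toy sunspot counts; all data, column names and values are assumptions.

```python
# Hedged sketch: lexicon-based sentiment scoring and Pearson correlation on toy data.
import pandas as pd
from scipy.stats import pearsonr
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

articles = pd.DataFrame({
    "year": [1851, 1852, 1853, 1854],
    "text": ["Remarkable auroras delighted observers across the city.",
             "Grim news from the frontier troubled readers this week.",
             "Trade figures improved modestly over the past quarter.",
             "A dreadful storm destroyed several vessels in the harbour."],
})
solar = pd.DataFrame({"year": [1851, 1852, 1853, 1854],
                      "sunspots": [64.5, 54.2, 39.0, 20.6]})   # invented values

vader = SentimentIntensityAnalyzer()
articles["textblob"] = articles["text"].apply(lambda t: TextBlob(t).sentiment.polarity)
articles["vader"] = articles["text"].apply(lambda t: vader.polarity_scores(t)["compound"])

yearly = articles.groupby("year")[["textblob", "vader"]].mean().reset_index()
merged = yearly.merge(solar, on="year")

r_solar, p_solar = pearsonr(merged["sunspots"], merged["textblob"])
r_tools, p_tools = pearsonr(merged["textblob"], merged["vader"])
print(f"sunspots vs TextBlob sentiment: r={r_solar:.3f}, p={p_solar:.3f}")
print(f"TextBlob vs VADER agreement:    r={r_tools:.3f}, p={p_tools:.3f}")
```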
Keywords: Solar Activity Correlation, Media Sentiment Analysis, Natural Language Processing, NLP, Historical Event Patterns.
Implementing a Database from a Requirement Specification
Abstract:
Creating a database schema is essentially a manual process. From a requirement specification, the information contained within has to be analyzed and reduced into a set of tables, attributes and relationships. This is a time-consuming process that has to go through several stages before an acceptable database schema is achieved. The purpose of this paper is to implement a Natural Language Processing (NLP) based tool to produce a relational database from a requirement specification. Stanford CoreNLP version 3.3.1 and the Java programming language were used to implement the proposed model. The outcome of this study indicates that a first draft of a relational database schema can be extracted from a requirement specification by using NLP tools and techniques with minimum user intervention. Therefore, this method is a step forward in finding a solution that requires little or no user intervention.
Keywords: Information Extraction, Natural Language Processing, Relation Extraction.
Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: 'Reddit'
Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell
Abstract:
Native Language Identification (NLI) is one of the growing subfields in Natural Language Processing (NLP). The task is mainly concerned with predicting the native language of an author from their writing in a second language. In this paper, we investigate the performance of two types of features, content-based features vs. content-independent features, when they are evaluated on a different corpus (using social media data from Reddit). In this NLI task, the models are trained on one corpus (TOEFL) and then evaluated on data from an external corpus (Reddit). Three classifiers are used: a baseline, a linear SVM, and Logistic Regression. Results show that content-based features are more accurate and robust than content-independent ones when tested both within corpus and across corpora.
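A hedged sketch of the experimental contrast described, using scikit-learn on toy data: the same classifiers trained on content-based word n-grams versus content-independent function-word features. The TOEFL/Reddit corpora, labels and full feature set are not reproduced; everything below is an invented illustration.

```python
# Hedged sketch: content-based vs. content-independent features for NLI on toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["I am very happy about of the result",
               "He have finished the homeworks yesterday",
               "The meeting was arranged since last week"]
train_langs = ["SPA", "GER", "FRA"]            # invented native-language labels
test_texts  = ["She have many informations about it",
               "I am agree with this opinion of the film"]
test_langs  = ["GER", "SPA"]

FUNCTION_WORDS = {"the", "a", "of", "to", "in", "have", "am", "is", "was", "since"}

def content_independent(text: str) -> str:
    # Keep only function words, discarding topical content.
    return " ".join(w for w in text.lower().split() if w in FUNCTION_WORDS)

for name, transform in [("content-based", lambda t: t),
                        ("content-independent", content_independent)]:
    for clf in (LinearSVC(), LogisticRegression(max_iter=1000)):
        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
        model.fit([transform(t) for t in train_texts], train_langs)
        preds = model.predict([transform(t) for t in test_texts])
        print(name, type(clf).__name__, accuracy_score(test_langs, preds))
```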
Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML.
Ontology Population via NLP Techniques in Risk Management
Authors: Jawad Makki, Anne-Marie Alquier, Violaine Prince
Abstract:
In this paper, we propose an NLP-based method for Ontology Population from texts and apply it to semi-automatically instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and uses domain expert intervention for validation. The proposed approach relies on a set of Instance Recognition Rules based on syntactic structures and on the predicative power of verbs in the instantiation process. It is not domain dependent, since it heavily relies on linguistic knowledge. A description of an experiment performed on a part of the ontology of the PRIMA project (supported by the European Community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from the Environmental Protection Agency. The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves the performance.
Keywords: Information Extraction, Instance Recognition Rules, Ontology Population, Risk Management, Semantic analysis.
DocPro: A Framework for Processing Semantic and Layout Information in Business Documents
Authors: Ming-Jen Huang, Chun-Fang Huang, Chiching Wei
Abstract:
With the recent advances in deep neural networks, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system requires integrating several NLP and CV tasks rather than treating them separately. There is a need for a unified approach to processing documents containing textual and graphical elements with rich formats, diverse layout arrangements, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build systems for diverse business scenarios, such as contract monitoring and reviewing.
Keywords: Document processing, framework, formal definition, machine learning.
Enhancing Word Meaning Retrieval Using FastText and NLP Techniques
Authors: Sankalp Devanand, Prateek Agasimani, V. S. Shamith, Rohith Neeraje
Abstract:
Machine translation has witnessed significant advancements in recent years, but the translation of languages with distinct linguistic characteristics, such as English and Sanskrit, remains a challenging task. This research presents the development of a dedicated English-to-Sanskrit machine translation model, aiming to bridge the linguistic and cultural gap between these two languages. Using a variety of natural language processing (NLP) approaches, including FastText embeddings, this research proposes a thorough method to improve word meaning retrieval. Data preparation, part-of-speech tagging, dictionary searches, and transliteration are all included in the methodology. The study also addresses the implementation of an interpreter pattern and uses a word similarity task to assess the quality of the word embeddings. The experimental outcomes show how the suggested approach can be used to enhance word meaning retrieval tasks with greater efficacy, accuracy, and adaptability. Evaluation of the model's performance is conducted through rigorous testing, comparing its output against existing machine translation systems. The assessment includes quantitative metrics such as BLEU scores, METEOR scores, and Jaccard similarity.
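A small gensim-based sketch of the word-similarity step described above; the toy corpus and hyper-parameters are assumptions, not the authors' English–Sanskrit training data.

```python
# Hedged sketch: training FastText embeddings and querying word similarity (gensim).
from gensim.models import FastText

corpus = [
    ["the", "king", "protects", "the", "kingdom"],
    ["the", "ruler", "governs", "the", "land"],
    ["students", "study", "ancient", "texts"],
    ["scholars", "read", "ancient", "manuscripts"],
]

model = FastText(vector_size=50, window=3, min_count=1, epochs=50)
model.build_vocab(corpus_iterable=corpus)
model.train(corpus_iterable=corpus, total_examples=len(corpus), epochs=model.epochs)

# Subword n-grams let FastText score out-of-vocabulary or inflected forms as well.
print(model.wv.similarity("king", "ruler"))
print(model.wv.most_similar("manuscripts", topn=3))
```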
Keywords: Machine translation, English to Sanskrit, natural language processing, word meaning retrieval, FastText embeddings.
Contextual SenSe Model: Word Sense Disambiguation Using Sense and Sense Value of Context Surrounding the Target
Authors: Vishal Raj, Noorhan Abbas
Abstract:
Ambiguity in NLP (Natural Language Processing) refers to the ability of a word, phrase, sentence, or text to have multiple meanings. This results in various kinds of ambiguity, such as lexical, syntactic, semantic, anaphoric and referential ambiguity. This study focuses mainly on solving the issue of lexical ambiguity. Word Sense Disambiguation (WSD) is an NLP technique that aims to resolve lexical ambiguity by determining the correct meaning of a word within a given context. Most WSD solutions rely on words for training and testing, but we have used lemma and Part of Speech (POS) tokens of words for training and testing. The lemma adds generality, and the POS adds properties of the word to the token. We have designed a method to create an affinity matrix that captures the affinity between any pair of lemma_POS tokens (a token in which the lemma and POS of a word are joined by an underscore) in a given training set. Additionally, we have devised an algorithm to create sense clusters of tokens using the affinity matrix under a hierarchy of the POS of the lemma. Furthermore, three different mechanisms to predict the sense of a target word using the affinity/similarity values are devised. Each contextual token contributes to the sense of the target word with some value, and the sense with the highest value becomes the sense of the target word. Contextual tokens therefore play a key role in creating the sense clusters and predicting the sense of the target word; hence, the model is named the Contextual SenSe Model (CSM). CSM is notably simple and easy to interpret, in contrast to contemporary deep learning models, which are intricate, time-intensive, and hard to explain. CSM is trained on the SemCor training data and evaluated on the SemEval test dataset. The results indicate that, despite the simplicity of the method, it achieves promising results when compared to the Most Frequent Sense (MFS) model.
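A minimal sketch of the core idea as described: a co-occurrence-based affinity matrix over lemma_POS tokens, with the sense of a target word chosen by summed affinity to the context. The toy data, sense notation and affinity definition are simplified assumptions, not the trained SemCor model.

```python
# Hedged sketch: affinity matrix over lemma_POS tokens and context-voted sense prediction.
from collections import defaultdict
from itertools import combinations

# Training sentences already reduced to lemma_POS tokens (toy data, invented sense tags).
sentences = [
    ["bank_NOUN#river", "water_NOUN", "flow_VERB"],
    ["bank_NOUN#money", "deposit_VERB", "money_NOUN"],
    ["money_NOUN", "loan_NOUN", "bank_NOUN#money"],
]

affinity = defaultdict(float)
for sent in sentences:
    for a, b in combinations(sent, 2):        # symmetric co-occurrence counts
        affinity[(a, b)] += 1.0
        affinity[(b, a)] += 1.0

def predict_sense(target_senses, context_tokens):
    """Pick the sense whose summed affinity with the context tokens is highest."""
    scores = {s: sum(affinity[(s, c)] for c in context_tokens) for s in target_senses}
    return max(scores, key=scores.get), scores

sense, scores = predict_sense(
    ["bank_NOUN#river", "bank_NOUN#money"],
    ["loan_NOUN", "deposit_VERB"],
)
print(sense, scores)   # expected: the 'money' sense wins for this context
```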
Keywords: Word Sense Disambiguation, WSD, Contextual SenSe Model, Most Frequent Sense, part of speech, POS, Natural Language Processing, NLP, OOV, out of vocabulary, ELMo, Embeddings from Language Model, BERT, Bidirectional Encoder Representations from Transformers, Word2Vec, lemma_POS, Algorithm.
Twitter Sentiment Analysis during the Lockdown on New Zealand
Authors: Smah Doeban Almotiri
Abstract:
One of the most common fields of natural language processing (NLP) is sentiment analysis. The feeling conveyed in a text can be successfully mined for various events using sentiment analysis. Twitter is viewed as a reliable data source for sentiment analytics studies, since people used social media to receive and exchange different types of data on a broad scale during the COVID-19 epidemic. Processing such data may aid in making critical decisions on how to keep the situation under control. The aim of this research is to look at how sentimental states differed in a single geographic region during the lockdown at two different times. 1,162 tweets related to the COVID-19 pandemic lockdown were analyzed using the keyword hashtags (lockdown, COVID-19): the first sample of tweets was from March 23, 2020, until April 23, 2020, and the second sample, for the following year, was from March 1, 2021, until April 4, 2021. Natural language processing (NLP), a form of artificial intelligence, was used to calculate the sentiment value of all of the tweets using the AFINN lexicon sentiment analysis method. The findings revealed that the sentimental condition at both times during the region's lockdown was positive in the samples of this study, which are unique to the specific geographical area of New Zealand. This research suggests applying machine learning sentiment methods such as CrystalFeel and extending the sample by collecting tweets over a longer period of time.
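A minimal sketch of AFINN scoring with the afinn Python package on invented tweets, illustrating how a mean sentiment value per period could be computed; the real tweet samples and values are those reported in the study, not these.

```python
# Hedged sketch: AFINN lexicon scoring of two invented tweet samples.
from afinn import Afinn

afinn = Afinn()
tweets_2020 = ["Stay safe everyone, proud of how our community is handling lockdown",
               "This lockdown is exhausting and the uncertainty is awful"]
tweets_2021 = ["Grateful the case numbers stayed low this time",
               "Another lockdown but at least we know what to do now"]

def mean_sentiment(tweets):
    scores = [afinn.score(t) for t in tweets]   # sum of word valences per tweet
    return sum(scores) / len(scores)

print("March-April 2020:", mean_sentiment(tweets_2020))
print("March-April 2021:", mean_sentiment(tweets_2021))
```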
Keywords: sentiment analysis, Twitter analysis, lockdown, COVID-19, AFINN, NodeJS.
A BERT-Based Model for Financial Social Media Sentiment Analysis
Authors: Josiel Delgadillo, Johnson Kinyua, Charles Mutigwe
Abstract:
The purpose of sentiment analysis is to determine the sentiment strength (e.g., positive, negative, neutral) of a textual source for good decision-making. Natural Language Processing (NLP) in domains such as financial markets requires knowledge of the domain ontology, and pre-trained language models such as BERT have made significant breakthroughs in various NLP tasks by training on large-scale unlabeled generic corpora such as Wikipedia. However, sentiment analysis is a strongly domain-dependent task. The rapid growth of social media has given users a platform to share their experiences and views about products, services, and processes, including financial markets. StockTwits and Twitter are social networks that allow the public to express their sentiments in real time. Hence, leveraging the success of unsupervised pre-training and the large amount of financial text available on social media platforms could potentially benefit a wide range of financial applications. This work is focused on sentiment analysis using social media text from platforms such as StockTwits and Twitter. To meet this need, SkyBERT, a domain-specific language model pre-trained and fine-tuned on financial corpora, has been developed. The results show that SkyBERT outperforms current state-of-the-art models in financial sentiment analysis. Extensive experimental results demonstrate the effectiveness and robustness of SkyBERT.
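As a hedged illustration of the general recipe (not SkyBERT itself, whose corpora and hyper-parameters are not described in this abstract), the sketch below fine-tunes a generic BERT checkpoint on a few invented StockTwits-style messages with Hugging Face Transformers; the model name, labels and settings are assumptions.

```python
# Hedged sketch: fine-tuning a generic BERT classifier on toy financial messages.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

data = Dataset.from_dict({
    "text": ["$AAPL breaking out, loading up more shares",
             "Terrible guidance, dumping my whole position",
             "Flat day, just watching the tape"],
    "label": [2, 0, 1],                        # 0=negative, 1=neutral, 2=positive
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=3)

tokenized = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                          padding="max_length", max_length=64),
                     batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finbert-sketch", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=tokenized,
)
trainer.train()
```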
Keywords: BERT, financial markets, Twitter, sentiment analysis.
Detecting Fake News: A Natural Language Processing, Reinforcement Learning, and Blockchain Approach
Authors: Ashly Joseph, Jithu Paulose
Abstract:
In an era where misleading information may quickly circulate on digital news channels, it is crucial to have efficient and trustworthy methods to detect and reduce the impact of misinformation. This research proposes an innovative framework that combines Natural Language Processing (NLP), Reinforcement Learning (RL), and Blockchain technologies to precisely detect and minimize the spread of false information in news articles on social media. The framework starts by gathering a variety of news items from different social media sites and performing preprocessing on the data to ensure its quality and uniformity. NLP methods are utilized to extract complete linguistic and semantic characteristics, effectively capturing the subtleties and contextual aspects of the language used. These features are utilized as input for a RL model. This model acquires the most effective tactics for detecting and mitigating the impact of false material by modeling the intricate dynamics of user engagements and incentives on social media platforms. The integration of blockchain technology establishes a decentralized and transparent method for storing and verifying the accuracy of information. The Blockchain component guarantees the unchangeability and safety of verified news records, while encouraging user engagement for detecting and fighting false information through an incentive system based on tokens. The suggested framework seeks to provide a thorough and resilient solution to the problems presented by misinformation in social media articles.
Keywords: Natural Language Processing, Reinforcement Learning, Blockchain, fake news mitigation, misinformation detection.
Study of Syntactic Errors for Deep Parsing at Machine Translation
Authors: Yukiko Sasaki Alam, Shahid Alam
Abstract:
Syntactic parsing is vital for semantic treatment by many applications related to natural language processing (NLP), because form and content coincide in many cases. However, it has not yet reached reliable levels of performance. By manually examining and analyzing individual machine translation output errors that involve syntax as well as semantics, this study attempts to discover what is required for improving syntactic and semantic parsing.
Keywords: Machine translation, error analysis, syntactic errors, knowledge required for parsing.
Continuous Text Translation Using Text Modeling in the Thetos System
Authors: Nina Suszczanska, Przemyslaw Szmal, Slawomir Kulikow
Abstract:
In the paper, a method of modeling text for Polish is discussed. The method is aimed at transforming continuous input text into a text consisting of sentences in so-called canonical form, whose characteristics include, among others, a complete structure as well as the absence of anaphora and ellipses. The transformation is lossless with respect to the content of the text being transformed. The modeling method has been worked out for the needs of the Thetos system, which translates Polish written texts into Polish sign language. We believe that the method can also be used in various applications that deal with natural language, e.g. in a text summary generator for Polish.
Keywords: anaphora, machine translation, NLP, sign language, text syntax.
Query Reformulation Guided by External Resource for Information Retrieval
Authors: Mohammed El Amine Abderrahim
Abstract:
Reformulating the user query is a technique that aims to improve the performance of an Information Retrieval System (IRS) in terms of precision and recall. This paper evaluates the technique of query reformulation guided by an external resource for Arabic texts. To do this, various precision and recall measurements were conducted, and two corpora with different external resources, namely Arabic WordNet (AWN) and the Arabic Dictionary of Meaning (ADM, a thesaurus), were used. Examination of the obtained results allows us to measure the real contribution of this reformulation technique to improving IRS performance.
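As an illustration of the expansion step only (not the authors' Arabic pipeline), the sketch below expands a query with synonyms from NLTK's English WordNet; in practice the Arabic WordNet or ADM resources would be substituted, and the term limit is an arbitrary assumption.

```python
# Hedged sketch: lexical query expansion via WordNet synonyms (NLTK).
from nltk.corpus import wordnet as wn
# One-time setup if the corpus is missing: import nltk; nltk.download("wordnet")

def expand_query(query: str, max_new_terms: int = 5) -> list:
    expanded = list(query.split())
    for term in query.split():
        for synset in wn.synsets(term):
            for lemma in synset.lemma_names():
                candidate = lemma.replace("_", " ").lower()
                if candidate not in expanded:
                    expanded.append(candidate)
                if len(expanded) >= len(query.split()) + max_new_terms:
                    return expanded
    return expanded

print(expand_query("car repair"))
# e.g. ['car', 'repair', 'auto', 'automobile', ...] depending on WordNet version
```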
Keywords: Arabic NLP, Arabic Information Retrieval, Arabic WordNet, Query Expansion.
WebGD: A CORBA-based Document Classification and Retrieval System on the Web
Authors: Fuyang Peng, Bo Deng, Chao Qi, Mou Zhan
Abstract:
This paper presents the design and implementation of WebGD, a CORBA-based document classification and retrieval system on the Internet. WebGD makes use of techniques such as the Web, CORBA, Java, NLP, fuzzy techniques, knowledge-based processing and database technology. A unified classification and retrieval model, classification and retrieval with one reasoning engine, and flexible working-mode configuration are some of its main features. The architecture of WebGD, the unified classification and retrieval model, the components of the WebGD server and the fuzzy inference engine are discussed in detail in this paper.
Keywords: Text Mining, document classification, knowledge processing, fuzzy logic, Web, CORBA.
Personal Information Classification Based on Deep Learning in Automatic Form Filling System
Authors: Shunzuo Wu, Xudong Luo, Yuanxiu Liao
Abstract:
Recently, the rapid development of deep learning has allowed artificial intelligence (AI) to penetrate many fields, replacing manual work there. In particular, AI systems have also become a research focus in the field of office automation. To meet real needs in office automation, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various relevant information elicited from the Internet. When training the neural network models, we use less noisy and balanced data. We conduct a series of experiments to test our system, and the results show that it can achieve better classification results.
Keywords: Personal information, deep learning, auto fill, NLP, document analysis.
Tweets to Touchdowns: Predicting National Football League Achievement from Social Media Optimism
Authors: Rohan Erasala, Ian McCulloh
Abstract:
The National Football League (NFL) Draft is a chance for every NFL team to select their next superstar. As a result, teams invest heavily in scouting, and millions of fans partake in the online discourse surrounding the draft. This paper investigates the potential correlations between positive sentiment in individual draft selection threads from the subreddit r/NFL and whether these data can be used to make successful player recommendations. It is hypothesized that there will be limited correlations and nonviable recommendations made from these threads. The hypothesis is tested using sentiment analysis of draft thread comments and by analyzing correlation and precision at k over the top scores. The results indicate weak correlations between the percentage of positive comments in a draft selection thread and a player's approximate value, but potentially viable recommendations from looking at players whose draft selection threads have the highest percentage of positive comments.
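A hedged sketch, with invented numbers, of the two analyses described: correlating the share of positive comments per draft thread with a player's approximate value, and computing precision at k over the most positively received picks.

```python
# Hedged sketch: Pearson correlation and precision@k on invented draft-thread data.
import numpy as np
from scipy.stats import pearsonr

# Fraction of positive comments per draft thread, and the player's later Approximate Value.
pos_share = np.array([0.82, 0.35, 0.67, 0.91, 0.12, 0.55, 0.74, 0.28])
approx_value = np.array([41, 12, 25, 30, 8, 18, 35, 15])

r, p = pearsonr(pos_share, approx_value)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

def precision_at_k(scores, outcomes, k, success_threshold=20):
    """Share of the top-k most positively received picks that became successful players."""
    top_k = np.argsort(scores)[::-1][:k]
    return np.mean(outcomes[top_k] >= success_threshold)

print("precision@3:", precision_at_k(pos_share, approx_value, k=3))
```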
Keywords: National Football League, NFL, NFL Draft, sentiment analysis, Reddit, social media, NLP.
A Study on Removal of Toluidine Blue Dye from Aqueous Solution by Adsorption onto Neem Leaf Powder
Authors: Himanshu Patel, R. T. Vashi
Abstract:
Adsorption of Toluidine Blue dye from aqueous solutions onto Neem Leaf Powder (NLP) has been investigated. The surface of this natural material was characterized by particle size analysis, Scanning Electron Microscopy (SEM), Fourier Transform Infrared (FTIR) spectroscopy and X-Ray Diffraction (XRD). The effects of process parameters such as initial concentration, pH, temperature and contact duration on the adsorption capacity have been evaluated, and pH was found to be the most influential parameter among them. The data were analyzed using the Langmuir and Freundlich isotherm models to explain the equilibrium characteristics of adsorption, and kinetic models such as the pseudo-first-order model, the pseudo-second-order model and the Elovich equation were used to describe the kinetic data. The experimental data were well fitted by the Langmuir adsorption isotherm model and the pseudo-second-order kinetic model. The thermodynamic parameters, such as the free energy of adsorption (ΔG°), enthalpy change (ΔH°) and entropy change (ΔS°), were also determined and evaluated.
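A short sketch of the isotherm-fitting step mentioned above, using non-linear least squares on invented equilibrium data; the actual experimental values and fitted constants are those reported in the paper, not these.

```python
# Hedged sketch: fitting Langmuir and Freundlich isotherms to toy equilibrium data.
import numpy as np
from scipy.optimize import curve_fit

# Ce: equilibrium dye concentration (mg/L), qe: amount adsorbed (mg/g) -- invented values
Ce = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
qe = np.array([8.2, 13.5, 19.8, 25.1, 28.4])

def langmuir(Ce, qmax, KL):
    return qmax * KL * Ce / (1.0 + KL * Ce)

def freundlich(Ce, KF, n):
    return KF * Ce ** (1.0 / n)

(qmax, KL), _ = curve_fit(langmuir, Ce, qe, p0=[30.0, 0.1])
(KF, n), _ = curve_fit(freundlich, Ce, qe, p0=[5.0, 2.0])

print(f"Langmuir:   qmax = {qmax:.1f} mg/g, KL = {KL:.3f} L/mg")
print(f"Freundlich: KF = {KF:.2f}, n = {n:.2f}")
```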
Keywords: Adsorption, isotherm models, kinetic models, temperature, toluidine blue dye, surface chemistry.
A Novel Optimal Setting for Directional over Current Relay Coordination using Particle Swarm Optimization
Authors: D. Vijayakumar, R. K. Nema
Abstract:
Over Current Relays (OCRs) and Directional Over Current Relays (DOCRs) are widely used for the protection of radial and ring sub-transmission systems and for distribution systems. Previous work formulates the DOCR coordination problem either as a Non-Linear Programming (NLP) problem for TDS and Ip, or as a Linear Programming (LP) problem for TDS; recently, a social-behavior-based technique (Particle Swarm Optimization) has been introduced to this work. In this paper, a Modified Particle Swarm Optimization (MPSO) technique is discussed for the optimal setting of DOCRs in power systems, treating the determination of the relays' Ip values as a non-linear programming problem and the determination of the TDS settings as a linear programming problem. The calculation of the Time Dial Setting (TDS) and the pickup current (Ip) setting of the relays is the core of the coordination study. The PSO technique is considered a realistic and powerful solution scheme for obtaining the global or quasi-global optimum in optimization problems.
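As an illustration of why the TDS sub-problem becomes linear once the pickup currents are fixed, the following sketch solves a tiny, invented three-relay coordination case as a linear program; the time multipliers, CTI and TDS bounds are assumptions, and the MPSO layer for the pickup settings described in the abstract is not shown.

```python
# Hedged sketch: TDS coordination as a linear program on an invented three-relay case.
import numpy as np
from scipy.optimize import linprog

# With fixed Ip, the operating time of relay i for its primary fault is t_i = a_i * TDS_i.
a = np.array([2.0, 2.5, 1.8])          # invented multipliers from an inverse-time curve
CTI = 0.3                              # coordination time interval (s), assumed

# Backup pairs (backup_relay, primary_relay, backup multiplier for the same fault).
pairs = [(1, 0, 3.0), (2, 1, 3.4)]

# Constraint b*TDS_backup - a*TDS_primary >= CTI  ->  -b*TDS_b + a*TDS_p <= -CTI
A_ub, b_ub = [], []
for backup, primary, b_mult in pairs:
    row = np.zeros(len(a))
    row[backup] = -b_mult
    row[primary] = a[primary]
    A_ub.append(row)
    b_ub.append(-CTI)

# Minimise the sum of primary operating times subject to coordination and TDS bounds.
res = linprog(c=a, A_ub=A_ub, b_ub=b_ub, bounds=[(0.05, 1.1)] * len(a))
print("optimal TDS settings:", np.round(res.x, 3))
```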
Keywords: Directional over current relays, Optimization techniques, Particle swarm optimization, Power system protection.
Harnessing the Power of AI: Transforming DevSecOps for Enhanced Cloud Security
Authors: Ashly Joseph, Jithu Paulose
Abstract:
The increased usage of cloud computing has revolutionized the IT landscape, but it has also raised new security concerns. DevSecOps emerged as a way of tackling these difficulties by integrating security into the software development process. However, the rising complexity and sophistication of cyber threats need more advanced solutions. This paper looks into the usage of artificial intelligence (AI) techniques in the DevSecOps framework to increase cloud security. This study uses quantitative and qualitative techniques to assess the usefulness of AI approaches such as machine learning, natural language processing, and deep learning in reducing security issues. This paper thoroughly examines the symbiotic relationship between AI and DevSecOps, concentrating on how AI may be seamlessly integrated into the continuous integration and continuous delivery (CI/CD) pipeline, automated security testing, and real-time monitoring methods. The findings emphasize AI's huge potential to improve threat detection, risk assessment, and incident response capabilities. Furthermore, the paper examines the implications and challenges of using AI in DevSecOps workflows, considering factors such as scalability, interpretability, and adaptability. This paper adds to a better understanding of AI's revolutionary role in cloud security and provides valuable insights for practitioners and scholars in the field.
Keywords: Cloud Security, DevSecOps, Artificial Intelligence, AI, Machine Learning, Natural Language Processing, NLP, cybersecurity, AI-driven Security.
Event Information Extraction System (EIEE): FSM vs HMM
Authors: Shaukat Wasi, Zubair A. Shaikh, Sajid Qasmi, Hussain Sachwani, Rehman Lalani, Aamir Chagani
Abstract:
Automatic extraction of event information from social text streams (emails, social network sites, blogs, etc.) is a vital requirement for many applications, such as event planning and management systems and security applications. The key information components needed from event-related text are the event title, location, participants, date and time. Emails are distinct from other social text streams in their layout, format and conversation style, and they are the most commonly used communication channel for broadcasting and planning events; therefore, we have chosen emails as our dataset. In our work, we have employed two statistical NLP methods, Finite State Machines (FSM) and Hidden Markov Models (HMM), for the extraction of event-related contextual information. An application has been developed providing a comparison between the two methods on the event extraction task. It comprises two modules, one for each method, and works for both bulk and direct user input. The results are evaluated using precision, recall and F-score. Experiments show that both methods produce high performance and accuracy; however, HMM performed better for title extraction, while FSM proved better for venue, date and time.
Keywords: Emails, Event Extraction, Event Detection, Finite state machines, Hidden Markov Model.
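Below is a hedged sketch of the rule/finite-state side of the comparison only: a few illustrative regular-expression patterns that scan an email for event fields. The patterns are invented, not the authors' grammar; the HMM variant would instead label each token with hidden states learned from annotated emails.

```python
# Hedged sketch: rule/finite-state extraction of event fields from an email (toy patterns).
import re

PATTERNS = {
    "date":  r"\b(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{1,2}\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w*\s+\d{4})\b",
    "time":  r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm|AM|PM)\b",
    "venue": r"(?:at|in)\s+([A-Z][\w ]+(?:Hall|Room|Auditorium|Center))",
    "title": r"^Subject:\s*(.+)$",
}

def extract_event(email_text: str) -> dict:
    """Return the first match for each event field, if any."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, email_text, flags=re.MULTILINE)
        if match:
            fields[name] = match.group(1) if match.groups() else match.group(0)
    return fields

email = """Subject: Annual Research Symposium
Please join us at Jinnah Auditorium on 14 March 2024 at 10:30 am."""
print(extract_event(email))
```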
Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis, image analytics, etc. Most of the data in the field are well structured and available in numerical or categorical formats that can be used for experiments directly. But at the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature; it can be found in the form of discharge summaries, clinical notes and procedural notes, which are in human-written narrative format and have neither a relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique based on a string matching algorithm indexed on curated knowledge sources, which is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concept retrieval, and present the advantages the former displays over the latter.
Keywords: Information retrieval (IR), unified medical language system (UMLS), syntax-based analysis, natural language processing (NLP), medical informatics.
COVID_ICU_BERT: A Fine-tuned Language Model for COVID-19 Intensive Care Unit Clinical Notes
Authors: Shahad Nagoor, Lucy Hederman, Kevin Koidl, Annalina Caputo
Abstract:
Doctors’ notes reflect their impressions, attitudes, clinical sense, and opinions about patients’ conditions and progress, and other information that is essential for doctors’ daily clinical decisions. Despite their value, clinical notes are insufficiently researched within the language processing community. Automatically extracting information from unstructured text data is known to be a difficult task as opposed to dealing with structured information such as physiological vital signs, images and laboratory results. The aim of this research is to investigate how Natural Language Processing (NLP) techniques and machine learning techniques applied to clinician notes can assist in doctors’ decision making in Intensive Care Unit (ICU) for coronavirus disease 2019 (COVID-19) patients. The hypothesis is that clinical outcomes like survival or mortality can be useful to influence the judgement of clinical sentiment in ICU clinical notes. This paper presents two contributions: first, we introduce COVID_ICU_BERT, a fine-tuned version of a clinical transformer model that can reliably predict clinical sentiment for notes of COVID patients in ICU. We train the model on clinical notes for COVID-19 patients, ones not previously seen by Bio_ClinicalBERT or Bio_Discharge_Summary_BERT. The model which was based on Bio_ClinicalBERT achieves higher predictive accuracy than the one based on Bio_Discharge_Summary_BERT (Acc 93.33%, AUC 0.98, and Precision 0.96). Second, we perform data augmentation using clinical contextual word embedding that is based on a pre-trained clinical model to balance the samples in each class in the data (survived vs. deceased patients). Data augmentation improves the accuracy of prediction slightly (Acc 96.67%, AUC 0.98, and Precision 0.92).
Keywords: BERT fine-tuning, clinical sentiment, COVID-19, data augmentation.
SMaTTS: Standard Malay Text to Speech System
Authors: Othman O. Khalifa, Zakiah Hanim Ahmad, Teddy Surya Gunawan
Abstract:
This paper presents a rule-based text-to-speech (TTS) synthesis system for Standard Malay, namely SMaTTS. The proposed system uses a sinusoidal method and some pre-recorded wave files to generate speech. The use of a phone database significantly decreases the amount of computer memory used, thus making the system very light and embeddable. The overall system comprises two phases: the Natural Language Processing (NLP) phase, which consists of the high-level processing of text analysis, phonetic analysis, text normalization and a morphophonemic module designed specially for SM to overcome a few problems in defining the rules for the SM orthography system before the text can be passed to the DSP module; and the Digital Signal Processing (DSP) phase, which operates on the low-level process of speech waveform generation. An intelligible and adequately natural-sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) has been developed. A Standard Malay (SM) phoneme set and an inclusive set of phone databases have been constructed carefully for this phone-based speech synthesizer. By applying generative phonology, comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been created for SMaTTS. For the evaluation tests, a Diagnostic Rhyme Test (DRT) word list was compiled, and several experiments were performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system as well as the room for improvement are thoroughly discussed.
Keywords: Natural Language Processing, Text-To-Speech (TTS), Diphone, source filter, low-/high-level synthesis.
A Sentence-to-Sentence Relation Network for Recognizing Textual Entailment
Authors: Isaac K. E. Ampomah, Seong-Bae Park, Sang-Jo Lee
Abstract:
Over the past decade, there have been promising developments in Natural Language Processing (NLP), with several investigations of approaches focusing on Recognizing Textual Entailment (RTE). These include models based on lexical similarities, models based on formal reasoning, and most recently deep neural models. In this paper, we present a sentence encoding model that exploits sentence-to-sentence relation information for RTE. In terms of sentence modeling, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) adopt different approaches. RNNs are known to be well suited for sequence modeling, whilst CNNs are suited for the extraction of n-gram features through their filters and can learn ranges of relations via the pooling mechanism. We combine the strengths of RNNs and CNNs as stated above to present a unified model for the RTE task. Our model basically combines relation vectors computed from the phrasal representations of each sentence with the final encoded sentence representations. Firstly, we pass each sentence through a convolutional layer to extract a sequence of higher-level phrase representations, from which the first relation vector is computed. Secondly, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) network to obtain the final sentence representations, from which a second relation vector is computed. The relation vectors are combined and then used in the same fashion as an attention mechanism over the Bi-LSTM outputs to yield the final sentence representations for the classification. Experiments on the Stanford Natural Language Inference (SNLI) corpus suggest that this is a promising technique for RTE.
Keywords: Deep neural models, natural language inference, recognizing textual entailment, sentence-to-sentence relation.
Redefining Health Information Systems with Machine Learning: Harnessing the Potential of AI-Powered Data Fusion Ecosystems
Authors: Shohoni Mahabub
Abstract:
Health Information Systems (HIS) are essential to contemporary healthcare; nonetheless, they frequently encounter challenges such as data fragmentation, inefficiencies, and an absence of real-time analytics. The advent of Machine Learning (ML) and Artificial Intelligence (AI) offers revolutionary potential to address these difficulties via AI-driven data fusion ecosystems. These ecosystems integrate many health data sources, including Electronic Health Records (EHRs), wearable devices, and genetic data, with sophisticated ML techniques such as Natural Language Processing (NLP) and predictive analytics to produce actionable insights. Through the integration of strong data intake layers, secure interoperability protocols, and privacy-preserving models, these ecosystems provide individualized treatment, early illness diagnosis, and enhanced operational efficiency. This paradigm change enhances clinical decision-making and rectifies systemic inefficiencies in healthcare delivery. Nonetheless, adoption presents problems such as data privacy concerns, ethical considerations, and scalability constraints. The study examines options such as federated learning for safe, decentralized data sharing, explainable AI for transparency, and cloud-based infrastructure for scalability to address these issues. These ecosystems aim to address health equity disparities, particularly in resource-limited environments, and improve public health surveillance, notably in pandemic response initiatives. This article emphasizes the revolutionary potential of AI-driven data fusion ecosystems in redefining HIS by providing an implementation roadmap and showcasing successful deployment case studies. The suggested method promotes a cooperative initiative among legislators, healthcare professionals, and technologists to establish a cohesive, efficient, and patient-centric healthcare model.
Keywords: AI-powered healthcare systems, data fusion ecosystem, predictive analytics, digital health interoperability.
Improving Subjective Bias Detection Using Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory
Authors: Ebipatei Victoria Tunyan, T. A. Cao, Cheol Young Ock
Abstract:
Detecting subjectively biased statements is a vital task. This is because this kind of bias, when present in text or in other forms of information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflicts amongst consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks like sentiment analysis, opinion identification, and bias neutralization. Having a system that can adequately detect subjectivity in text will boost research in the above-mentioned areas significantly. It can also come in handy for platforms like Wikipedia, where the use of neutral language is of importance. The goal of this work is to identify subjectively biased language in text at the sentence level. With machine learning, we can solve complex AI problems, making it a good fit for the problem of subjective bias detection. A key step in this approach is to train a classifier based on BERT (Bidirectional Encoder Representations from Transformers) as the upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as a data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network incorporating an attention mechanism. This approach produces a deeper and better classifier. We evaluate the effectiveness of our model using the Wiki Neutrality Corpus (WNC), a benchmark dataset compiled from Wikipedia edits that removed various biased instances from sentences, with which we also compare our model to existing approaches. Experimental analysis indicates an improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.
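The following is a minimal PyTorch sketch of the described architecture (BERT as an embedding generator feeding a Bi-LSTM with attention); the hidden sizes, the frozen BERT, and the binary head are assumptions, and no training loop or WNC data handling is shown.

```python
# Hedged sketch: BERT embeddings -> BiLSTM -> attention pooling -> binary classifier.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertBiLSTMAttention(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", hidden=128):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)           # per-token attention scores
        self.classifier = nn.Linear(2 * hidden, 2)     # biased vs. neutral

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():                           # BERT used as a frozen embedder here
            emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        out, _ = self.lstm(emb)                          # (batch, seq, 2*hidden)
        scores = self.attn(out).masked_fill(attention_mask.unsqueeze(-1) == 0, -1e9)
        weights = torch.softmax(scores, dim=1)
        context = (weights * out).sum(dim=1)             # attention-pooled sentence vector
        return self.classifier(context)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["The senator bravely defended the flawed policy."],
                  return_tensors="pt", padding=True)
logits = BertBiLSTMAttention()(batch["input_ids"], batch["attention_mask"])
print(logits.shape)   # torch.Size([1, 2])
```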
Keywords: Subjective bias detection, machine learning, BERT–BiLSTM–Attention, text classification, natural language processing.
Named Entity Recognition using Support Vector Machine: A Language Independent Approach
Authors: Asif Ekbal, Sivaji Bandyopadhyay
Abstract:
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is nowadays considered fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports on the development of a NER system for Bengali and Hindi using Support Vector Machines (SVM). Though this state-of-the-art machine learning technique has been widely applied to NER in several well-studied languages, its use for Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with a variety of features that are helpful in predicting the four different named entity (NE) classes, namely Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi, tagged with the twelve different NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as features of the SVM in order to improve the system performance. The NER system has been tested with gold standard test sets of 35K and 60K tokens for Bengali and Hindi, respectively. Evaluation results have demonstrated recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show an improvement in the f-score of 5.13% with the use of context patterns. A statistical analysis (ANOVA) is also performed to compare the performance of the proposed NER system with that of the existing HMM-based system for both languages.
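A minimal sketch of the general idea of contextual window features fed to an SVM, using scikit-learn on invented toy sentences; the actual system's feature set, corpora, NE tagset and lexical context patterns are far richer and are not reproduced here.

```python
# Hedged sketch: per-token window features classified with a linear SVM (toy data).
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy word/tag sequences (B-PER, B-LOC, O, ...) -- invented examples.
sentences = [
    (["Rabindranath", "visited", "Kolkata", "yesterday"], ["B-PER", "O", "B-LOC", "O"]),
    (["Sunita", "works", "in", "Delhi"], ["B-PER", "O", "O", "B-LOC"]),
]

def token_features(words, i):
    """Contextual features: the token itself plus a +/-1 word window and a suffix."""
    return {
        "word": words[i].lower(),
        "is_title": words[i].istitle(),
        "prev": words[i - 1].lower() if i > 0 else "<BOS>",
        "next": words[i + 1].lower() if i < len(words) - 1 else "<EOS>",
        "suffix3": words[i][-3:].lower(),
    }

X = [token_features(w, i) for w, tags in sentences for i in range(len(w))]
y = [tag for _, tags in sentences for tag in tags]

model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(X, y)
print(model.predict([token_features(["Anita", "lives", "in", "Mumbai"], 0)]))
```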
Keywords: Named Entity (NE), Named Entity Recognition (NER), Support Vector Machine (SVM), Bengali, Hindi.