Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8890

Search results for: natural language processing

8890 Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities

Authors: Khaled M. Alhawiti

Abstract:

This paper aims to analyze the role of natural language processing (NLP). The paper will discuss the role in the context of automated data retrieval, automated question answer, and text structuring. NLP techniques are gaining wider acceptance in real life applications and industrial concerns. There are various complexities involved in processing the text of natural language that could satisfy the need of decision makers. This paper begins with the description of the qualities of NLP practices. The paper then focuses on the challenges in natural language processing. The paper also discusses major techniques of NLP. The last section describes opportunities and challenges for future research.

Keywords: data retrieval, information retrieval, natural language processing, text structuring

Procedia PDF Downloads 201
8889 Impact of Natural Language Processing in Educational Setting: An Effective Approach towards Improved Learning

Authors: Khaled M. Alhawiti

Abstract:

Natural Language Processing (NLP) is an effective approach for bringing improvement in educational setting. This involves initiating the process of learning through the natural acquisition in the educational systems. It is based on following effective approaches for providing the solution for various problems and issues in education. Natural Language Processing provides solution in a variety of different fields associated with the social and cultural context of language learning. It is based on involving various tools and techniques such as grammar, syntax, and structure of text. It is effective approach for teachers, students, authors, and educators for providing assistance for writing, analysis, and assessment procedure. Natural Language Processing is widely integrated in the large number of educational contexts such as research, science, linguistics, e-learning, evaluations system, and various other educational settings such as schools, higher education system, and universities. Natural Language Processing is based on applying scientific approach in the educational settings. In the educational settings, NLP is an effective approach to ensure that students can learn easily in the same way as they acquired language in the natural settings.

Keywords: natural language processing, education, application, e-learning, scientific studies, educational system

Procedia PDF Downloads 365
8888 Natural Language Processing; the Future of Clinical Record Management

Authors: Khaled M. Alhawiti

Abstract:

This paper investigates the future of medicine and the use of Natural language processing. The importance of having correct clinical information available online is remarkable; improving patient care at affordable costs could be achieved using automated applications to use the online clinical information. The major challenge towards the retrieval of such vital information is to have it appropriately coded. Majority of the online patient reports are not found to be coded and not accessible as its recorded in natural language text. The use of Natural Language processing provides a feasible solution by retrieving and organizing clinical information, available in text and transforming clinical data that is available for use. Systems used in NLP are rather complex to construct, as they entail considerable knowledge, however significant development has been made. Newly formed NLP systems have been tested and have established performance that is promising and considered as practical clinical applications.

Keywords: clinical information, information retrieval, natural language processing, automated applications

Procedia PDF Downloads 258
8887 Resource Creation Using Natural Language Processing Techniques for Malay Translated Qur'an

Authors: Nor Diana Ahmad, Eric Atwell, Brandon Bennett

Abstract:

Text processing techniques for English have been developed for several decades. But for the Malay language, text processing methods are still far behind. Moreover, there are limited resources, tools for computational linguistic analysis available for the Malay language. Therefore, this research presents the use of natural language processing (NLP) in processing Malay translated Qur’an text. As the result, a new language resource for Malay translated Qur’an was created. This resource will help other researchers to build the necessary processing tools for the Malay language. This research also develops a simple question-answer prototype to demonstrate the use of the Malay Qur’an resource for text processing. This prototype has been developed using Python. The prototype pre-processes the Malay Qur’an and an input query using a stemming algorithm and then searches for occurrences of the query word stem. The result produced shows improved matching likelihood between user query and its answer. A POS-tagging algorithm has also been produced. The stemming and tagging algorithms can be used as tools for research related to other Malay texts and can be used to support applications such as information retrieval, question answering systems, ontology-based search and other text analysis tasks.

Keywords: language resource, Malay translated Qur'an, natural language processing (NLP), text processing

Procedia PDF Downloads 169
8886 Resume Ranking Using Custom Word2vec and Rule-Based Natural Language Processing Techniques

Authors: Subodh Chandra Shakya, Rajendra Sapkota, Aakash Tamang, Shushant Pudasaini, Sujan Adhikari, Sajjan Adhikari

Abstract:

Lots of efforts have been made in order to measure the semantic similarity between the text corpora in the documents. Techniques have been evolved to measure the similarity of two documents. One such state-of-art technique in the field of Natural Language Processing (NLP) is word to vector models, which converts the words into their word-embedding and measures the similarity between the vectors. We found this to be quite useful for the task of resume ranking. So, this research paper is the implementation of the word2vec model along with other Natural Language Processing techniques in order to rank the resumes for the particular job description so as to automate the process of hiring. The research paper proposes the system and the findings that were made during the process of building the system.

Keywords: chunking, document similarity, information extraction, natural language processing, word2vec, word embedding

Procedia PDF Downloads 9
8885 Gender Bias in Natural Language Processing: Machines Reflect Misogyny in Society

Authors: Irene Yi

Abstract:

Machine learning, natural language processing, and neural network models of language are becoming more and more prevalent in the fields of technology and linguistics today. Training data for machines are at best, large corpora of human literature and at worst, a reflection of the ugliness in society. Machines have been trained on millions of human books, only to find that in the course of human history, derogatory and sexist adjectives are used significantly more frequently when describing females in history and literature than when describing males. This is extremely problematic, both as training data, and as the outcome of natural language processing. As machines start to handle more responsibilities, it is crucial to ensure that they do not take with them historical sexist and misogynistic notions. This paper gathers data and algorithms from neural network models of language having to deal with syntax, semantics, sociolinguistics, and text classification. Results are significant in showing the existing intentional and unintentional misogynistic notions used to train machines, as well as in developing better technologies that take into account the semantics and syntax of text to be more mindful and reflect gender equality. Further, this paper deals with the idea of non-binary gender pronouns and how machines can process these pronouns correctly, given its semantic and syntactic context. This paper also delves into the implications of gendered grammar and its effect, cross-linguistically, on natural language processing. Languages such as French or Spanish not only have rigid gendered grammar rules, but also historically patriarchal societies. The progression of society comes hand in hand with not only its language, but how machines process those natural languages. These ideas are all extremely vital to the development of natural language models in technology, and they must be taken into account immediately.

Keywords: gendered grammar, misogynistic language, natural language processing, neural networks

Procedia PDF Downloads 10
8884 Context Detection in Spreadsheets Based on Automatically Inferred Table Schema

Authors: Alexander Wachtel, Michael T. Franzen, Walter F. Tichy

Abstract:

Programming requires years of training. With natural language and end user development methods, programming could become available to everyone. It enables end users to program their own devices and extend the functionality of the existing system without any knowledge of programming languages. In this paper, we describe an Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, the end users can query the data using natural language. During the evaluation 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system and also Excel without ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM, by using natural language for end user software engineering, to overcome the present bottleneck of professional developers.

Keywords: natural language processing, natural language interfaces, human computer interaction, end user development, dialog systems, data recognition, spreadsheet

Procedia PDF Downloads 205
8883 Application of Natural Language Processing in Education

Authors: Khaled M. Alhawiti

Abstract:

Reading capability is a major segment of language competency. On the other hand, discovering topical writings at a fitting level for outside and second language learners is a test for educators. We address this issue utilizing natural language preparing innovation to survey reading level and streamline content. In the connection of outside and second-language learning, existing measures of reading level are not appropriate to this errand. Related work has demonstrated the profit of utilizing measurable language preparing procedures; we expand these thoughts and incorporate other potential peculiarities to measure intelligibility. In the first piece of this examination, we join characteristics from measurable language models, customary reading level measures and other language preparing apparatuses to deliver a finer technique for recognizing reading level. We examine the execution of human annotators and assess results for our finders concerning human appraisals. A key commitment is that our identifiers are trainable; with preparing and test information from the same space, our finders beat more general reading level instruments (Flesch-Kincaid and Lexile). Trainability will permit execution to be tuned to address the needs of specific gatherings or understudies.

Keywords: natural language processing, trainability, syntactic simplification tools, education

Procedia PDF Downloads 378
8882 Detecting Paraphrases in Arabic Text

Authors: Amal Alshahrani, Allan Ramsay

Abstract:

Paraphrasing is one of the important tasks in natural language processing; i.e. alternative ways to express the same concept by using different words or phrases. Paraphrases can be used in many natural language applications, such as Information Retrieval, Machine Translation, Question Answering, Text Summarization, or Information Extraction. To obtain pairs of sentences that are paraphrases we create a system that automatically extracts paraphrases from a corpus, which is built from different sources of news article since these are likely to contain paraphrases when they report the same event on the same day. There are existing simple standard approaches (e.g. TF-IDF vector space, cosine similarity) and alignment technique (e.g. Dynamic Time Warping (DTW)) for extracting paraphrase which have been applied to the English. However, the performance of these approaches could be affected when they are applied to another language, for instance Arabic language, due to the presence of phenomena which are not present in English, such as Free Word Order, Zero copula, and Pro-dropping. These phenomena will affect the performance of these algorithms. Thus, if we can analysis how the existing algorithms for English fail for Arabic then we can find a solution for Arabic. The results are promising.

Keywords: natural language processing, TF-IDF, cosine similarity, dynamic time warping (DTW)

Procedia PDF Downloads 248
8881 A Controlled Natural Language Assisted Approach for the Design and Automated Processing of Service Level Agreements

Authors: Christopher Schwarz, Katrin Riegler, Erwin Zinser

Abstract:

The management of outsourcing relationships between IT service providers and their customers proofs to be a critical issue that has to be stipulated by means of Service Level Agreements (SLAs). Since service requirements differ from customer to customer, SLA content and language structures vary largely, standardized SLA templates may not be used and an automated processing of SLA content is not possible. Hence, SLA management is usually a time-consuming and inefficient manual process. For overcoming these challenges, this paper presents an innovative and ITIL V3-conform approach for automated SLA design and management using controlled natural language in enterprise collaboration portals. The proposed novel concept is based on a self-developed controlled natural language that follows a subject-predicate-object approach to specify well-defined SLA content structures that act as templates for customized contracts and support automated SLA processing. The derived results eventually enable IT service providers to automate several SLA request, approval and negotiation processes by means of workflows and business rules within an enterprise collaboration portal. The illustrated prototypical realization gives evidence of the practical relevance in service-oriented scenarios as well as the high flexibility and adaptability of the presented model. Thus, the prototype enables the automated creation of well defined, customized SLA documents, providing a knowledge representation that is both human understandable and machine processable.

Keywords: automated processing, controlled natural language, knowledge representation, information technology outsourcing, service level management

Procedia PDF Downloads 308
8880 Morphology of Cartographic Words: A Perspective from Chinese Characters

Authors: Xinyu Gong, Zhilin Li, Xintao Liu

Abstract:

Maps are a means of communication. Cartographic language involves established theories of natural language for understanding maps. “Cartographic words’, or “map symbols”, are crucial elements of cartographic language. Personalized mapping is increasingly popular, with growing demands for customized map-making by the general public. Automated symbol-making and customization play a key role in personalized mapping. However, formal representations for the automated construction of map symbols are still lacking. In natural language, the process of word and sentence construction can be formalized. Through the analogy between natural language and graphical language, formal representations of natural language construction can be used as a reference for constructing cartographic language. We selected Chinese character structures (i.e., S

Keywords: personalized mapping, Chinese character, cartographic language, map symbols

Procedia PDF Downloads 24
8879 A Survey of the Applications of Sentiment Analysis

Authors: Pingping Lin, Xudong Luo

Abstract:

Natural language often conveys emotions of speakers. Therefore, sentiment analysis on what people say is prevalent in the field of natural language process and has great application value in many practical problems. Thus, to help people understand its application value, in this paper, we survey various applications of sentiment analysis, including the ones in online business and offline business as well as other types of its applications. In particular, we give some application examples in intelligent customer service systems in China. Besides, we compare the applications of sentiment analysis on Twitter, Weibo, Taobao and Facebook, and discuss some challenges. Finally, we point out the challenges faced in the applications of sentiment analysis and the work that is worth being studied in the future.

Keywords: application, natural language processing, online comments, sentiment analysis

Procedia PDF Downloads 23
8878 Models and Metamodels for Computer-Assisted Natural Language Grammar Learning

Authors: Evgeny Pyshkin, Maxim Mozgovoy, Vladislav Volkov

Abstract:

The paper follows a discourse on computer-assisted language learning. We examine problems of foreign language teaching and learning and introduce a metamodel that can be used to define learning models of language grammar structures in order to support teacher/student interaction. Special attention is paid to the concept of a virtual language lab. Our approach to language education assumes to encourage learners to experiment with a language and to learn by discovering patterns of grammatically correct structures created and managed by a language expert.

Keywords: computer-assisted instruction, language learning, natural language grammar models, HCI

Procedia PDF Downloads 367
8877 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 235
8876 Resource Framework Descriptors for Interestingness in Data

Authors: C. B. Abhilash, Kavi Mahesh

Abstract:

Human beings are the most advanced species on earth; it's all because of the ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in the textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on Interesting data through which interesting facts are generated for the knowledge base. The approach is to derive the analytics from the text via the application of natural language processing. Using semantic web Resource framework descriptors (RDF), we generate the triple from the given data and derive the interesting patterns. The methodology also illustrates data integration using the RDF for reliable, interesting patterns.

Keywords: RDF, interestingness, knowledge base, semantic data

Procedia PDF Downloads 22
8875 Implementing a Database from a Requirement Specification

Authors: M. Omer, D. Wilson

Abstract:

Creating a database scheme is essentially a manual process. From a requirement specification, the information contained within has to be analyzed and reduced into a set of tables, attributes and relationships. This is a time-consuming process that has to go through several stages before an acceptable database schema is achieved. The purpose of this paper is to implement a Natural Language Processing (NLP) based tool to produce a from a requirement specification. The Stanford CoreNLP version 3.3.1 and the Java programming were used to implement the proposed model. The outcome of this study indicates that the first draft of a relational database schema can be extracted from a requirement specification by using NLP tools and techniques with minimum user intervention. Therefore, this method is a step forward in finding a solution that requires little or no user intervention.

Keywords: information extraction, natural language processing, relation extraction

Procedia PDF Downloads 116
8874 From User's Requirements to UML Class Diagram

Authors: Zeineb Ben Azzouz, Wahiba Ben Abdessalem Karaa

Abstract:

The automated extraction of UML class diagram from natural language requirements is a highly challenging task. Many approaches, frameworks and tools have been presented in this field. Nonetheless, the experiments of these tools have shown that there is no approach that can work best all the time. In this context, we propose a new accurate approach to facilitate the automatic mapping from textual requirements to UML class diagram. Our new approach integrates the best properties of statistical Natural Language Processing (NLP) techniques to reduce ambiguity when analysing natural language requirements text. In addition, our approach follows the best practices defined by conceptual modelling experts to determine some patterns indispensable for the extraction of basic elements and concepts of the class diagram. Once the relevant information of class diagram is captured, a XMI document is generated and imported with a CASE tool to build the corresponding UML class diagram.

Keywords: class diagram, user’s requirements, XMI, software engineering

Procedia PDF Downloads 334
8873 Computational Linguistic Implications of Gender Bias: Machines Reflect Misogyny in Society

Authors: Irene Yi

Abstract:

Machine learning, natural language processing, and neural network models of language are becoming more and more prevalent in the fields of technology and linguistics today. Training data for machines are at best, large corpora of human literature and at worst, a reflection of the ugliness in society. Computational linguistics is a growing field dealing with such issues of data collection for technological development. Machines have been trained on millions of human books, only to find that in the course of human history, derogatory and sexist adjectives are used significantly more frequently when describing females in history and literature than when describing males. This is extremely problematic, both as training data, and as the outcome of natural language processing. As machines start to handle more responsibilities, it is crucial to ensure that they do not take with them historical sexist and misogynistic notions. This paper gathers data and algorithms from neural network models of language having to deal with syntax, semantics, sociolinguistics, and text classification. Computational analysis on such linguistic data is used to find patterns of misogyny. Results are significant in showing the existing intentional and unintentional misogynistic notions used to train machines, as well as in developing better technologies that take into account the semantics and syntax of text to be more mindful and reflect gender equality. Further, this paper deals with the idea of non-binary gender pronouns and how machines can process these pronouns correctly, given its semantic and syntactic context. This paper also delves into the implications of gendered grammar and its effect, cross-linguistically, on natural language processing. Languages such as French or Spanish not only have rigid gendered grammar rules, but also historically patriarchal societies. The progression of society comes hand in hand with not only its language, but how machines process those natural languages. These ideas are all extremely vital to the development of natural language models in technology, and they must be taken into account immediately.

Keywords: computational analysis, gendered grammar, misogynistic language, neural networks

Procedia PDF Downloads 11
8872 A Supervised Approach for Word Sense Disambiguation Based on Arabic Diacritics

Authors: Alaa Alrakaf, Sk. Md. Mizanur Rahman

Abstract:

Since the last two decades’ Arabic natural language processing (ANLP) has become increasingly much more important. One of the key issues related to ANLP is ambiguity. In Arabic language different pronunciation of one word may have a different meaning. Furthermore, ambiguity also has an impact on the effectiveness and efficiency of Machine Translation (MT). The issue of ambiguity has limited the usefulness and accuracy of the translation from Arabic to English. The lack of Arabic resources makes ambiguity problem more complicated. Additionally, the orthographic level of representation cannot specify the exact meaning of the word. This paper looked at the diacritics of Arabic language and used them to disambiguate a word. The proposed approach of word sense disambiguation used Diacritizer application to Diacritize Arabic text then found the most accurate sense of an ambiguous word using Naïve Bayes Classifier. Our Experimental study proves that using Arabic Diacritics with Naïve Bayes Classifier enhances the accuracy of choosing the appropriate sense by 23% and also decreases the ambiguity in machine translation.

Keywords: Arabic natural language processing, machine learning, machine translation, Naive bayes classifier, word sense disambiguation

Procedia PDF Downloads 239
8871 Controlling Drone Flight Missions through Natural Language Processors Using Artificial Intelligence

Authors: Sylvester Akpah, Selasi Vondee

Abstract:

Unmanned Aerial Vehicles (UAV) as they are also known, drones have attracted increasing attention in recent years due to their ubiquitous nature and boundless applications in the areas of communication, surveying, aerial photography, weather forecasting, medical delivery, surveillance amongst others. Operated remotely in real-time or pre-programmed, drones can fly autonomously or on pre-defined routes. The application of these aerial vehicles has successfully penetrated the world due to technological evolution, thus a lot more businesses are utilizing their capabilities. Unfortunately, while drones are replete with the benefits stated supra, they are riddled with some problems, mainly attributed to the complexities in learning how to master drone flights, collision avoidance and enterprise security. Additional challenges, such as the analysis of flight data recorded by sensors attached to the drone may take time and require expert help to analyse and understand. This paper presents an autonomous drone control system using a chatbot. The system allows for easy control of drones using conversations with the aid of Natural Language Processing, thus to reduce the workload needed to set up, deploy, control, and monitor drone flight missions. The results obtained at the end of the study revealed that the drone connected to the chatbot was able to initiate flight missions with just text and voice commands, enable conversation and give real-time feedback from data and requests made to the chatbot. The results further revealed that the system was able to process natural language and produced human-like conversational abilities using Artificial Intelligence (Natural Language Understanding). It is recommended that radio signal adapters be used instead of wireless connections thus to increase the range of communication with the aerial vehicle.

Keywords: artificial ntelligence, chatbot, natural language processing, unmanned aerial vehicle

Procedia PDF Downloads 18
8870 Learning Grammars for Detection of Disaster-Related Micro Events

Authors: Josef Steinberger, Vanni Zavarella, Hristo Tanev

Abstract:

Natural disasters cause tens of thousands of victims and massive material damages. We refer to all those events caused by natural disasters, such as damage on people, infrastructure, vehicles, services and resource supply, as micro events. This paper addresses the problem of micro - event detection in online media sources. We present a natural language grammar learning algorithm and apply it to online news. The algorithm in question is based on distributional clustering and detection of word collocations. We also explore the extraction of micro-events from social media and describe a Twitter mining robot, who uses combinations of keywords to detect tweets which talk about effects of disasters.

Keywords: online news, natural language processing, machine learning, event extraction, crisis computing, disaster effects, Twitter

Procedia PDF Downloads 372
8869 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Keywords: fake news detection, natural language processing, machine learning, classification techniques.

Procedia PDF Downloads 21
8868 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides

Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney

Abstract:

Morphological evaluation of Indian languages is one of the burgeoning fields in the area of Natural Language Processing (NLP). The evaluation of a language is an eminent task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study coalesce morphological evaluation and sentiment analysis for the task of classification of farmer suicide cases reported in Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens followed by the training of proposed model using deep learning classification on Punjabi language text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45% respectively. The overall accuracy of sentiment classification obtained using proposed framework on 275 Punjabi text documents is found to be 90.29%.

Keywords: deep neural network, farmer suicides, morphological processing, punjabi text, sentiment analysis

Procedia PDF Downloads 72
8867 Composite Kernels for Public Emotion Recognition from Twitter

Authors: Chien-Hung Chen, Yan-Chun Hsing, Yung-Chun Chang

Abstract:

The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.

Keywords: emotion recognition, natural language processing, composite kernel, sentiment analysis, text mining

Procedia PDF Downloads 44
8866 A Comparative Analysis of Hyper-Parameters Using Neural Networks for E-Mail Spam Detection

Authors: Syed Mahbubuz Zaman, A. B. M. Abrar Haque, Mehedi Hassan Nayeem, Misbah Uddin Sagor

Abstract:

Everyday e-mails are being used by millions of people as an effective form of communication over the Internet. Although e-mails allow high-speed communication, there is a constant threat known as spam. Spam e-mail is often called junk e-mails which are unsolicited and sent in bulk. These unsolicited emails cause security concerns among internet users because they are being exposed to inappropriate content. There is no guaranteed way to stop spammers who use static filters as they are bypassed very easily. In this paper, a smart system is proposed that will be using neural networks to approach spam in a different way, and meanwhile, this will also detect the most relevant features that will help to design the spam filter. Also, a comparison of different parameters for different neural network models has been shown to determine which model works best within suitable parameters.

Keywords: long short-term memory, bidirectional long short-term memory, gated recurrent unit, natural language processing, natural language processing

Procedia PDF Downloads 35
8865 Emotional Analysis for Text Search Queries on Internet

Authors: Gemma García López

Abstract:

The goal of this study is to analyze if search queries carried out in search engines such as Google, can offer emotional information about the user that performs them. Knowing the emotional state in which the Internet user is located can be a key to achieve the maximum personalization of content and the detection of worrying behaviors. For this, two studies were carried out using tools with advanced natural language processing techniques. The first study determines if a query can be classified as positive, negative or neutral, while the second study extracts emotional content from words and applies the categorical and dimensional models for the representation of emotions. In addition, we use search queries in Spanish and English to establish similarities and differences between two languages. The results revealed that text search queries performed by users on the Internet can be classified emotionally. This allows us to better understand the emotional state of the user at the time of the search, which could involve adapting the technology and personalizing the responses to different emotional states.

Keywords: emotion classification, text search queries, emotional analysis, sentiment analysis in text, natural language processing

Procedia PDF Downloads 43
8864 Preparation on Sentimental Analysis on Social Media Comments with Bidirectional Long Short-Term Memory Gated Recurrent Unit and Model Glove in Portuguese

Authors: Leonardo Alfredo Mendoza, Cristian Munoz, Marco Aurelio Pacheco, Manoela Kohler, Evelyn Batista, Rodrigo Moura

Abstract:

Natural Language Processing (NLP) techniques are increasingly more powerful to be able to interpret the feelings and reactions of a person to a product or service. Sentiment analysis has become a fundamental tool for this interpretation but has few applications in languages other than English. This paper presents a classification of sentiment analysis in Portuguese with a base of comments from social networks in Portuguese. A word embedding's representation was used with a 50-Dimension GloVe pre-trained model, generated through a corpus completely in Portuguese. To generate this classification, the bidirectional long short-term memory and bidirectional Gated Recurrent Unit (GRU) models are used, reaching results of 99.1%.

Keywords: natural processing language, sentiment analysis, bidirectional long short-term memory, BI-LSTM, gated recurrent unit, GRU

Procedia PDF Downloads 20
8863 Learner's Difficulties Acquiring English: The Case of Native Speakers of Rio de La Plata Spanish Towards Justifying the Need for Corpora

Authors: Maria Zinnia Bardas Hoffmann

Abstract:

Contrastive Analysis (CA) is the systematic comparison between two languages. It stems from the notion that errors are caused by interference of the L1 system in the acquisition process of an L2. CA represents a useful tool to understand the nature of learning and acquisition. Also, this particular method promises a path to un-derstand the nature of underlying cognitive processes, even when other factors such as intrinsic motivation and teaching strategies were found to best explain student’s problems in acquisition. CA study is justified not only from the need to get a deeper understanding of the nature of SLA, but as an invaluable source to provide clues, at a cognitive level, for those general processes involved in rule formation and abstract thought. It is relevant for cross disciplinary studies and the fields of Computational Thought, Natural Language processing, Applied Linguistics, Cognitive Linguistics and Math Theory. That being said, this paper intends to address here as well its own set of constraints and limitations. Finally, this paper: (a) aims at identifying some of the difficulties students may find in their learning process due to the nature of their specific variety of L1, Rio de la Plata Spanish (RPS), (b) represents an attempt to discuss the necessity for specific models to approach CA.

Keywords: second language acquisition, applied linguistics, contrastive analysis, applied contrastive analysis English language department, meta-linguistic rules, cross-linguistics studies, computational thought, natural language processing

Procedia PDF Downloads 28
8862 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering

Authors: Zelalem Fantahun

Abstract:

Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word into naturally occurring text. Part-of-speech tagging is the most fundamental and basic task almost in all natural language processing. In natural language processing, the problem of providing large amount of manually annotated data is a knowledge acquisition bottleneck. Since, Amharic is one of under-resourced language, the availability of tagged corpus is the bottleneck problem for natural language processing especially for POS tagging. A promising direction to tackle this problem is to provide a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Unsupervised algorithms seek out similarity between pieces of data in order to determine whether they can be characterized as forming a group. This paper explicates the development of unsupervised part-of-speech tagger using K-Means clustering for Amharic language since large amount of data is produced in day-to-day activities. In the development of the tagger, the following procedures are followed. First, the unlabeled data (raw text) is divided into 10 folds and tokenization phase takes place; at this level, the raw text is chunked at sentence level and then into words. The second phase is feature extraction which includes word frequency, syntactic and morphological features of a word. The third phase is clustering. Among different clustering algorithms, K-means is selected and implemented in this study that brings group of similar words together. The fourth phase is mapping, which deals with looking at each cluster carefully and the most common tag is assigned to a group. This study finds out two features that are capable of distinguishing one part-of-speech from others these are morphological feature and positional information and show that it is possible to use unsupervised learning for Amharic POS tagging. In order to increase performance of the unsupervised part-of-speech tagger, there is a need to incorporate other features that are not included in this study, such as semantic related information. Finally, based on experimental result, the performance of the system achieves a maximum of 81% accuracy.

Keywords: POS tagging, Amharic, unsupervised learning, k-means

Procedia PDF Downloads 281
8861 The Output Fallacy: An Investigation into Input, Noticing, and Learners’ Mechanisms

Authors: Samantha Rix

Abstract:

The purpose of this research paper is to investigate the cognitive processing of learners who receive input but produce very little or no output, and who, when they do produce output, exhibit a similar language proficiency as do those learners who produced output more regularly in the language classroom. Previous studies have investigated the benefits of output (with somewhat differing results); therefore, the presentation will begin with an investigation of what may underlie gains in proficiency without output. Consequently, a pilot study was designed and conducted to gain insight into the cognitive processing of low-output language learners looking, for example, at quantity and quality of noticing. This will be carried out within the paradigm of action classroom research, observing and interviewing low-output language learners in an intensive English program at a small Midwest university. The results of the pilot study indicated that autonomy in language learning, specifically utilizing strategies such self-monitoring, self-talk, and thinking 'out-loud', were crucial in the development of language proficiency for academic-level performance. The presentation concludes with an examination of pedagogical implication for classroom use in order to aide students in their language development.

Keywords: cognitive processing, language learners, language proficiency, learning strategies

Procedia PDF Downloads 351