Search results for: word processing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4211

Search results for: word processing

4211 A Supervised Approach for Word Sense Disambiguation Based on Arabic Diacritics

Authors: Alaa Alrakaf, Sk. Md. Mizanur Rahman

Abstract:

Since the last two decades’ Arabic natural language processing (ANLP) has become increasingly much more important. One of the key issues related to ANLP is ambiguity. In Arabic language different pronunciation of one word may have a different meaning. Furthermore, ambiguity also has an impact on the effectiveness and efficiency of Machine Translation (MT). The issue of ambiguity has limited the usefulness and accuracy of the translation from Arabic to English. The lack of Arabic resources makes ambiguity problem more complicated. Additionally, the orthographic level of representation cannot specify the exact meaning of the word. This paper looked at the diacritics of Arabic language and used them to disambiguate a word. The proposed approach of word sense disambiguation used Diacritizer application to Diacritize Arabic text then found the most accurate sense of an ambiguous word using Naïve Bayes Classifier. Our Experimental study proves that using Arabic Diacritics with Naïve Bayes Classifier enhances the accuracy of choosing the appropriate sense by 23% and also decreases the ambiguity in machine translation.

Keywords: Arabic natural language processing, machine learning, machine translation, Naive bayes classifier, word sense disambiguation

Procedia PDF Downloads 323
4210 A Word-to-Vector Formulation for Word Representation

Authors: Sandra Rizkallah, Amir F. Atiya

Abstract:

This work presents a novel word to vector representation that is based on embedding the words into a sphere, whereby the dot product of the corresponding vectors represents the similarity between any two words. Embedding the vectors into a sphere enabled us to take into consideration the antonymity between words, not only the synonymity, because of the suitability to handle the polarity nature of words. For example, a word and its antonym can be represented as a vector and its negative. Moreover, we have managed to extract an adequate vocabulary. The obtained results show that the proposed approach can capture the essence of the language, and can be generalized to estimate a correct similarity of any new pair of words.

Keywords: natural language processing, word to vector, text similarity, text mining

Procedia PDF Downloads 235
4209 Genomic Sequence Representation Learning: An Analysis of K-Mer Vector Embedding Dimensionality

Authors: James Jr. Mashiyane, Risuna Nkolele, Stephanie J. Müller, Gciniwe S. Dlamini, Rebone L. Meraba, Darlington S. Mapiye

Abstract:

When performing language tasks in natural language processing (NLP), the dimensionality of word embeddings is chosen either ad-hoc or is calculated by optimizing the Pairwise Inner Product (PIP) loss. The PIP loss is a metric that measures the dissimilarity between word embeddings, and it is obtained through matrix perturbation theory by utilizing the unitary invariance of word embeddings. Unlike in natural language, in genomics, especially in genome sequence processing, unlike in natural language processing, there is no notion of a “word,” but rather, there are sequence substrings of length k called k-mers. K-mers sizes matter, and they vary depending on the goal of the task at hand. The dimensionality of word embeddings in NLP has been studied using the matrix perturbation theory and the PIP loss. In this paper, the sufficiency and reliability of applying word-embedding algorithms to various genomic sequence datasets are investigated to understand the relationship between the k-mer size and their embedding dimension. This is completed by studying the scaling capability of three embedding algorithms, namely Latent Semantic analysis (LSA), Word2Vec, and Global Vectors (GloVe), with respect to the k-mer size. Utilising the PIP loss as a metric to train embeddings on different datasets, we also show that Word2Vec outperforms LSA and GloVe in accurate computing embeddings as both the k-mer size and vocabulary increase. Finally, the shortcomings of natural language processing embedding algorithms in performing genomic tasks are discussed.

Keywords: word embeddings, k-mer embedding, dimensionality reduction

Procedia PDF Downloads 87
4208 Resume Ranking Using Custom Word2vec and Rule-Based Natural Language Processing Techniques

Authors: Subodh Chandra Shakya, Rajendra Sapkota, Aakash Tamang, Shushant Pudasaini, Sujan Adhikari, Sajjan Adhikari

Abstract:

Lots of efforts have been made in order to measure the semantic similarity between the text corpora in the documents. Techniques have been evolved to measure the similarity of two documents. One such state-of-art technique in the field of Natural Language Processing (NLP) is word to vector models, which converts the words into their word-embedding and measures the similarity between the vectors. We found this to be quite useful for the task of resume ranking. So, this research paper is the implementation of the word2vec model along with other Natural Language Processing techniques in order to rank the resumes for the particular job description so as to automate the process of hiring. The research paper proposes the system and the findings that were made during the process of building the system.

Keywords: chunking, document similarity, information extraction, natural language processing, word2vec, word embedding

Procedia PDF Downloads 123
4207 A Review of Research on Pre-training Technology for Natural Language Processing

Authors: Moquan Gong

Abstract:

In recent years, with the rapid development of deep learning, pre-training technology for natural language processing has made great progress. The early field of natural language processing has long used word vector methods such as Word2Vec to encode text. These word vector methods can also be regarded as static pre-training techniques. However, this context-free text representation brings very limited improvement to subsequent natural language processing tasks and cannot solve the problem of word polysemy. ELMo proposes a context-sensitive text representation method that can effectively handle polysemy problems. Since then, pre-training language models such as GPT and BERT have been proposed one after another. Among them, the BERT model has significantly improved its performance on many typical downstream tasks, greatly promoting the technological development in the field of natural language processing, and has since entered the field of natural language processing. The era of dynamic pre-training technology. Since then, a large number of pre-trained language models based on BERT and XLNet have continued to emerge, and pre-training technology has become an indispensable mainstream technology in the field of natural language processing. This article first gives an overview of pre-training technology and its development history, and introduces in detail the classic pre-training technology in the field of natural language processing, including early static pre-training technology and classic dynamic pre-training technology; and then briefly sorts out a series of enlightening technologies. Pre-training technology, including improved models based on BERT and XLNet; on this basis, analyze the problems faced by current pre-training technology research; finally, look forward to the future development trend of pre-training technology.

Keywords: natural language processing, pre-training, language model, word vectors

Procedia PDF Downloads 12
4206 The Patterns of Cross-Sentence: An Event-Related Potential Study of Mathematical Word Problem

Authors: Tien-Ching Yao, Ching-Ching Lu

Abstract:

Understanding human language processing is one of the main challenges of current cognitive neuroscience. The aims of the present study were to use a sentence decision task combined with event-related potentials to investigate the psychological reality of "cross-sentence patterns." Therefore, we take the math word problems the experimental materials and use the ERPs' P600 component to verify. In this study, the experimental material consisted of 200 math word problems with three different conditions were used ( multiplication word problems、division word problems type 1、division word problems type 2 ). Eighteen Mandarin native speakers participated in the ERPs study (14 of whom were female). The result of the grand average waveforms suggests a later posterior positivity at around 500ms - 900ms. These findings were tested statistically using repeated measures ANOVAs at the component caused by the stimulus type of different questions. Results suggest that three conditions present significant (P < 0.05) on the Mean Amplitude, Latency, and Peak Amplitude. The result showed the characteristic timing and posterior scalp distribution of a P600 effect. We interpreted these characteristic responses as the psychological reality of "cross-sentence patterns." These results provide insights into the sentence processing issues in linguistic theory and psycholinguistic models of language processing and advance our understanding of how people make sense of information during language comprehension.

Keywords: language processing, sentence comprehension, event-related potentials, cross-sentence patterns

Procedia PDF Downloads 117
4205 Probing Syntax Information in Word Representations with Deep Metric Learning

Authors: Bowen Ding, Yihao Kuang

Abstract:

In recent years, with the development of large-scale pre-trained lan-guage models, building vector representations of text through deep neural network models has become a standard practice for natural language processing tasks. From the performance on downstream tasks, we can know that the text representation constructed by these models contains linguistic information, but its encoding mode and extent are unclear. In this work, a structural probe is proposed to detect whether the vector representation produced by a deep neural network is embedded with a syntax tree. The probe is trained with the deep metric learning method, so that the distance between word vectors in the metric space it defines encodes the distance of words on the syntax tree, and the norm of word vectors encodes the depth of words on the syntax tree. The experiment results on ELMo and BERT show that the syntax tree is encoded in their parameters and the word representations they produce.

Keywords: deep metric learning, syntax tree probing, natural language processing, word representations

Procedia PDF Downloads 27
4204 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

The word discovery is a key problem in text information retrieval technology. Methods in new word discovery tend to be closely related to words because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by the meaning of semantic replacement between sentences. The experiment verifies that the framework not only completes the rapid discovery of network words but also realizes the standard word meaning of the discovery of network words, which reflects the effectiveness of our work.

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

Procedia PDF Downloads 56
4203 The Implementation of Word Study Wall in an Online English Word Memorization Class

Authors: Yidan Shao

Abstract:

With the advancement of the economy, technology promotes online teaching, and learning has become one of the common features in the educational field. Meanwhile, the dramatic expansion of the online environment provides opportunities for more learners, including second language learners. A greater command of vocabulary improves students’ learning capacity, and word acquisition and development play a critical role in learning. Furthermore, the Word Wall is an effective tool to improve students’ knowledge of words, which works for a wide range of age groups. Therefore, this study is going to use the Word Wall as an intervention to examine whether it can bring some memorization changes in an online English language class for a second language learner based on the word morphology method. The participant will take ten courses in the experiment as it plans. The findings show that the Word Wall activity plays a slight role in improving word memorizing, but it does affect instant memorization. If longer periods and more comprehensive designs of research can be applied, it is expected to have more value.

Keywords: second language acquisition, word morphology, word memorization, the Word Wall

Procedia PDF Downloads 74
4202 Contextual SenSe Model: Word Sense Disambiguation using Sense and Sense Value of Context Surrounding the Target

Authors: Vishal Raj, Noorhan Abbas

Abstract:

Ambiguity in NLP (Natural language processing) refers to the ability of a word, phrase, sentence, or text to have multiple meanings. This results in various kinds of ambiguities such as lexical, syntactic, semantic, anaphoric and referential am-biguities. This study is focused mainly on solving the issue of Lexical ambiguity. Word Sense Disambiguation (WSD) is an NLP technique that aims to resolve lexical ambiguity by determining the correct meaning of a word within a given context. Most WSD solutions rely on words for training and testing, but we have used lemma and Part of Speech (POS) tokens of words for training and testing. Lemma adds generality and POS adds properties of word into token. We have designed a novel method to create an affinity matrix to calculate the affinity be-tween any pair of lemma_POS (a token where lemma and POS of word are joined by underscore) of given training set. Additionally, we have devised an al-gorithm to create the sense clusters of tokens using affinity matrix under hierar-chy of POS of lemma. Furthermore, three different mechanisms to predict the sense of target word using the affinity/similarity value are devised. Each contex-tual token contributes to the sense of target word with some value and whichever sense gets higher value becomes the sense of target word. So, contextual tokens play a key role in creating sense clusters and predicting the sense of target word, hence, the model is named Contextual SenSe Model (CSM). CSM exhibits a noteworthy simplicity and explication lucidity in contrast to contemporary deep learning models characterized by intricacy, time-intensive processes, and chal-lenging explication. CSM is trained on SemCor training data and evaluated on SemEval test dataset. The results indicate that despite the naivety of the method, it achieves promising results when compared to the Most Frequent Sense (MFS) model.

Keywords: word sense disambiguation (wsd), contextual sense model (csm), most frequent sense (mfs), part of speech (pos), natural language processing (nlp), oov (out of vocabulary), lemma_pos (a token where lemma and pos of word are joined by underscore), information retrieval (ir), machine translation (mt)

Procedia PDF Downloads 59
4201 Word of Mouth and Its Impact on Marketing

Authors: Fatima Naz, Ayesha Tariq

Abstract:

In view of growing of the internet users for e-commerce and taking into account, the emergent impact of word of mouth phenomenon this research has different aims. The aims of this study were built following dissimilar discussion with teachers and colleagues enlightening that word of mouth information for online purchasing do not have the same effect for everybody. Then they were born following dissimilar researchers together with what was already done in previous researches and what was completed. As a result different aims were drawn; the initial aim of this research is to study the attention of the customers in the word of mouth to power their online purchasing activities. The next aim is to analyze the people influenced by the interest of word of mouth. The following aim is to examine the marketing behavior bearing in mind the internet progress and word of mouth, their consideration for word of mouth marketing. In the form of research questions the aims of the study are: 1) How community utilizes and multiplies word of mouth information about online purchasing experience? 2) How communities perceive the word of mouth marketing? 3) How marketers take the word of mouth phenomenon and how they handle it?

Keywords: belief, power, inspiration, self-expression, positive attitude to online marketing, forwarding of contents, purchasing decision, standard marketing

Procedia PDF Downloads 388
4200 Affective Transparency in Compound Word Processing

Authors: Jordan Gallant

Abstract:

In the compound word processing literature, much attention has been paid to the relationship between a compound’s denotational meaning and that of its morphological whole-word constituents, which is referred to as ‘semantic transparency’. However, the parallel relationship between a compound’s connotation and that of its constituents has not been addressed at all. For instance, while a compound like ‘painkiller’ might be semantically transparent, it is not ‘affectively transparent’. That is, both constituents have primarily negative connotations, while the whole compound has a positive one. This paper investigates the role of affective transparency on compound processing using two methodologies commonly employed in this field: a lexical decision task and a typing task. The critical stimuli used were 112 English bi-constituent compounds that differed in terms of the effective transparency of their constituents. Of these, 36 stimuli contained constituents with similar connotations to the compound (e.g., ‘dreamland’), 36 contained constituents with more positive connotations (e.g. ‘bedpan’), and 36 contained constituents with more negative connotations (e.g. ‘painkiller’). Connotation of whole-word constituents and compounds were operationalized via valence ratings taken from an off-line ratings database. In Experiment 1, compound stimuli and matched non-word controls were presented visually to participants, who were then asked to indicate whether it was a real word in English. Response times and accuracy were recorded. In Experiment 2, participants typed compound stimuli presented to them visually. Individual keystroke response times and typing accuracy were recorded. The results of both experiments provided positive evidence that compound processing is influenced by effective transparency. In Experiment 1, compounds in which both constituents had more negative connotations than the compound itself were responded to significantly more slowly than compounds in which the constituents had similar or more positive connotations. Typed responses from Experiment 2 showed that inter-keystroke intervals at the morphological constituent boundary were significantly longer when the connotation of the head constituent was either more positive or more negative than that of the compound. The interpretation of this finding is discussed in the context of previous compound typing research. Taken together, these findings suggest that affective transparency plays a role in the recognition, storage, and production of English compound words. This study provides a promising first step in a new direction for research on compound words.

Keywords: compound processing, semantic transparency, typed production, valence

Procedia PDF Downloads 93
4199 The Influence of Concreteness on English Compound Noun Processing: Modulation of Constituent Transparency

Authors: Turgut Coskun

Abstract:

'Concreteness effect' refers to faster processing of concrete words and 'compound facilitation' refers to faster response to compounds. In this study, our main goal was to investigate the interaction between compound facilitation and concreteness effect. The latter might modulate compound processing basing on constituents’ transparency patterns. To evaluate these, we created lists for compound and monomorphemic words, sub-categorized them into concrete and abstract words, and further sub-categorized them basing on their transparency. The transparency conditions were opaque-opaque (OO), transparent-opaque (TO), and transparent-transparent (TT). We used RT data from English Lexicon Project (ELP) for our comparisons. The results showed the importance of concreteness factor (facilitation) in both compound and monomorphemic processing. Important for our present concern, separate concrete and abstract compound analyses revealed different patterns for OO, TO, and TT compounds. Concrete TT and TO conditions were processed faster than Concrete OO, Abstract OO and Abstract TT compounds, however, they weren’t processed faster than Abstract TO compounds. These results may reflect on different representation patterns of concrete and abstract compounds.

Keywords: abstract word, compound representation, concrete word, constituent transparency, processing speed

Procedia PDF Downloads 156
4198 TransDrift: Modeling Word-Embedding Drift Using Transformer

Authors: Nishtha Madaan, Prateek Chaudhury, Nishant Kumar, Srikanta Bedathur

Abstract:

In modern NLP applications, word embeddings are a crucial backbone that can be readily shared across a number of tasks. However, as the text distributions change and word semantics evolve over time, the downstream applications using the embeddings can suffer if the word representations do not conform to the data drift. Thus, maintaining word embeddings to be consistent with the underlying data distribution is a key problem. In this work, we tackle this problem and propose TransDrift, a transformer-based prediction model for word embeddings. Leveraging the flexibility of the transformer, our model accurately learns the dynamics of the embedding drift and predicts future embedding. In experiments, we compare with existing methods and show that our model makes significantly more accurate predictions of the word embedding than the baselines. Crucially, by applying the predicted embeddings as a backbone for downstream classification tasks, we show that our embeddings lead to superior performance compared to the previous methods.

Keywords: NLP applications, transformers, Word2vec, drift, word embeddings

Procedia PDF Downloads 53
4197 BiLex-Kids: A Bilingual Word Database for Children 5-13 Years Old

Authors: Aris R. Terzopoulos, Georgia Z. Niolaki, Lynne G. Duncan, Mark A. J. Wilson, Antonios Kyparissiadis, Jackie Masterson

Abstract:

As word databases for bilingual children are not available, researchers, educators and textbook writers must rely on monolingual databases. The aim of this study is thus to develop a bilingual word database, BiLex-kids, an online open access developmental word database for 5-13 year old bilingual children who learn Greek as a second language and have English as their dominant one. BiLex-kids is compiled from 120 Greek textbooks used in Greek-English bilingual education in the UK, USA and Australia, and provides word translations in the two languages, pronunciations in Greek, and psycholinguistic variables (e.g. Zipf, Frequency per million, Dispersion, Contextual Diversity, Neighbourhood size). After clearing the textbooks of non-relevant items (e.g. punctuation), algorithms were applied to extract the psycholinguistic indices for all words. As well as one total lexicon, the database produces values for all ages (one lexicon for each age) and for three age bands (one lexicon per age band: 5-8, 9-11, 12-13 years). BiLex-kids provides researchers with accurate figures for a wide range of psycholinguistic variables, making it a useful and reliable research tool for selecting stimuli to examine lexical processing among bilingual children. In addition, it offers children the opportunity to study word spelling, learn translations and listen to pronunciations in their second language. It further benefits educators in selecting age-appropriate words for teaching reading and spelling, while special educational needs teachers will have a resource to control the content of word lists when designing interventions for bilinguals with literacy difficulties.

Keywords: bilingual children, psycholinguistics, vocabulary development, word databases

Procedia PDF Downloads 279
4196 Reading Comprehension in Profound Deaf Readers

Authors: S. Raghibdoust, E. Kamari

Abstract:

Research show that reduced functional hearing has a detrimental influence on the ability of an individual to establish proper phonological representations of words, since the phonological representations are claimed to mediate the conceptual processing of written words. Word processing efficiency is expected to decrease with a decrease in functional hearing. In other words, it is predicted that hearing individuals would be more capable of word processing than individuals with hearing loss, as their functional hearing works normally. Studies also demonstrate that the quality of the functional hearing affects reading comprehension via its effect on their word processing skills. In other words, better hearing facilitates the development of phonological knowledge, and can promote enhanced strategies for the recognition of written words, which in turn positively affect higher-order processes underlying reading comprehension. The aims of this study were to investigate and compare the effect of deafness on the participants’ abilities to process written words at the lexical and sentence levels through using two online and one offline reading comprehension tests. The performance of a group of 8 deaf male students (ages 8-12) was compared with that of a control group of normal hearing male students. All the participants had normal IQ and visual status, and came from an average socioeconomic background. None were diagnosed with a particular learning or motor disability. The language spoken in the homes of all participants was Persian. Two tests of word processing were developed and presented to the participants using OpenSesame software, in order to measure the speed and accuracy of their performance at the two perceptual and conceptual levels. In the third offline test of reading comprehension which comprised of semantically plausible and semantically implausible subject relative clauses, the participants had to select the correct answer out of two choices. The data derived from the statistical analysis using SPSS software indicated that hearing and deaf participants had a similar word processing performance both in terms of speed and accuracy of their responses. The results also showed that there was no significant difference between the performance of the deaf and hearing participants in comprehending semantically plausible sentences (p > 0/05). However, a significant difference between the performances of the two groups was observed with respect to their comprehension of semantically implausible sentences (p < 0/05). In sum, the findings revealed that the seriously impoverished sentence reading ability characterizing the profound deaf subjects of the present research, exhibited their reliance on reading strategies that are based on insufficient or deviant structural knowledge, in particular in processing semantically implausible sentences, rather than a failure to efficiently process written words at the lexical level. This conclusion, of course, does not mean to say that deaf individuals may never experience deficits at the word processing level, deficits that impede their understanding of written texts. However, as stated in previous researches, it sounds reasonable to assume that the more deaf individuals get familiar with written words, the better they can recognize them, despite having a profound phonological weakness.

Keywords: deafness, reading comprehension, reading strategy, word processing, subject and object relative sentences

Procedia PDF Downloads 300
4195 Memory Retrieval and Implicit Prosody during Reading: Anaphora Resolution by L1 and L2 Speakers of English

Authors: Duong Thuy Nguyen, Giulia Bencini

Abstract:

The present study examined structural and prosodic factors on the computation of antecedent-reflexive relationships and sentence comprehension in native English (L1) and Vietnamese-English bilinguals (L2). Participants read sentences presented on the computer screen in one of three presentation formats aimed at manipulating prosodic parsing: word-by-word (RSVP), phrase-segment (self-paced), or whole-sentence (self-paced), then completed a grammaticality rating and a comprehension task (following Pratt & Fernandez, 2016). The design crossed three factors: syntactic structure (simple; complex), grammaticality (target-match; target-mismatch) and presentation format. An example item is provided in (1): (1) The actress that (Mary/John) interviewed at the awards ceremony (about two years ago/organized outside the theater) described (herself/himself) as an extreme workaholic). Results showed that overall, both L1 and L2 speakers made use of a good-enough processing strategy at the expense of more detailed syntactic analyses. L1 and L2 speakers’ comprehension and grammaticality judgements were negatively affected by the most prosodically disrupting condition (word-by-word). However, the two groups demonstrated differences in their performance in the other two reading conditions. For L1 speakers, the whole-sentence and the phrase-segment formats were both facilitative in the grammaticality rating and comprehension tasks; for L2, compared with the whole-sentence condition, the phrase-segment paradigm did not significantly improve accuracy or comprehension. These findings are consistent with the findings of Pratt & Fernandez (2016), who found a similar pattern of results in the processing of subject-verb agreement relations using the same experimental paradigm and prosodic manipulation with English L1 and L2 English-Spanish speakers. The results provide further support for a Good-Enough cue model of sentence processing that integrates cue-based retrieval and implicit prosodic parsing (Pratt & Fernandez, 2016) and highlights similarities and differences between L1 and L2 sentence processing and comprehension.

Keywords: anaphora resolution, bilingualism, implicit prosody, sentence processing

Procedia PDF Downloads 111
4194 The Grammatical Dictionary Compiler: A System for Kartvelian Languages

Authors: Liana Lortkipanidze, Nino Amirezashvili, Nino Javashvili

Abstract:

The purpose of the grammatical dictionary is to provide information on the morphological and syntactic characteristics of the basic word in the dictionary entry. The electronic grammatical dictionaries are used as a tool of automated morphological analysis for texts processing. The Georgian Grammatical Dictionary should contain grammatical information for each word: part of speech, type of declension/conjugation, grammatical forms of the word (paradigm), alternative variants of basic word/lemma. In this paper, we present the system for compiling the Georgian Grammatical Dictionary automatically. We propose dictionary-based methods for extending grammatical lexicons. The input lexicon contains only a few number of words with identical grammatical features. The extension is based on similarity measures between features of words; more precisely, we add words to the extended lexicons, which are similar to those, which are already in the grammatical dictionary. Our dictionaries are corpora-based, and for the compiling, we introduce the method for lemmatization of unknown words, i.e., words of which neither full form nor lemma is in the grammatical dictionary.

Keywords: acquisition of lexicon, Georgian grammatical dictionary, lemmatization rules, morphological processor

Procedia PDF Downloads 110
4193 Math Word Problems: Context and Achievement

Authors: Irena Smetackova

Abstract:

The important part of school mathematics are word problems which represent the connection between school knowledge and life reality. To find the reasons why students consider word problems to be difficult, it is necessary to take into consideration the motivational settings, besides mathematical knowledge and reading skills. Our goal is to identify whether the familiar or unfamiliar context of math word problem influences solving success rate and if so, whether the reasons are motivational or cognitive. For this purpose, we conducted three steps study in group of fifty pupils 9-10 years old. In the first step, we asked pupils to create ‘the best’ word problems for entered numerical formula. The set of 19 word problems with different contexts were selected. In the second step, pupils were asked to evaluate (without solving) how they like each item and how easy it is for them. The 6 word problems with low preference and low estimated success rate were selected and combined with other 6 problems with high preference and success rate. In the third step, the same pupils were asked to solve the word problems. The analysis showed that pupils attitudes and solving toward word problems varied by the context. The strong gender patterns both in preferred contexts and in estimated success rates were identified however the real success rate did not differ so strongly. The success gap between word problems with and without preferred contexts were stronger than the gap between problems with and without real experience with the context. The hypothesis that motivational factors are more important than cognitive factors was confirmed.

Keywords: mathematics, context of reality, motivation, cognition, word problems

Procedia PDF Downloads 165
4192 A Neural Approach for the Offline Recognition of the Arabic Handwritten Words of the Algerian Departments

Authors: Salim Ouchtati, Jean Sequeira, Mouldi Bedda

Abstract:

In this work we present an off line system for the recognition of the Arabic handwritten words of the Algerian departments. The study is based mainly on the evaluation of neural network performances, trained with the gradient back propagation algorithm. The used parameters to form the input vector of the neural network are extracted on the binary images of the handwritten word by several methods: the parameters of distribution, the moments centered of the different projections and the Barr features. It should be noted that these methods are applied on segments gotten after the division of the binary image of the word in six segments. The classification is achieved by a multi layers perceptron. Detailed experiments are carried and satisfactory recognition results are reported.

Keywords: handwritten word recognition, neural networks, image processing, pattern recognition, features extraction

Procedia PDF Downloads 478
4191 Sarcasm Recognition System Using Hybrid Tone-Word Spotting Audio Mining Technique

Authors: Sandhya Baskaran, Hari Kumar Nagabushanam

Abstract:

Sarcasm sentiment recognition is an area of natural language processing that is being probed into in the recent times. Even with the advancements in NLP, typical translations of words, sentences in its context fail to provide the exact information on a sentiment or emotion of a user. For example, if something bad happens, the statement ‘That's just what I need, great! Terrific!’ is expressed in a sarcastic tone which could be misread as a positive sign by any text-based analyzer. In this paper, we are presenting a unique real time ‘word with its tone’ spotting technique which would provide the sentiment analysis for a tone or pitch of a voice in combination with the words being expressed. This hybrid approach increases the probability for identification of special sentiment like sarcasm much closer to the real world than by mining text or speech individually. The system uses a tone analyzer such as YIN-FFT which extracts pitch segment-wise that would be used in parallel with a speech recognition system. The clustered data is classified for sentiments and sarcasm score for each of it determined. Our Simulations demonstrates the improvement in f-measure of around 12% compared to existing detection techniques with increased precision and recall.

Keywords: sarcasm recognition, tone-word spotting, natural language processing, pitch analyzer

Procedia PDF Downloads 258
4190 Secure Text Steganography for Microsoft Word Document

Authors: Khan Farhan Rafat, M. Junaid Hussain

Abstract:

Seamless modification of an entity for the purpose of hiding a message of significance inside its substance in a manner that the embedding remains oblivious to an observer is known as steganography. Together with today's pervasive registering frameworks, steganography has developed into a science that offers an assortment of strategies for stealth correspondence over the globe that must, however, need a critical appraisal from security breach standpoint. Microsoft Word is amongst the preferably used word processing software, which comes as a part of the Microsoft Office suite. With a user-friendly graphical interface, the richness of text editing, and formatting topographies, the documents produced through this software are also most suitable for stealth communication. This research aimed not only to epitomize the fundamental concepts of steganography but also to expound on the utilization of Microsoft Word document as a carrier for furtive message exchange. The exertion is to examine contemporary message hiding schemes from security aspect so as to present the explorative discoveries and suggest enhancements which may serve a wellspring of information to encourage such futuristic research endeavors.

Keywords: hiding information in plain sight, stealth communication, oblivious information exchange, conceal, steganography

Procedia PDF Downloads 206
4189 Resource Creation Using Natural Language Processing Techniques for Malay Translated Qur'an

Authors: Nor Diana Ahmad, Eric Atwell, Brandon Bennett

Abstract:

Text processing techniques for English have been developed for several decades. But for the Malay language, text processing methods are still far behind. Moreover, there are limited resources, tools for computational linguistic analysis available for the Malay language. Therefore, this research presents the use of natural language processing (NLP) in processing Malay translated Qur’an text. As the result, a new language resource for Malay translated Qur’an was created. This resource will help other researchers to build the necessary processing tools for the Malay language. This research also develops a simple question-answer prototype to demonstrate the use of the Malay Qur’an resource for text processing. This prototype has been developed using Python. The prototype pre-processes the Malay Qur’an and an input query using a stemming algorithm and then searches for occurrences of the query word stem. The result produced shows improved matching likelihood between user query and its answer. A POS-tagging algorithm has also been produced. The stemming and tagging algorithms can be used as tools for research related to other Malay texts and can be used to support applications such as information retrieval, question answering systems, ontology-based search and other text analysis tasks.

Keywords: language resource, Malay translated Qur'an, natural language processing (NLP), text processing

Procedia PDF Downloads 280
4188 Differences in the Processing of Sentences with Lexical Ambiguity and Structural Ambiguity: An Experimental Study

Authors: Mariana T. Teixeira, Joana P. Luz

Abstract:

This paper is based on assumptions of psycholinguistics and investigates the processing of ambiguous sentences in Brazilian Portuguese. Specifically, it aims to verify if there is a difference in processing time between sentences with lexical ambiguity and sentences with structural (or syntactic) ambiguity. We hypothesize, based on the Garden Path Theory, that the two types of ambiguity entail different cognitive efforts, since sentences with structural ambiguity require that two structures be processed, whereas ambiguous phrases whose root of ambiguity is in a word require the processing of a single structure, which admits a variation of punctual meaning, within the scope of only one lexical item. In order to test this hypothesis, 25 undergraduate students, whose average age was 27.66 years, native speakers of Brazilian Portuguese, performed a self-monitoring reading task of ambiguous sentences, which had lexical and structural ambiguity. The results suggest that unambiguous sentence processing is faster than ambiguous sentence processing, whether it has lexical or structural ambiguity. In addition, participants presented a mean reading time greater for sentences with syntactic ambiguity than for sentences with lexical ambiguity, evidencing a greater cognitive effort in sentence processing with structural ambiguity.

Keywords: Brazilian portuguese, lexical ambiguity, sentence processing, syntactic ambiguity

Procedia PDF Downloads 190
4187 Expressivity of Word-Formation in English and Russian Advertising Lexicon

Authors: Voronina Ekaterina Borisovna

Abstract:

The problem of expressivity of advertising lexicon is studied in the article. The comparison of English and Russian advertising lexicons is done. The objects of the analysis were English and Russian advertising texts, both printed advertising texts and texts extracted from the commercials. Some conclusions concerning the expressivity of advertising lexicon were made. Expressivity can be included in the semantic structure of words or created by word-formation means. Expressivity caused by morphological derivatives includes such facilities as derivational affixes, models and types of word formation.

Keywords: advertising lexicon, expressivity, word-formation means, linguistics

Procedia PDF Downloads 318
4186 Building Semantic-Relatedness Thai Word Ontology for Semantic Analysis

Authors: Gridaphat Sriharee

Abstract:

Building semantic-relatedness Thai word ontology can be implemented by considering word forms and word meaning. This research proposed the methodology for building the ontology, which can be used for semantic analysis. There are four categories of words: similar form and the same meaning, similar form and similar meaning, different form and opposite/same meaning, and different form and similar meaning, which will be used as initial words for building the proposed ontology. Extension of the ontology can be augmented by considering the messages that give the meaning of the word from the dictionaries. Exploiting WordNet to construct the proposed ontology was investigated and discussed. The proposed ontology was evaluated for its quality. With the proposed methodology, it is promising that the constructed ontology is a well-defined ontology.

Keywords: Thai, NLP, semantics, ontology

Procedia PDF Downloads 57
4185 Accounting as Addressed in the Qur’aan

Authors: Shahriar M. Saadullah, Abdul-Quddoos Abdul-Basith, Zaki K. Abushawish

Abstract:

As a part of academic research in Islamic Accounting it is important to know how the word Accounting is discussed in the Qur’aan. This paper identifies and analyzes the word Accounting in the Qur’aan, which is significant to know and understand. The paper uses a methodology of identifying the root word of Accounting Hasaba (حسب) in the Qur’aan with the help of Islam 360 software and analyzes the use of the relevant words derived from the root word. Then the paper attempts to connect the findings to the contemporary Accounting issues. The paper finds that the root word of Accounting Hasaba (حسب) appears in the Qur’aan 109 times but it is only used in the sense Account, Accountable, or Accounting 45 times. These words appear in 44 different verses in the Qur’aan, appearing twice in one of the verses. The paper divides these verses into 8 different themes namely, Day of Accounting, without any Accounting, Accounting of Time, Self-Accounting, Swift in Accounting, Accounting is only with God, Awareness and the Good Accounting, and Heedlessness and the Bad Accounting. The way the words Account, Accounting, and Accountable is discussed in the Qur’aan links to the contemporary accounting issues including Ethics, Agency Theory, and Internal Control. The links discovered in the paper clearly shows the timeless nature of the message of the Qur’aan.

Keywords: accounting, contemporary accounting issues, Qur'aan, root word of accounting hasaba

Procedia PDF Downloads 383
4184 Perceiving Casual Speech: A Gating Experiment with French Listeners of L2 English

Authors: Naouel Zoghlami

Abstract:

Spoken-word recognition involves the simultaneous activation of potential word candidates which compete with each other for final correct recognition. In continuous speech, the activation-competition process gets more complicated due to speech reductions existing at word boundaries. Lexical processing is more difficult in L2 than in L1 because L2 listeners often lack phonetic, lexico-semantic, syntactic, and prosodic knowledge in the target language. In this study, we investigate the on-line lexical segmentation hypotheses that French listeners of L2 English form and then revise as subsequent perceptual evidence is revealed. Our purpose is to shed further light on the processes of L2 spoken-word recognition in context and better understand L2 listening difficulties through a comparison of skilled and unskilled reactions at the point where their working hypothesis is rejected. We use a variant of the gating experiment in which subjects transcribe an English sentence presented in increments of progressively greater duration. The spoken sentence was “And this amazing athlete has just broken another world record”, chosen mainly because it included common reductions and phonetic features in English, such as elision and assimilation. Our preliminary results show that there is an important difference in the manner in which proficient and less-proficient L2 listeners handle connected speech. Less-proficient listeners delay recognition of words as they wait for lexical and syntactic evidence to appear in the gates. Further statistical results are currently being undertaken.

Keywords: gating paradigm, spoken word recognition, online lexical segmentation, L2 listening

Procedia PDF Downloads 436
4183 Pudhaiyal: A Maze-Based Treasure Hunt Game for Tamil Words

Authors: Aarthy Anandan, Anitha Narasimhan, Madhan Karky

Abstract:

Word-based games are popular in helping people to improve their vocabulary skills. Games like ‘word search’ and crosswords provide a smart way of increasing vocabulary skills. Word search games are fun to play, but also educational which actually helps to learn a language. Finding the words from word search puzzle helps the player to remember words in an easier way, and it also helps to learn the spellings of words. In this paper, we present a tile distribution algorithm for a Maze-Based Treasure Hunt Game 'Pudhaiyal’ for Tamil words, which describes how words can be distributed horizontally, vertically or diagonally in a 10 x 10 grid. Along with the tile distribution algorithm, we also present an algorithm for the scoring model of the game. The proposed game has been tested with 20,000 Tamil words.

Keywords: Pudhaiyal, Tamil word game, word search, scoring, maze, algorithm

Procedia PDF Downloads 407
4182 Effects of Global Validity of Predictive Cues upon L2 Discourse Comprehension: Evidence from Self-paced Reading

Authors: Binger Lu

Abstract:

It remains unclear whether second language (L2) speakers could use discourse context cues to predict upcoming information as native speakers do during online comprehension. Some researchers propose that L2 learners may have a reduced ability to generate predictions during discourse processing. At the same time, there is evidence that discourse-level cues are weighed more heavily in L2 processing than in L1. Previous studies showed that L1 prediction is sensitive to the global validity of predictive cues. The current study aims to explore whether and to what extent L2 learners can dynamically and strategically adjust their prediction in accord with the global validity of predictive cues in L2 discourse comprehension as native speakers do. In a self-paced reading experiment, Chinese native speakers (N=128), C-E bilinguals (N=128), and English native speakers (N=128) read high-predictable (e.g., Jimmy felt thirsty after running. He wanted to get some water from the refrigerator.) and low-predictable (e.g., Jimmy felt sick this morning. He wanted to get some water from the refrigerator.) discourses in two-sentence frames. The global validity of predictive cues was manipulated by varying the ratio of predictable (e.g., Bill stood at the door. He opened it with the key.) and unpredictable fillers (e.g., Bill stood at the door. He opened it with the card.), such that across conditions, the predictability of the final word of the fillers ranged from 100% to 0%. The dependent variable was reading time on the critical region (the target word and the following word), analyzed with linear mixed-effects models in R. C-E bilinguals showed reliable prediction across all validity conditions (β = -35.6 ms, SE = 7.74, t = -4.601, p< .001), and Chinese native speakers showed significant effect (β = -93.5 ms, SE = 7.82, t = -11.956, p< .001) in two of the four validity conditions (namely, the High-validity and MedLow conditions, where fillers ended with predictable words in 100% and 25% cases respectively), whereas English native speakers didn’t predict at all (β = -2.78 ms, SE = 7.60, t = -.365, p = .715). There was neither main effect (χ^²(3) = .256, p = .968) nor interaction (Predictability: Background: Validity, χ^²(3) = 1.229, p = .746; Predictability: Validity, χ^²(3) = 2.520, p = .472; Background: Validity, χ^²(3) = 1.281, p = .734) of Validity with speaker groups. The results suggest that prediction occurs in L2 discourse processing but to a much less extent in L1, witha significant effect in some conditions of L1 Chinese and anull effect in L1 English processing, consistent with the view that L2 speakers are more sensitive to discourse cues compared with L1 speakers. Additionally, the pattern of L1 and L2 predictive processing was not affected by the global validity of predictive cues. C-E bilinguals’ predictive processing could be partly transferred from their L1, as prior research showed that discourse information played a more significant role in L1 Chinese processing.

Keywords: bilingualism, discourse processing, global validity, prediction, self-paced reading

Procedia PDF Downloads 101