Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 168

Search results for: syntactic parsing

78 An Emerging Trend of Wrong Plurals among Pakistani Bilinguals: A Sociolinguistic Perspective

Abstract:

English is being used as linguafranca in most of the formal and informal situations of Pakistan. This extensive use has been rapidly replacing the identity of national language of Pakistani.e. Urdu. The nature of syntactic representation has always been the matter of confusion among linguists. Being unaware of the correct plural forms the non-natives commit mistakes while making plurals. But the situation is reverse when non-natives of English irrespective of knowing the right plurals make wrong plurals usually talking in their native language. The observation method was opted to check this hypothesis. Along with it, a checklist has been made in which these certain occurrences have been mentioned, where this flouting of the norms is a normal routine. The result confirms that Pakistani commit this mistake, i.e. ‘tablian’ the plural of tables, ‘filain’ the plural of files, though this is done by them on unconscious level. This emerging trend of unconscious mistake is leading Pakistani bilinguals towards a diglossic situation where they are coining portmanteau.

Keywords: bilinguals, emerging trend, portmanteau, trends

Procedia PDF Downloads 148

77 Specialized Translation Teaching Strategies: A Corpus-Based Approach

Authors: Yingying Ding

Abstract:

This study presents a methodology of specialized translation with the objective of helping teachers to improve the strategies in teaching translation. In order to allow students to acquire skills to translate specialized texts, they need to become familiar with the semantic and syntactic features of source texts and target texts. The aim of our study is to use a corpus-based approach in the teaching of specialized translation between Chinese and Italian. This study proposes to construct a specialized Chinese - Italian comparable corpus that consists of 50 economic contracts from the domain of food. With the help of AntConc, we propose to compile a comparable corpus in for translation teaching purposes. This paper attempts to provide insight into how teachers could benefit from comparable corpus in the teaching of specialized translation from Italian into Chinese and through some examples of passive sentences how students could learn to apply different strategies for translating appropriately the voice.

Keywords: contrastive studies, specialised translation, corpus-based approach, teaching

Procedia PDF Downloads 341

76 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development

Authors: L. Kamandulytė-Merfeldienė

Abstract:

The paper deals with the main issues of methodology of the Corpus of Spoken Lithuanian which was started to be developed in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, the transcription of the recorded data, and the grammatical annotation. Collecting the data was based on the principles of balance and naturality. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to the constant increase in studies on spontaneous communication, and various papers have dealt with a distribution of parts of speech, use of different grammatical forms, variation of inflectional paradigms, distribution of fillers, syntactic functions of adjectives, the mean length of utterances.

Keywords: CHILDES, corpus of spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian

Procedia PDF Downloads 208

75 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features

Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim

Abstract:

The knowledge of the relationship between characters can help readers to understand the overall story or plot of the literary fiction. In this paper, we present a method for extracting the specific relationship between characters from a Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters without considering Korean linguistic features. Furthermore, it is difficult to extract the relationship with direction from text, such as one-sided love, because they consider only the weight of relationship, without considering the direction of the relationship. Therefore, in order to identify specific relationships between characters, we propose a statistical method considering linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationship between the characters. Furthermore, we expect that proposed method could be applied to the relationship analysis between characters of other content like movie or TV drama.

Keywords: data mining, Korean linguistic feature, literary fiction, relationship extraction

Procedia PDF Downloads 344

74 Composite Kernels for Public Emotion Recognition from Twitter

Authors: Chien-Hung Chen, Yan-Chun Hsing, Yung-Chun Chang

Abstract:

The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.

Keywords: emotion recognition, natural language processing, composite kernel, sentiment analysis, text mining

Procedia PDF Downloads 193

73 Sentence Structure for Free Word Order Languages in Context with Anaphora Resolution: A Case Study of Hindi

Authors: Pardeep Singh, Kamlesh Dutta

Abstract:

Many languages have fixed sentence structure and others are free word order. The accuracy of anaphora resolution of syntax based algorithm depends on structure of the sentence. So, it is important to analyze the structure of any language before implementing these algorithms. In this study, we analyzed the sentence structure exploiting the case marker in Hindi as well as some special tag for subject and object. We also investigated the word order for Hindi. Word order typology refers to the study of the order of the syntactic constituents of a language. We analyzed 165 news items of Ranchi Express from EMILEE corpus of plain text. It consisted of 1745 sentences. Eight file of dialogue based from the same corpus has been analyzed which will have 1521 sentences. The percentages of subject object verb structure (SOV) and object subject verb (OSV) are 66.90 and 33.10, respectively.

Keywords: anaphora resolution, free word order languages, SOV, OSV

Procedia PDF Downloads 443

72 Minimizing Mutant Sets by Equivalence and Subsumption

Authors: Samia Alblwi, Amani Ayad

Abstract:

Mutation testing is the art of generating syntactic variations of a base program and checking whether a candidate test suite can identify all the mutants that are not semantically equivalent to the base: this technique is widely used by researchers to select quality test suites. One of the main obstacles to the widespread use of mutation testing is cost: even small pro-grams (a few dozen lines of code) can give rise to a large number of mutants (up to hundreds): this has created an incentive to seek to reduce the number of mutants while preserving their collective effectiveness. Two criteria have been used to reduce the size of mutant sets: equiva-lence, which aims to partition the set of mutants into equivalence classes modulo semantic equivalence, and selecting one representative per class; subsumption, which aims to define a partial ordering among mutants that ranks mutants by effectiveness and seeks to select maximal elements in this ordering. In this paper we analyze these two policies using analytical and em-pirical criteria.

Keywords: mutation testing, mutant sets, mutant equivalence, mutant subsumption, mutant set minimization

Procedia PDF Downloads 32

71 The Narrative Coherence of Autistic Children’s Accounts of an Experienced Event over Time

Authors: Fuming Yang, Telma Sousa Almeida, Xinyu Li, Yunxi Deng, Heying Zhang, Michael E. Lamb

Abstract:

Twenty-seven children aged 6-15 years with autism spectrum disorder (ASD) and 32 typically developing children were questioned about their participation in a set of activities after a two-week delay and again after a two-month delay, using a best-practice interview protocol. This paper assessed the narrative coherence of children’s reports based on key story grammar elements and temporal features included in their accounts of the event. Results indicated that, over time, both children with ASD and typically developing (TD) children decreased their narrative coherence. Children with ASD were no different from TD peers with regards to story length and syntactic complexity. However, they showed significantly less coherence than TD children. They were less likely to use the gist of the story to organize their narrative coherence. Interviewer prompts influenced children’s narrative coherence. The findings indicated that children with ASD could provide meaningful and reliable testimony about an event they personally experienced, but the narrative coherence of their reports deteriorates over time and is affected by interviewer prompts.

Keywords: autism spectrum disorders, delay, eyewitness testimony, narrative coherence

Procedia PDF Downloads 248

70 Agents and Causers in the Experiencer-Verb Lexicon

Authors: Margaret Ryan, Linda Cupples, Lyndsey Nickels, Paul Sowman

Abstract:

The current investigation explored the thematic roles of the nouns specified in the lexical entries of experiencer verbs. While prior experimental research assumes experiencer and theme roles for both subject-experiencer (SE) and object-experiencer (OE) verbs, syntactic theorists have posited additional agent and causer roles. Experiment 1 provided evidence for an agent as participants assigned a high degree of intentionality to the logical subject of a subset of SE and OE actives and passives. Experiment 2 provided evidence for a causer as participants assigned high levels of causality to the logical subjects of experiencer sentences generally. However, the presence of an agent, but not a causer, coincided with processing ease. Causality may be an aspect rather than a thematic role. The varying thematic roles amongst experiencer-verb sentences have important implications for stimulus selection because we cannot presume processing is similar across differing sentence subtypes.

Keywords: sentence comprehension, lexicon, canonicity, processing, thematic roles, syntax

Procedia PDF Downloads 83

69 The Grammatical Dictionary Compiler: A System for Kartvelian Languages

Authors: Liana Lortkipanidze, Nino Amirezashvili, Nino Javashvili

Abstract:

The purpose of the grammatical dictionary is to provide information on the morphological and syntactic characteristics of the basic word in the dictionary entry. The electronic grammatical dictionaries are used as a tool of automated morphological analysis for texts processing. The Georgian Grammatical Dictionary should contain grammatical information for each word: part of speech, type of declension/conjugation, grammatical forms of the word (paradigm), alternative variants of basic word/lemma. In this paper, we present the system for compiling the Georgian Grammatical Dictionary automatically. We propose dictionary-based methods for extending grammatical lexicons. The input lexicon contains only a few number of words with identical grammatical features. The extension is based on similarity measures between features of words; more precisely, we add words to the extended lexicons, which are similar to those, which are already in the grammatical dictionary. Our dictionaries are corpora-based, and for the compiling, we introduce the method for lemmatization of unknown words, i.e., words of which neither full form nor lemma is in the grammatical dictionary.

Keywords: acquisition of lexicon, Georgian grammatical dictionary, lemmatization rules, morphological processor

Procedia PDF Downloads 116

68 Hierarchical Tree Long Short-Term Memory for Sentence Representations

Authors: Xiuying Wang, Changliang Li, Bo Xu

Abstract:

A fixed-length feature vector is required for many machine learning algorithms in NLP field. Word embeddings have been very successful at learning lexical information. However, they cannot capture the compositional meaning of sentences, which prevents them from a deeper understanding of language. In this paper, we introduce a novel hierarchical tree long short-term memory (HTLSTM) model that learns vector representations for sentences of arbitrary syntactic type and length. We propose to split one sentence into three hierarchies: short phrase, long phrase and full sentence level. The HTLSTM model gives our algorithm the potential to fully consider the hierarchical information and long-term dependencies of language. We design the experiments on both English and Chinese corpus to evaluate our model on sentiment analysis task. And the results show that our model outperforms several existing state of the art approaches significantly.

Keywords: deep learning, hierarchical tree long short-term memory, sentence representation, sentiment analysis

Procedia PDF Downloads 328

67 A Tool to Represent People Approach to the Use of Pharmaceuticals and Related Criticality and Needs: A Territory Experience

Authors: Barbara Pittau, Piergiorgio Palla, Antonio Mastino

Abstract:

Communication is fundamental to health education. The proper use of medicinal products is a crucial aspect of the health of citizens that affects both safety and health care spending. Therefore, encouraging/promoting communication, concerning the importance of proper use of pharmaceuticals, has substantial implications in terms of individual health, health care, and health care system sustainability. In view of these considerations, in the context of two projects, one of which is still in progress, a relational database-backed web application named COLLABORAFARMACISOLA has been designed and developed as a tool to analyze and visualize how people approach the use of medicinal products, with the aim of improving and enhancing communication efficacy. The software application is being used to collect information (anonymously and voluntarily) from the citizens of Sardinia, an Italian region, regarding their knowledge, experiences, and opinions towards pharmaceuticals. This study that was conducted to date on thousand of interviewed people, has focused on different aspects such as: the treatment interruption and the "self-prescription” without medical consultation, the attention paid to reading the leaflets, the awareness of the economic value of the pharmaceuticals, the importance of avoiding the waste of medicinal products and the attitudes towards the use of generics. To this purpose, our software application provides a set of ad hoc parsing routines, to store information into the structure of a relational database and to process and visualize it through a set of interactive tools aimed to emphasize the findings and the insights obtained. The results of our preliminary analysis show the efficacy of the awareness plan and, at the same time, the criticality and the needs of the territory under examination. The ultimate goal of our study is to provide a contribution to the community by improving communication that can result in a benefit for public health in a context strictly connected to the reality of the territory.

Keywords: communication, pharmaceuticals, public health, relational database, tool, web application

Procedia PDF Downloads 110

66 AI Peer Review Challenge: Standard Model of Physics vs 4D GEM EOS

Authors: David A. Harness

Abstract:

Natural evolution of ATP cognitive systems is to meet AI peer review standards. ATP process of axiom selection from Mizar to prove a conjecture would be further refined, as in all human and machine learning, by solving the real world problem of the proposed AI peer review challenge: Determine which conjecture forms the higher confidence level constructive proof between Standard Model of Physics SU(n) lattice gauge group operation vs. present non-standard 4D GEM EOS SU(n) lattice gauge group spatially extended operation in which the photon and electron are the first two trace angular momentum invariants of a gravitoelectromagnetic (GEM) energy momentum density tensor wavetrain integration spin-stress pressure-volume equation of state (EOS), initiated via 32 lines of Mathematica code. Resulting gravitoelectromagnetic spectrum ranges from compressive through rarefactive of the central cosmological constant vacuum energy density in units of pascals. Said self-adjoint group operation exclusively operates on the stress energy momentum tensor of the Einstein field equations, introducing quantization directly on the 4D spacetime level, essentially reformulating the Yang-Mills virtual superpositioned particle compounded lattice gauge groups quantization of the vacuum—into a single hyper-complex multi-valued GEM U(1) × SU(1,3) lattice gauge group Planck spacetime mesh quantization of the vacuum. Thus the Mizar corpus already contains all of the axioms required for relevant DeepMath premise selection and unambiguous formal natural language parsing in context deep learning.

Keywords: automated theorem proving, constructive quantum field theory, information theory, neural networks

Procedia PDF Downloads 149

65 Distinguishing Borrowings from Code Mixes: An Analysis of English Lexical Items Used in the Print Media in Sri Lanka

Authors: Chamindi Dilkushi Senaratne

Abstract:

Borrowing is the morphological, syntactic and (usually) phonological integration of lexical items from one language into the structure of another language. Borrowings show complete linguistic integration and due to the frequency of use become fossilized in the recipient language differentiating them from switches and mixes. Code mixes are different to borrowings. Code mixing takes place when speakers use lexical items in casual conversation to serve a variety of functions. This study presents an analysis of lexical items used in English newspapers in Sri Lanka in 2017 which reveal characteristics of borrowing or code mixes. Both phenomena arise due to language contact. The study will also use data from social media websites that comment on newspaper articles available on the web. The study reiterates that borrowings are distinguishable from code mixes and that they are two different phenomena that occur in language contact situations. The study also shows how existing morphological processes are used to create new vocabulary in language use. The study sheds light into how existing morphological processes are used by the bilingual to be creative, innovative and convey a bilingual identity.

Keywords: borrowing, code mixing, morphological processes

Procedia PDF Downloads 196

64 Application of Natural Language Processing in Education

Authors: Khaled M. Alhawiti

Abstract:

Reading capability is a major segment of language competency. On the other hand, discovering topical writings at a fitting level for outside and second language learners is a test for educators. We address this issue utilizing natural language preparing innovation to survey reading level and streamline content. In the connection of outside and second-language learning, existing measures of reading level are not appropriate to this errand. Related work has demonstrated the profit of utilizing measurable language preparing procedures; we expand these thoughts and incorporate other potential peculiarities to measure intelligibility. In the first piece of this examination, we join characteristics from measurable language models, customary reading level measures and other language preparing apparatuses to deliver a finer technique for recognizing reading level. We examine the execution of human annotators and assess results for our finders concerning human appraisals. A key commitment is that our identifiers are trainable; with preparing and test information from the same space, our finders beat more general reading level instruments (Flesch-Kincaid and Lexile). Trainability will permit execution to be tuned to address the needs of specific gatherings or understudies.

Keywords: natural language processing, trainability, syntactic simplification tools, education

Procedia PDF Downloads 459

63 Towards a Large Scale Deep Semantically Analyzed Corpus for Arabic: Annotation and Evaluation

Authors: S. Alansary, M. Nagi

Abstract:

This paper presents an approach of conducting semantic annotation of Arabic corpus using the Universal Networking Language (UNL) framework. UNL is intended to be a promising strategy for providing a large collection of semantically annotated texts with formal, deep semantics rather than shallow. The result would constitute a semantic resource (semantic graphs) that is editable and that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles and rhetorical relations, into a single semantic formalism for knowledge representation. The paper will also present the Interactive Analysis tool for automatic semantic annotation (IAN). In addition, the cornerstone of the proposed methodology which are the disambiguation and transformation rules, will be presented. Semantic annotation using UNL has been applied to a corpus of 20,000 Arabic sentences representing the most frequent structures in the Arabic Wikipedia. The representation, at different linguistic levels was illustrated starting from the morphological level passing through the syntactic level till the semantic representation is reached. The output has been evaluated using the F-measure. It is 90% accurate. This demonstrates how powerful the formal environment is, as it enables intelligent text processing and search.

Keywords: semantic analysis, semantic annotation, Arabic, universal networking language

Procedia PDF Downloads 561

62 Investigating the Associative Network of Color Terms among Turkish University Students: A Cognitive-Based Study

Authors: R. Güçlü, E. Küçüksakarya

Abstract:

Word association (WA) gives the broadest information on how knowledge is structured in the human mind. Cognitive linguistics, psycholinguistics, and applied linguistics are the disciplines that consider WA tests as substantial in gaining insights into the very nature of the human cognitive system and semantic knowledge. In this study, Berlin and Kay’s basic 11 color terms (1969) are presented as the stimuli words to a total number of 300 Turkish university students. The responses are analyzed according to Fitzpatrick’s model (2007), including four categories, namely meaning-based responses, position-based responses, form-based responses, and erratic responses. In line with the findings, the responses to free association tests are expected to give much information about Turkish university students’ psychological structuring of vocabulary, especially morpho-syntactic and semantic relationships among words. To conclude, theoretical and practical implications are discussed to make an in-depth evaluation of how associations of basic color terms are represented in the mental lexicon of Turkish university students.

Keywords: color term, gender, mental lexicon, word association task

Procedia PDF Downloads 94

61 Negativization: A Focus Strategy in Basà Language

Authors: Imoh Philip

Abstract:

Basà language is classified as belonging to Kainji family, under the sub-phylum Western-Kainji known as Rubasa (Basa Benue) (Croizier & Blench, 1992:32). Basà is an under-described language spoken in the North-Central Nigeria. The language is characterized by subject-verb-object (henceforth SVO) as its canonical word order. Data for this work is sourced from the researcher’s native intuition of the language corroborated with a careful observation of native speakers. This paper investigates the syntactic derivational strategy of information-structure encoding in Basà language. It emphasizes on a negative operator, as a strategy for focusing a constituent or clause that follows it and negativizes a whole proposition. For items that are not nouns, they have to undergo an obligatory nominalization process, either by affixation, modification or conversion before they are moved to the pre verbal position for these operations. The study discovers and provides evidence of the fact showing that deferent constituents in the sentence such as the subject, direct, indirect object, genitive, verb phrase, prepositional phrase, clause and idiophone, etc. can be focused with the same negativizing operator. The process is characterized by focusing the pre verbal NP constituent alone, whereas the whole proposition is negated. The study can stimulate similar study or be replicated in other languages.

Keywords: negation, focus, Basà, nominalization

Procedia PDF Downloads 570

60 Deep Learning Based-Object-classes Semantic Classification of Arabic Texts

Authors: Imen Elleuch, Wael Ouarda, Gargouri Bilel

Abstract:

We proposes in this paper a Deep Learning based approach to classify text in order to enrich an Arabic ontology based on the objects classes of Gaston Gross. Those object classes are defined by taking into account the syntactic and semantic features of the treated language. Thus, our proposed approach is a hybrid one. In fact, it is based on the one hand on the object classes that represents a knowledge based-approach on classification of text and in the other hand it uses the deep learning approach that use the word embedding-based-approach to classify text. We have applied our proposed approach on a corpus constructed from an Arabic dictionary. The obtained semantic classification of text will enrich the Arabic objects classes ontology. In fact, new classes can be added to the ontology or an expansion of the features that characterizes each object class can be updated. The obtained results are compared to a similar work that treats the same object with a classical linguistic approach for the semantic classification of text. This comparison highlight our hybrid proposed approach that can be ameliorated by broaden the dataset used in the deep learning process.

Keywords: deep-learning approach, object-classes, semantic classification, Arabic

Procedia PDF Downloads 42

59 Investigating Translations of Websites of Pakistani Public Offices

Authors: Sufia Maroof

Abstract:

This empirical study investigated the web-translations of five Pakistani public offices (FPSC, FIA, HEC, USB, and Ministry of Finance) offering Urdu tab as an option to access information on their official websites. Triangulation of quantitative and qualitative research design informed the researcher of the semantic, lexical and syntactic caveats in these translations. The study hypothesized that majority of the Pakistani population is oblivious of the Supreme Court’s amendments in language policy concerning national and official language; hence, Urdu web-translations of the public departments have not been accessed effectively. Firstly, the researcher conducted an online survey, comprising of two sections, close ended and short answer based questions. Secondly, the researcher compiled corpus of the five selected websites in a tabular form to compare the data. Thirdly, the administrators of the departments had been contacted regarding the methods of translation and the expertise of the personnel involved. The corpus was assessed for TQA after examining the lexical, semantic, syntactical and technical alignment inaccuracies and imperfections. The study suggests the public offices to invest in their Urdu webs by either hiring expert translators or engaging expertise of a translation agency for this project to offer quality translation to public.

Keywords: machine translations, public offices, Urdu translations, websites

Procedia PDF Downloads 98

58 Cognitive Stylistics and Horror Fiction: A Case Study of Stephen King’s Misery

Authors: Kriangkrai Vathanalaoha

Abstract:

Misery generates fear and anxiety in readers through its intense plot associated with the unpredictable emotional states of the nurse, Annie Wilkes. At the same time, she mentally and physically abuses the novelist victim, Paul Sheldon. The suspense is not only at the story level, where the violent expressions are used but also at the discourse level, where the linguistic structures may intentionally cause the reader to view language as disturbing performative. This performativity could be reflected through linguistic choices where the writer triggers a new imaginative world through experiential metafunction and schema disruption. This study explores striking excerpts from the fiction through mind style and transitivity analysis to demonstrate how the horrific experience contrasts when the protagonist and the antagonist converse extensively. The results reveal that stylistic deviation can be found at the syntactic levels, where the intensity of emotions can be apparent when the protagonist is verbally abused. In addition, transitivity can flesh out how the protagonist is expressed chiefly through the internalized process, whereas the antagonist is eminent with the externalized process. The findings suggest that the application of cognitive stylistics, such as mind style and transitivity analysis, could contribute to the mental representation of horrific reality.

Keywords: horror, mind style, misery, stylistics, transitivity

Procedia PDF Downloads 110

57 Tibyan Automated Arabic Correction Using Machine-Learning in Detecting Syntactical Mistakes

Authors: Ashwag O. Maghraby, Nida N. Khan, Hosnia A. Ahmed, Ghufran N. Brohi, Hind F. Assouli, Jawaher S. Melibari

Abstract:

The Arabic language is one of the most important languages. Learning it is so important for many people around the world because of its religious and economic importance and the real challenge lies in practicing it without grammatical or syntactical mistakes. This research focused on detecting and correcting the syntactic mistakes of Arabic syntax according to their position in the sentence and focused on two of the main syntactical rules in Arabic: Dual and Plural. It analyzes each sentence in the text, using Stanford CoreNLP morphological analyzer and machine-learning approach in order to detect the syntactical mistakes and then correct it. A prototype of the proposed system was implemented and evaluated. It uses support vector machine (SVM) algorithm to detect Arabic grammatical errors and correct them using the rule-based approach. The prototype system has a far accuracy 81%. In general, it shows a set of useful grammatical suggestions that the user may forget about while writing due to lack of familiarity with grammar or as a result of the speed of writing such as alerting the user when using a plural term to indicate one person.

Keywords: Arabic language acquisition and learning, natural language processing, morphological analyzer, part-of-speech

Procedia PDF Downloads 123

56 Selecting Answers for Questions with Multiple Answer Choices in Arabic Question Answering Based on Textual Entailment Recognition

Authors: Anes Enakoa, Yawei Liang

Abstract:

Question Answering (QA) system is one of the most important and demanding tasks in the field of Natural Language Processing (NLP). In QA systems, the answer generation task generates a list of candidate answers to the user's question, in which only one answer is correct. Answer selection is one of the main components of the QA, which is concerned with selecting the best answer choice from the candidate answers suggested by the system. However, the selection process can be very challenging especially in Arabic due to its particularities. To address this challenge, an approach is proposed to answer questions with multiple answer choices for Arabic QA systems based on Textual Entailment (TE) recognition. The developed approach employs a Support Vector Machine that considers lexical, semantic and syntactic features in order to recognize the entailment between the generated hypotheses (H) and the text (T). A set of experiments has been conducted for performance evaluation and the overall performance of the proposed method reached an accuracy of 67.5% with C@1 score of 80.46%. The obtained results are promising and demonstrate that the proposed method is effective for TE recognition task.

Keywords: information retrieval, machine learning, natural language processing, question answering, textual entailment

Procedia PDF Downloads 120

55 The Structural Pattern: An Event-Related Potential Study on Tang Poetry

Authors: ShuHui Yang, ChingChing Lu

Abstract:

Measuring event-related potentials (ERPs) has been fundamental to our understanding of how people process language. One specific ERP component, a P600, has been hypothesized to be associated with syntactic reanalysis processes. We, however, propose that the P600 is not restricted to reanalysis processes, but is the index of the structural pattern processing. To investigate the structural pattern processing, we utilized the effects of stimulus degradation in structural priming. To put it another way, there was no P600 effect if the structure of the prime was the same with the structure of the target. Otherwise, there would be a P600 effect if the structure were different between the prime and the target. In the experiment, twenty-two participants were presented with four sentences of Tang poetry. All of the first two sentences, being prime, were conducted with SVO+VP. The last two sentences, being the target, were divided into three types. Type one of the targets was SVO+VP. Type two of the targets was SVO+VPVP. Type three of the targets was VP+VP. The result showed that both of the targets, SVO+VPVP and VP+VP, elicited positive-going brainwave, a P600 effect, at 600~900ms time window. Furthermore, the P600 component was lager for the target’ VP+VP’ than the target’ SVO+VPVP’. That meant the more dissimilar the structure was, the lager the P600 effect we got. These results indicate that P600 was the index of the structure processing, and it would affect the P600 effect intensity with the degrees of structural heterogeneity.

Keywords: ERPs, P600, structural pattern, structural priming, Tang poetry

Procedia PDF Downloads 104

54 Assessing Language Dominance in Mexican Deaf Signers with the Bilingual Language Profile (BLP)

Authors: E. Mendoza, D. Jackson-Maldonado, G. Avecilla-Ramírez, A. Mondaca

Abstract:

Assessing language proficiency is a major issue in psycholinguistic research. There are multiple tools that measure language dominance and language proficiency in hearing bilinguals, however, this is not the case for Deaf bilinguals. Specifically, there are few, if not none, assessment tools useful in the description of the multilingual abilities of Mexican Deaf signers. Because of this, the linguistic characteristics of Mexican Deaf population have been poorly described. This paper attempts to explain the necessary changes done in order to adapt the Bilingual Language Profile (BLP) to Mexican Sign Language (LSM) and written/oral Spanish. BLP is a Self-Evaluation tool that has been adapted and translated to several oral languages, but not to sign languages. Lexical, syntactic, cultural, and structural changes were applied to the BLP. 35 Mexican Deaf signers participated in a pilot study. All of them were enrolled in Higher Education programs. BLP was presented online in written Spanish via Google Forms. No additional information in LSM was provided. Results show great heterogeneity as it is expected of Deaf populations and BLP seems to be a useful tool to create a bilingual profile of the Mexican Deaf population. This is a first attempt to adapt a widely tested tool in bilingualism research to sign language. Further modifications need to be done.

Keywords: deaf bilinguals, assessment tools, bilingual language profile, mexican sign language

Procedia PDF Downloads 123

53 Linguistic Features for Sentence Difficulty Prediction in Aspect-Based Sentiment Analysis

Authors: Adrian-Gabriel Chifu, Sebastien Fournier

Abstract:

One of the challenges of natural language understanding is to deal with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available data sets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question by conducting an experiment with three data sets: ”Laptops”, ”Restaurants”, and ”MTSC” (Multi-Target-dependent Sentiment Classification), and a merged version of these three datasets. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We employ two ways of defining sentence difficulty. The first one is binary and labels a sentence as difficult if the classifiers fail to correctly predict the sentiment polarity. The second one is a six-level scale based on how many of the top five best-performing classifiers can correctly predict the sentiment polarity. We also define 9 linguistic features that, combined, aim at estimating the difficulty at sentence level.

Keywords: sentiment analysis, difficulty, classification, machine learning

Procedia PDF Downloads 46

52 Research on Road Openness in the Old Urban Residential District Based on Space Syntax: A Case Study on Kunming within the First Loop Road

Authors: Haoyang Liang, Dandong Ge

Abstract:

With the rapid development of Chinese cities, traffic congestion has become more and more serious. At the same time, there are many closed old residential area in Chinese cities, which seriously affect the connectivity of urban roads and reduce the density of urban road networks. After reopening the restricted old residential area, the internal roads in the original residential area were transformed into urban roads, which was of great help to alleviate traffic congestion. This paper uses the spatial syntactic theory to analyze the urban road network and compares the roads with the integration and connectivity degree to evaluate whether the opening of the roads in the residential areas can improve the urban traffic. Based on the road network system within the first loop road in Kunming, the Space Syntax evaluation model is established for status analysis. And comparative analysis method will be used to compare the change of the model before and after the road openness of the old urban residential district within the first-ring road in Kunming. Then it will pick out the areas which indicate a significant difference for the small dimensions model analysis. According to the analyzed results and traffic situation, the evaluation of road openness in the old urban residential district will be proposed to improve the urban residential districts.

Keywords: Space Syntax, Kunming, urban renovation, traffic jam

Procedia PDF Downloads 127

51 A Semantic Registry to Support Brazilian Aeronautical Web Services Operations

Authors: Luís Antonio de Almeida Rodriguez, José Maria Parente de Oliveira, Ednelson Oliveira

Abstract:

In the last two decades, the world’s aviation authorities have made several attempts to create consensus about a global and accepted approach for applying semantics to web services registry descriptions. This problem has led communities to face a fat and disorganized infrastructure to describe aeronautical web services. It is usual for developers to implement ad-hoc connections among consumers and providers and manually create non-standardized service compositions, which need some particular approach to compose and semantically discover a desired web service. Current practices are not precise and tend to focus on lightweight specifications of some parts of the OWL-S and embed them into syntactic descriptions (SOAP artifacts and OWL language). It is necessary to have the ability to manage the use of both technologies. This paper presents an implementation of the ontology OWL-S that describes a Brazilian Aeronautical Web Service Registry, which makes it able to publish, advertise, make multi-criteria semantic discovery aligned with the ideas of the System Wide Information Management (SWIM) Program, and invoke web services within the Air Traffic Management context. The proposal’s best finding is a generic approach to describe semantic web services. The paper also presents a set of functional requirements to guide the ontology development and to compare them to the results to validate the implementation of the OWL-S Ontology.

Keywords: aeronautical web services, OWL-S, semantic web services discovery, ontologies

Procedia PDF Downloads 58

50 Probing Language Models for Multiple Linguistic Information

Authors: Bowen Ding, Yihao Kuang

Abstract:

In recent years, large-scale pre-trained language models have achieved state-of-the-art performance on a variety of natural language processing tasks. The word vectors produced by these language models can be viewed as dense encoded presentations of natural language that in text form. However, it is unknown how much linguistic information is encoded and how. In this paper, we construct several corresponding probing tasks for multiple linguistic information to clarify the encoding capabilities of different language models and performed a visual display. We firstly obtain word presentations in vector form from different language models, including BERT, ELMo, RoBERTa and GPT. Classifiers with a small scale of parameters and unsupervised tasks are then applied on these word vectors to discriminate their capability to encode corresponding linguistic information. The constructed probe tasks contain both semantic and syntactic aspects. The semantic aspect includes the ability of the model to understand semantic entities such as numbers, time, and characters, and the grammatical aspect includes the ability of the language model to understand grammatical structures such as dependency relationships and reference relationships. We also compare encoding capabilities of different layers in the same language model to infer how linguistic information is encoded in the model.

Keywords: language models, probing task, text presentation, linguistic information

Procedia PDF Downloads 70

49 Syntax-Related Problems of Translation

Authors: Anna Kesoyan

Abstract:

The present paper deals with the syntax-related problems of translation from English into Armenian. Although Syntax is a part of grammar, syntax-related problems of translation are studied separately during the process of translation. Translation from one language to another is widely accepted as a challenging problem. This becomes even more challenging when the source and target languages are widely different in structure and style, as is the case with English and Armenian. Syntax-related problems of translation from English into Armenian are mainly connected with the syntactical structures of these languages, and particularly, with the word order of the sentence. The word order of the sentence of the Armenian language, which is a synthetic language, is usually characterized as “rather free”, and the word order of the English language, which is an analytical language, is characterized “fixed”. The following research examines the main translation means, particularly, syntactical transformations as the translator has to take real steps while trying to solve certain syntax-related problems. Most of the means of translation are based on the transformation of grammatical components of the sentence, without changing the main information of the text. There are several transformations that occur during translation such as word order of the sentence, transformations of certain grammatical constructions like Infinitive participial construction, Nominative with the Infinitive and Elliptical constructions which have been covered in the following research.

Keywords: elliptical constructions, nominative with the infinitive constructions, fixed and free word order, syntactic structures

Procedia PDF Downloads 418