Search results for: morpho-semantic and syntactic analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26851

26761 An Approach to Specify Software Requirements in Semantic Form

Authors: Deepa Vijay, Chellammal Surianarayanan, Gopinath Ganapathy

Abstract:

The requirements of a software project serve as a guideline for the entire project team and steer the team towards producing the right outcome. As requirements are key to the success of the project, they should be specified in an unambiguous manner. The requirements should also be complete and consistent, and the entire project team should interpret them in the same way as the customer does. Specifying requirements in textual form is common in software development. This leads to poor understanding of the requirements, which results in more errors and degraded quality. Some studies focus on semantic ways of specifying functional requirements that ensure the consistency and completeness of the requirements. Alternatively, in this work, a method is proposed to map the syntactic requirements to their corresponding semantics in the form of ontologies. This improves the understanding of requirements, prevents errors and improves quality.

Keywords: functional requirement, ontology, requirements management, semantics

26760 Existential and Possessive Constructions in Modern Standard Arabic: Two Strategies Reflecting the Ontological (Non-)Autonomy of Located or Possessed Entities

Authors: Fayssal Tayalati

Abstract:

Although languages use very divergent constructional strategies, all existential constructions appear to invariably involve an implicit or explicit locative constituent. This locative constituent either surfaces as a true locative phrase or is realized as a possessor noun phrase. However, while much research focuses on the supposed underlying syntactic relation of locative and possessive existential constructions, not much is known about possible semantic factors that could govern the choice between these constructions. The main question that we address in this talk concerns the choice between the two related constructions in Modern Standard Arabic (MSA). Although both are used to express the existence of something somewhere, we can distinguish three contexts. First, for some types of entities, only the existential locative (EL) construction is possible (e.g. (1a) ṯammata raǧulun fī l-ḥadīqati vs. (1b) *(kāna) ladā l-ḥadīqati raǧulun). Second, for other types of entities, only the possessive construction is possible (e.g. (2a) ladā ṭ-ṭawilati šaklun dāʾiriyyun vs. (2b) *ṯammata šaklun dāʾiriyyun ladā/fī ṭ-ṭawilati). Finally, for still other entities, both constructions can be found (e.g. (3a) ṯammata ḥubbun lā yūṣafu ladā ǧārī li-zawǧati-hi and (3b) ladā ǧārī ḥubbun lā yūṣafu li-zawǧati-hi). The data, covering a range of ontologically different entities (concrete objects, events, body parts, dimensions, essential qualities, feelings, etc.), show that the choice between the existential locative and the possessive constructions is closely linked to the conceptual autonomy of the existential theme with respect to its location or to the whole that it is a part of. The construction with ṯammata is the only possible one to express the existence of a fully autonomous (i.e. non-dependent) entity (concrete objects (e.g. 1) and abstract objects such as events, especially the ones that Grimshaw called ‘simple events’). 
The possessive construction with (kāna) ladā is the only one used to express the existence of fully non-autonomous entities, i.e. entities fully dependent on a whole (body parts, dimensions (e.g. 2), essential qualities). The two constructions alternate when the existential theme is conceptually dependent on but separable from the whole, either because it can exist independently of the given whole (spare parts of an object), or because it receives relative autonomy in speech through a modifier (accidental qualities, feelings (e.g. 3a, 3b), psychological states, among other kinds of themes). In this case, the modifier expresses an approximate boundary on a scale and provides relative autonomy to the entity. Finally, we will show that kinship terms (e.g. son), which at first sight may seem to constitute counterexamples to our hypothesis, in fact support it. The ontological (non-)autonomy of located or possessed entities is also reflected in morpho-syntactic properties, among them the use and choice of determiners, pluralisation, and the behavior of entities in the context of associative anaphora.

Keywords: existence, possession, autonomous entities, non-autonomous entities

26759 Investigating Universals of Rhetoric

Authors: Nasreddin Ahmed

Abstract:

Despite the ostensible differences among the structures of the world's languages, which have culminated in divergent orthographic, phonological, morphological, and syntactic systems, research in cognitive linguistics strives to establish the claim that such differences are merely prima facie manifestations of a totalized universal system of signification. Linguists, since Chomsky, have never given up on the attempt to establish a linguistic descriptive model that espouses a perspective in which every human language has a slot. Concurring with the claim that so-called rhetorical devices are pervasive phenomena and not specific to literature, the present paper aspires to voice the claim that rhetorical devices are not only ubiquitous at all levels of a particular language but also a universal linguistic phenomenon. Using illustrations from Arabic and English, the paper intends to provide data-supported evidence that human beings universally use similar rhetorical devices, albeit under different appellations.

Keywords: language, rhetoric, syntax, stylistics

26758 Systemic Functional Linguistics in the Rhetorical Strategies of Persuasion: A Longitudinal Study of Transitivity and Ergativity in the Rhetoric of Saras’ Sustainability Reports

Authors: Antonio Piga

Abstract:

This study explores the correlation between Systemic Functional Linguistics (SFL) and Critical Discourse Analysis (CDA) as tools for analysing the evolution of rhetoric in the communicative strategies adopted in a company’s Reports on social and environmental responsibility. In more specific terms, transitivity and ergativity, concepts from SFL viewed through the lens of CDA, are employed as a theoretical means for a longitudinal analysis of the communicative strategies employed by Saras SpA before and during the Covid-19 pandemic crisis. Saras is an Italian joint-stock company operating in oil refining and power generation. The qualitative and quantitative linguistic analysis, carried out with Sketch Engine software, aims to identify and explain how rhetoric, and ideology, is constructed and presented through language use in Saras SpA Sustainability Reports. Specific focus is given to communication strategies addressed to local and global communities and stakeholders in the years immediately before and during the Covid-19 pandemic. The rationale behind the study lies in the fact that 2020 and 2021 have been among the most difficult years since the end of World War II. Lives were abruptly turned upside down by the pandemic, which had grave negative effects on people’s health and on the economy. The result has been a threefold crisis involving health, the economy and social tension. The oil refining sector was among the hardest hit, owing to the general reduction in mobility and oil consumption brought about by the virus-fighting measures. 
Emphasis is placed on the construction of rhetorical strategies before and during the pandemic crisis using the representational processes of transitivity and ergativity (SFL), thus revealing the close relationship between the use of language in terms of Social Actors and the semantic roles of syntactic transformation on the one hand, and ideological assumptions on the other. The results show that linguistic decisions regarding transitivity and ergativity play a crucial role in how effective writing achieves its rhetorical objectives in terms of spreading and maintaining dominant and implicit ideologies and underlying persuasive actions, and that some ideological motivation is perpetuated, if not actually strengthened, overtly or subtly, in the social-environmental Reports issued in the midst of the Covid-19 pandemic crisis.

Keywords: systemic functional linguistics, sustainability, critical discourse analysis, transitivity, ergativity

26757 The Decline of Verb-Second in the History of English: Combining Historical and Theoretical Explanations for Change

Authors: Sophie Whittle

Abstract:

Prior to the present day, English syntax exhibited an inconsistent verb-second (V2) rule, which saw the verb move to the second position in the sentence following the fronting of a type of phrase. There was a high amount of variation throughout the history of English with regard to the ordering of subject and verb, and many explanations attempting to account for this variation have been documented in previous literature. However, these attempts have been contradictory, with many accounts positing the effect of previous syntactic changes as the main motivation behind the decline of V2. For instance, morphosyntactic changes, such as the loss of clitics and the loss of empty expletives, have been loosely connected to changes in frequency for the loss of V2. The questions surrounding the development of non-V2 in English have, therefore, yet to be answered. The current paper aims to bring together a number of explanations from different linguistic fields to determine the factors driving the changes in English V2. Using historical corpus-based methods, the study analyses both quantitatively and qualitatively the changes in the frequency of V2 across the Old, Middle, and Modern English periods to account for the variation in a range of sentential environments. These methods draw on the study of information structure, prosody and language contact to explain variation within different contexts. The analysis concludes that these factors, in addition to changes within the syntax, are responsible for the position of the verb. The loss of V2 serves as an exemplar study within the field of historical linguistics, one which combines a number of factors in explaining language change in general.

Keywords: corpora, English, language change, mixed-methods, syntax, verb-second

26756 The Use of Corpora in Improving Modal Verb Treatment in English as Foreign Language Textbooks

Authors: Lexi Li, Vanessa H. K. Pang

Abstract:

This study aims to demonstrate how native and learner corpora can be used to enhance modal verb treatment in EFL textbooks in mainland China. It contributes to a corpus-informed and learner-centered design of grammar presentation in EFL textbooks that enhances the authenticity and appropriateness of textbook language for target learners. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because the pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of the occurrences of each of the nine modals were sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the 'secondary school' section were selected. A series of five secondary coursebooks comprises the textbook corpus. All the data in both the learner and the textbook corpora were retrieved through the concordance functions of WordSmith Tools (version 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNCS2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. In the second part, the learner corpus was analyzed in terms of the use (distributional features, semantic functions, and co-occurring constructions) and the misuse (syntactic errors, e.g., *she can sings) of the nine modal verbs to uncover potential difficulties that confront learners. The analysis of distribution indicates several discrepancies between the textbook corpus and BNCS2014. The four most frequent modal verbs in BNCS2014 are can, would, will, could, while can, will, should, could are the top four in the textbooks. Most strikingly, there is an unusually high proportion of can (41.1%) in the textbooks. 
The results on different meanings show that will, would and must are the most problematic. For example, for will, the textbooks contain 20% more occurrences of 'volition' and 20% fewer of 'prediction' than BNCS2014. Regarding co-occurring structures, the textbooks over-represent the structure 'modal + do' across the nine modal verbs. Another major finding is that the structure 'modal + have done', which frequently co-occurs with could, would, should, and must, is underused in textbooks. Moreover, these four modal verbs are the most difficult for learners, as the error analysis shows. This study demonstrates how the synergy of native and learner corpora can be harnessed to improve EFL textbook presentation of modal verbs, so that textbooks provide not only authentic language used in natural discourse but also a design tailored to the needs of target learners.
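The distributional comparison described in this abstract can be illustrated with a minimal sketch. It counts the nine modal verbs in a piece of text and ranks them by frequency and share. The function name `modal_distribution`, the regular-expression tokenizer, and the sample sentence are assumptions made for illustration; the study itself used CQPweb and WordSmith Tools, not this code.

```python
import re
from collections import Counter

# The nine modal verbs named in the abstract.
MODALS = ("will", "would", "can", "could", "may", "might", "shall", "should", "must")

def modal_distribution(text):
    """Return (modal, count, share) tuples sorted by descending count."""
    # Naive tokenizer: lowercase alphabetic strings, apostrophes kept.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in MODALS)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(modal, n, n / total) for modal, n in counts.most_common()]

# Hypothetical sample text, not taken from any of the corpora in the study.
sample = "You can go now. I would help if I could. She can swim and he can sing."
ranking = modal_distribution(sample)
for modal, n, share in ranking:
    print(f"{modal}: {n} ({share:.1%})")
```

A surface count like this cannot separate the modal "can" or "will" from the homographic nouns, nor the modal "may" from the month, and it ignores contracted forms such as "'ll" and "can't". A POS-tagged corpus avoids the first problem, which is one reason tagged data are preferred for this kind of analysis.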

Keywords: English as Foreign Language, EFL textbooks, learner corpus, modal verbs, native corpus

26755 Hand Gesture Recognition for Sign Language: A New Higher Order Fuzzy HMM Approach

Authors: Saad M. Darwish, Magda M. Madbouly, Murad B. Khorsheed

Abstract:

Sign Languages (SL) are the most accomplished forms of gestural communication. Their automatic analysis is therefore a real challenge, one tied to their lexical and syntactic levels of organization. Hidden Markov models (HMMs) have been used prominently and successfully in speech recognition and, more recently, in handwriting recognition. Consequently, they seem ideal for visual recognition of complex, structured hand gestures such as those found in sign language. In this paper, several results concerning static hand gesture recognition using an algorithm based on Type-2 Fuzzy HMM (T2FHMM) are presented. The features used as observables in both the training and the recognition phases are based on Singular Value Decomposition (SVD). SVD extends eigendecomposition to non-square matrices and is used here to reduce multi-attribute hand gesture data to feature vectors. SVD optimally exposes the geometric structure of a matrix. In our approach, we replace the basic HMM arithmetic operators with suitable Type-2 fuzzy operators, which permits us to relax the additive constraint of probability measures. T2FHMMs are therefore able to handle both the random and the fuzzy uncertainties that exist universally in sequential data. Experimental results show that T2FHMMs can effectively handle noise and dialect uncertainties in hand signals and achieve better classification performance than classical HMMs. The recognition rate of the proposed system is 100% for uniform hand images and 86.21% for cluttered hand images.
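As a rough illustration of the SVD-based feature step, the sketch below reduces a non-square image matrix to a short vector of its leading singular values. The function name `svd_features`, the choice of k = 10, the unit-sum normalization, and the random stand-in image are all assumptions for illustration; the abstract does not specify how the authors build their feature vectors, and the fuzzy HMM stage is not shown.

```python
import numpy as np

def svd_features(image, k=10):
    """Return the k largest singular values of a 2-D array, scaled to sum to 1."""
    matrix = np.asarray(image, dtype=float)
    # compute_uv=False returns only the singular values, in descending order.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    top = singular_values[:k]
    total = top.sum()
    return top / total if total > 0 else top

# Random matrix standing in for a 48x64 grayscale hand image (hypothetical data).
rng = np.random.default_rng(0)
hand_image = rng.random((48, 64))
features = svd_features(hand_image, k=10)
print(features.shape)
```

Because the singular values are defined for any rectangular matrix, the same function works on images of any size, and the resulting fixed-length vector could serve as one observation for a sequence model.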

Keywords: hand gesture recognition, hand detection, type-2 fuzzy logic, hidden Markov Model

26754 Radical Web Text Classification Using a Composite-Based Approach

Authors: Kolade Olawande Owoeye, George R. S. Weir

Abstract:

The spread of terrorist and extremist activity on the internet has become a major threat to governments and national security. Its potential dangers have created a need for web-based intelligence gathering and real-time monitoring of websites for extremist activity. However, manual classification of such content is difficult and time-consuming. In response to this challenge, an automated classification system called the composite technique was developed. This is a computational framework that explores the combination of both semantic and syntactic features of the textual content of a web page. We implemented the framework on a dataset of extremist webpages that had been subjected to a manual classification process. We then developed a classification model on the data using the J48 decision tree algorithm, in order to generate a measure of how well each page can be classified into its appropriate class. The classification results obtained from our method, compared with other state-of-the-art methods, indicated a 96% success rate in classifying webpages overall when matched against the manual classification.

Keywords: extremist, web pages, classification, semantics, posit

26753 An Emerging Trend of Wrong Plurals among Pakistani Bilinguals: A Sociolinguistic Perspective

Authors: Sikander Ali

Abstract:

English is used as a lingua franca in most formal and informal situations in Pakistan. This extensive use has been rapidly displacing the national language of Pakistan, i.e. Urdu. The nature of syntactic representation has always been a matter of confusion among linguists. Non-native speakers who are unaware of the correct plural forms make mistakes when forming plurals. The situation is reversed, however, when non-native speakers of English who do know the correct plurals nevertheless form incorrect ones, usually while speaking their native language. The observation method was adopted to test this hypothesis. In addition, a checklist was compiled of the occurrences in which this flouting of the norms is routine. The results confirm that Pakistanis make this mistake, e.g. ‘tablian’ as the plural of tables and ‘filain’ as the plural of files, though they do so unconsciously. This emerging trend of unconscious mistakes is leading Pakistani bilinguals towards a diglossic situation in which they are coining portmanteaus.

Keywords: bilinguals, emerging trend, portmanteau, trends

26752 Computational Linguistic Implications of Gender Bias: Machines Reflect Misogyny in Society

Authors: Irene Yi

Abstract:

Machine learning, natural language processing, and neural network models of language are becoming more and more prevalent in the fields of technology and linguistics today. Training data for machines are, at best, large corpora of human literature and, at worst, a reflection of the ugliness in society. Computational linguistics is a growing field dealing with such issues of data collection for technological development. Machines have been trained on millions of human books, only to find that, over the course of human history, derogatory and sexist adjectives have been used significantly more frequently to describe females in history and literature than to describe males. This is extremely problematic, both as training data and as the outcome of natural language processing. As machines start to handle more responsibilities, it is crucial to ensure that they do not take with them historical sexist and misogynistic notions. This paper gathers data and algorithms from neural network models of language dealing with syntax, semantics, sociolinguistics, and text classification. Computational analysis of such linguistic data is used to find patterns of misogyny. Results are significant in showing the existing intentional and unintentional misogynistic notions used to train machines, as well as in developing better technologies that take into account the semantics and syntax of text to be more mindful and reflect gender equality. Further, this paper deals with the idea of non-binary gender pronouns and how machines can process these pronouns correctly, given their semantic and syntactic context. This paper also delves into the implications of gendered grammar and its effect, cross-linguistically, on natural language processing. Languages such as French or Spanish not only have rigid gendered grammar rules, but are also spoken in historically patriarchal societies. 
The progression of society comes hand in hand with not only its language, but how machines process those natural languages. These ideas are all extremely vital to the development of natural language models in technology, and they must be taken into account immediately.

Keywords: computational analysis, gendered grammar, misogynistic language, neural networks

26751 The Amount of Conformity of Persian Subject Headlines with Users' Social Tagging

Authors: Amir Reza Asnafi, Masoumeh Kazemizadeh, Najmeh Salemi

Abstract:

Given the diversity of information resources in the Web 2.0 environment, whose number keeps increasing, social tagging systems should be used to describe Internet resources. Studying the relevance of social tags to subject headings can help enrich resources and make them more accessible to users. The present research is of the applied-theoretical type and uses content analysis as its research method. In this study, using the listing method and content analysis, the levels of exact, approximate, relative, and non-conformity between Persian subject headings and the social labels of books in the field of information science and bibliography on the Kitabrah website were determined. Exact matching of subject headings with social tags averaged 22 items, approximate matching averaged 36 items, relative matching averaged 36 items, and non-matching averaged 116 items. According to the findings, exact matching of subject headings with social labels is the lowest, while non-conformity is the highest. This study showed that the average non-conformity of subject headings with social labels is even higher than the sum of the three types of exact, relative, and approximate matching. As a result, the relevance of subject headings to social labels is low. Subject headings take the form of static text, and users are not allowed to interact with them or insert newly selected words and topics, whereas websites based on Web 2.0 and on social classification systems make this possible for users. An important point of the present study, and of the studies that have examined the syntactic and semantic matching of social labels with subject headings, is that the degree of conformity of subject headings with social labels is low. 
Therefore, these two methods can complement each other and create a hybrid cataloging that includes both subject headings and social tags. The low level of conformity of subject headings with social tags confirms the results of previous studies that compared the social tags of books with the subject headings of the Library of Congress. Social labels alone are not enough to match subject headings, which is why the two methods are best treated as complementary.
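The comparison of averages reported in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the four reported averages (22, 36, 36, and 116) are measured on the same basis and can therefore be summed and turned into shares; the abstract does not state this explicitly, and the variable names are illustrative.

```python
# Average number of items per matching category, as reported in the abstract.
averages = {"exact": 22, "approximate": 36, "relative": 36, "none": 116}

# Sum of the three matching categories (exact + approximate + relative).
matched = sum(n for kind, n in averages.items() if kind != "none")
total = sum(averages.values())

# Share of each category, in percent, under the assumption stated above.
shares = {kind: round(100 * n / total, 1) for kind, n in averages.items()}

print(matched, averages["none"] > matched)
print(shares)
```

Under that assumption the three matching categories together come to 94 items, fewer than the 116 non-matching items, which agrees with the abstract's statement that non-conformity exceeds the sum of the other three.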

Keywords: Web 2.0, social tags, subject headings, hybrid cataloging

26750 Specialized Translation Teaching Strategies: A Corpus-Based Approach

Authors: Yingying Ding

Abstract:

This study presents a methodology for specialized translation with the objective of helping teachers to improve their strategies for teaching translation. To acquire the skills needed to translate specialized texts, students need to become familiar with the semantic and syntactic features of source texts and target texts. The aim of our study is to use a corpus-based approach in the teaching of specialized translation between Chinese and Italian. This study proposes to construct a specialized Chinese-Italian comparable corpus consisting of 50 economic contracts from the food domain. With the help of AntConc, we propose to compile a comparable corpus for translation teaching purposes. This paper attempts to provide insight into how teachers could benefit from a comparable corpus in the teaching of specialized translation from Italian into Chinese and, through some examples of passive sentences, into how students could learn to apply different strategies for translating voice appropriately.

Keywords: contrastive studies, specialised translation, corpus-based approach, teaching

26749 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development

Authors: L. Kamandulytė-Merfeldienė

Abstract:

The paper deals with the main methodological issues of the Corpus of Spoken Lithuanian, whose development began in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, transcribing the recorded data, and grammatical annotation. Data collection was based on the principles of balance and naturalness. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to a constant increase in studies on spontaneous communication, and various papers have dealt with the distribution of parts of speech, the use of different grammatical forms, variation of inflectional paradigms, the distribution of fillers, syntactic functions of adjectives, and the mean length of utterances.

Keywords: CHILDES, corpus of spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian

26748 Moving from Practice to Theory

Authors: Maria Lina Garrido

Abstract:

This paper aims to reflect upon instruction in English classes with the specific purpose of reading comprehension development, having as its paradigm the considerations presented by William Grabe in his book Reading in a Second Language: Moving from Theory to Practice. His concerns regarding the connection between research findings and instructional practices have stimulated the present author to re-evaluate her long practice both as an English reading teacher and as the author of two reading textbooks for graduate students. Elements of the reading process such as linguistic issues, prior knowledge, reading strategies, critical evaluation, and motivation are the main foci of this analysis as far as the activities developed in the classroom are concerned. The experience with university candidates on postgraduate courses with different levels of English knowledge in Bahia, Brazil, has demanded certain adjustments to this author's classroom setting. Word recognition based on cognates, for example, has been emphasized, given that academic texts use many Latin words which have the same roots as the Brazilian Portuguese lexicon. Concerning syntactic parsing, tenses and verbal aspects, modality and linking words are included in the curriculum, but not in the same depth as in general English curricula. Reading strategies, another essential predictor of reading skill development, have been strongly encouraged in L2 classes in order to compensate for a lack of appropriate knowledge of the foreign language. This paper presents results that demonstrate that this author's teaching practice is compatible with the implications and instruction concerning the reading process outlined by Grabe. It admits, however, that each class demands specific instruction to meet the needs of that particular group.

Keywords: classroom practice, instructional activities, reading comprehension, reading skills

26747 Written Narrative Texts as the Indicators of Communication Competence of Pupils and Students with Hearing Impairment in the Czech Language

Authors: Marie Komorna, Katerina Hadkova

Abstract:

One reason why hearing disabilities are considered less serious than other disabilities is the belief that deaf and hard of hearing persons can read and write without problems and can therefore fairly easily compensate for problems related to their limited ability to hear sound. In reality, however, this is not the case. Especially in written Czech, deaf persons are often not able to communicate their message clearly to its recipients. The inability to communicate fully in written language is one of the most severe problems facing a number of deaf persons, and it makes it difficult for them to function in a sound-based environment. Despite this, the issue has been given only minimal attention in the Czech Republic. That is why we decided to focus our research on this issue, specifically targeting the written communication of deaf pupils in primary and secondary schools. The paper summarizes the background and objectives of this research. The written work of deaf respondents was obtained in response to a series of images which depicted a continuous storyline. Based on an analysis of the obtained written work, we tried to describe the specifics of the narrative abilities of the deaf authors of these texts. We also analyzed other aspects and specific traits of text written by deaf authors at the phonetic-phonological, lexical-semantic, morphological, syntactic, and pragmatic levels. Based on the results of the project, it will be possible to increase knowledge of the communication abilities of deaf persons in written Czech. The obtained data may be used in future research, for teaching purposes, and for educational concepts for teaching Czech to deaf pupils.

Keywords: communication competence, deaf, narrative, written texts

26746 Sentence Structure for Free Word Order Languages in Context with Anaphora Resolution: A Case Study of Hindi

Authors: Pardeep Singh, Kamlesh Dutta

Abstract:

Many languages have a fixed sentence structure, while others have free word order. The accuracy of syntax-based anaphora resolution algorithms depends on the structure of the sentence, so it is important to analyze the structure of a language before implementing these algorithms. In this study, we analyzed sentence structure in Hindi by exploiting case markers as well as some special tags for subject and object. We also investigated word order in Hindi. Word order typology refers to the study of the order of the syntactic constituents of a language. We analyzed 165 news items of Ranchi Express from the plain-text EMILLE corpus, consisting of 1745 sentences. Eight dialogue-based files from the same corpus, comprising 1521 sentences, were also analyzed. The percentages of subject-object-verb (SOV) and object-subject-verb (OSV) structures are 66.90 and 33.10, respectively.
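The word order tally described in this abstract can be sketched as follows. Each sentence is represented as a list of (token, role) pairs, and the order of the first subject, object, and verb gives its pattern. The role labels S, O, and V, the function name `constituent_order`, and the three transliterated toy sentences are assumptions for illustration; they are not the tag set or the data used in the study.

```python
from collections import Counter

def constituent_order(tagged):
    """Return a pattern such as 'SOV' from (token, role) pairs.

    Only the first occurrence of each of the roles S, O and V is used.
    """
    order = []
    for _, role in tagged:
        if role in ("S", "O", "V") and role not in order:
            order.append(role)
    return "".join(order)

# Toy sentences with hypothetical role tags ("ne" is the ergative case marker).
sentences = [
    [("Ram ne", "S"), ("seb", "O"), ("khaya", "V")],     # SOV
    [("seb", "O"), ("Ram ne", "S"), ("khaya", "V")],     # OSV
    [("Sita ne", "S"), ("kitab", "O"), ("padhi", "V")],  # SOV
]

tally = Counter(constituent_order(s) for s in sentences)
percentages = {order: round(100 * n / len(sentences), 2) for order, n in tally.items()}
print(percentages)
```

In a free word order language the case marker, rather than position, identifies the subject, which is why the roles must be assigned before the order can be counted.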

Keywords: anaphora resolution, free word order languages, SOV, OSV

26745 Designing a Corpus Database to Enhance the Learning of Old English Language

Authors: Raquel Mateo Mendaza, Carmen Novo Urraca

Abstract:

The current paper presents the development of a corpus database that aligns two different corpora in order to simplify the search for information by both researchers and students of Old English. This database comprises the information contained in two main reference corpora, namely the Dictionary of Old English Corpus (DOEC), compiled at the University of Toronto, and the York-Toronto-Helsinki Parsed Corpus of Old English (YCOE). The former provides information on all surviving texts written in the Old English language. The latter offers the syntactic and morphological annotation of several texts included in the DOEC. Although the two corpora are closely related, as the YCOE includes the DOE source text identifier, the main problem detected is that there is no alignment of texts that allows whole fragments to be searched and further analysed in terms of morphology and syntax. The database proposed in this paper gathers all this information and presents it in a simple, more accessible, visual, and educational way. The alignment of fragments has been done automatically. However, some problems have emerged during the creation process, particularly related to the lack of correspondence in the division of fragments. For this reason, it has been necessary to revise all the entries manually to obtain a reliable, high-quality product and to carefully indicate the gaps encountered in these corpora. All in all, this database contains more than 60,000 entries corresponding to the DOE fragments annotated by the YCOE. The main strength of the resulting product lies in its research and teaching implications for the study of Old English. The use of this database will help researchers and students in the study of different aspects of the language, such as inflectional morphology, the syntactic behaviour of given words, or translation studies, among others. 
By means of the search of words or fragments, the annotated information on morphology and syntax will be automatically displayed, automatizing, and speeding up the search of data.

Keywords: alignment, corpus database, morphosyntactic analysis, Old English

Procedia PDF Downloads 103
26744 Phonology and Syntax of Article Incorporation in Mauritian Creole: Evidence from Bantou Languages

Authors: Emmanuel Nikiema

Abstract:

This paper examines article incorporation in Mauritian Creole, a French Lexifier Creole which exhibits three forms of article incorporation as illustrated in (1-3). While various analyses of article incorporation have been proposed in the literature, fewer studies have explored the motivation of this widespread phenomenon in Mauritian Creole (MC) as opposed to other French Lexifier Creoles spoken in the Caribbean. For example, Mauritian Creole exhibits 4 times more CV incorporation than Haitian Creole, and 40 times more than Reunion Creole. (1) Consonantal type (C): loraz ‘thunder storm’, lete ‘summer’, zwazo ‘bird’, nide ‘idea’. (2) Syllabic type (CV): lapo ‘skin’, liku ‘neck’, ledo ‘back’, leker ‘heart’, diber ‘butter’. (3) Bi-consonantal (CVC): delo ‘water’, dizef ‘egg’, lizye ‘eye’, dilwil ‘oil’. The goal of this study is twofold: 1) uncover the rules governing the three types of article incorporation in MC, and 2) account for its remarkable occurrence in MC as opposed to its quasi-absence in Reunion Creole. We have collected a corpus of over 700 cases and organized it into three categories (C; CV and CVC). For example, there are 471 examples of CV incorporation in MC against 112 in Haitian Creole and only 12 in Reunion Creole. Two questions can be raised: 1) what is the motivation and distribution of the three types of incorporation in MC, and 2) how can one account for the high volume of incorporation in MC as opposed to its quasi-absence in Reunion Creole? We suggest that article incorporation in MC is related to the structure of nouns in Bantou languages. While previous authors have largely used population settlement data in the colonies during the Creole formation period to justify their analyses, we propose an account based on the syntactic structure of Bantou nouns. 
This analysis will shed light on the contribution of African languages to the formation of MC, and on why MC exhibits more article incorporation cases than any other French Lexifier Creole.

Keywords: article incorporation, creole languages, description, phonology

Procedia PDF Downloads 84
26743 Minimizing Mutant Sets by Equivalence and Subsumption

Authors: Samia Alblwi, Amani Ayad

Abstract:

Mutation testing is the art of generating syntactic variations of a base program and checking whether a candidate test suite can identify all the mutants that are not semantically equivalent to the base; this technique is widely used by researchers to select quality test suites. One of the main obstacles to the widespread use of mutation testing is cost: even small programs (a few dozen lines of code) can give rise to a large number of mutants (up to hundreds). This has created an incentive to reduce the number of mutants while preserving their collective effectiveness. Two criteria have been used to reduce the size of mutant sets: equivalence, which aims to partition the set of mutants into equivalence classes modulo semantic equivalence and to select one representative per class; and subsumption, which aims to define a partial ordering among mutants that ranks mutants by effectiveness and seeks to select maximal elements in this ordering. In this paper we analyze these two policies using analytical and empirical criteria.
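The two reduction criteria can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a hypothetical kill matrix (each mutant mapped to the set of tests that kill it) and uses test-based indistinguishability as a stand-in for semantic equivalence, which is only an approximation. Under that assumption, a mutant whose kill set is a strict subset of another's subsumes it, because any test that kills the first also kills the second.

```python
def minimize_mutants(kill_sets):
    """Return a reduced list of mutant names.

    kill_sets: dict mapping a mutant name to the set of test names that
    kill it (a hypothetical kill matrix, assumed for illustration).
    """
    # Mutants killed by no test are treated here as equivalent to the base
    # program; this is an approximation, not a proof of equivalence.
    killed = {m: frozenset(tests) for m, tests in kill_sets.items() if tests}

    # Equivalence step: keep one representative per distinct kill set.
    representatives = {}
    for name in sorted(killed):
        representatives.setdefault(killed[name], name)

    # Subsumption step: drop any kill set that strictly contains another,
    # since the mutant with the smaller kill set is harder to kill.
    distinct = list(representatives)
    minimal = [k for k in distinct if not any(other < k for other in distinct)]
    return sorted(representatives[k] for k in minimal)


example = {
    "m1": {"t1"},
    "m2": {"t1", "t2"},  # subsumed by m1
    "m3": {"t1"},        # indistinguishable from m1 on this test suite
    "m4": set(),         # never killed
    "m5": {"t3"},
}
reduced = minimize_mutants(example)
```

In this example the five mutants reduce to `m1` and `m5`: a test suite that kills those two necessarily kills `m2` and `m3` as well.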

Keywords: mutation testing, mutant sets, mutant equivalence, mutant subsumption, mutant set minimization

Procedia PDF Downloads 32
26742 The Narrative Coherence of Autistic Children’s Accounts of an Experienced Event over Time

Authors: Fuming Yang, Telma Sousa Almeida, Xinyu Li, Yunxi Deng, Heying Zhang, Michael E. Lamb

Abstract:

Twenty-seven children aged 6-15 years with autism spectrum disorder (ASD) and 32 typically developing (TD) children were questioned about their participation in a set of activities after a two-week delay and again after a two-month delay, using a best-practice interview protocol. This paper assessed the narrative coherence of children’s reports based on key story grammar elements and temporal features included in their accounts of the event. Results indicated that, over time, narrative coherence decreased for both children with ASD and TD children. Children with ASD were no different from TD peers with regard to story length and syntactic complexity. However, they showed significantly less coherence than TD children. They were less likely to use the gist of the story to organize their narratives. Interviewer prompts influenced children’s narrative coherence. The findings indicated that children with ASD could provide meaningful and reliable testimony about an event they personally experienced, but the narrative coherence of their reports deteriorates over time and is affected by interviewer prompts.

Keywords: autism spectrum disorders, delay, eyewitness testimony, narrative coherence

Procedia PDF Downloads 245
26741 Agents and Causers in the Experiencer-Verb Lexicon

Authors: Margaret Ryan, Linda Cupples, Lyndsey Nickels, Paul Sowman

Abstract:

The current investigation explored the thematic roles of the nouns specified in the lexical entries of experiencer verbs. While prior experimental research assumes experiencer and theme roles for both subject-experiencer (SE) and object-experiencer (OE) verbs, syntactic theorists have posited additional agent and causer roles. Experiment 1 provided evidence for an agent as participants assigned a high degree of intentionality to the logical subject of a subset of SE and OE actives and passives. Experiment 2 provided evidence for a causer as participants assigned high levels of causality to the logical subjects of experiencer sentences generally. However, the presence of an agent, but not a causer, coincided with processing ease. Causality may be an aspect rather than a thematic role. The varying thematic roles amongst experiencer-verb sentences have important implications for stimulus selection because we cannot presume processing is similar across differing sentence subtypes.

Keywords: sentence comprehension, lexicon, canonicity, processing, thematic roles, syntax

Procedia PDF Downloads 83
26740 A Corpus-Based Contrastive Analysis of Directive Speech Act Verbs in English and Chinese Legal Texts

Authors: Wujian Han

Abstract:

In the process of human interaction and communication, speech act verbs are considered to be the most active component and the main means of information transmission, and are also taken as an indication of the structure of linguistic behavior. The theoretical value and practical significance of such everyday built-in metalanguage have long been recognized. This paper, which is part of a larger study, aims to provide useful insights for a more precise and systematic approach to the translation of speech act verbs between English and Chinese, especially with regard to the degree to which generic integrity is maintained in the translation of legal documents. In this study, the corpus, i.e. Chinese legal texts and their English translations, English legal texts, ordinary Chinese texts, and ordinary English texts, serves as a testing ground for examining contrastively the usage of English and Chinese directive speech act verbs in the legal genre. The scope of this paper is relatively wide and essentially covers all directive speech act verbs which are used in ordinary English and Chinese, such as order, command, request, prohibit, threaten, advise, warn and permit. The researcher, by combining the corpus methodology with a contrastive perspective, explored a range of characteristics of English and Chinese directive speech act verbs, including their semantic, syntactic and pragmatic features, and then contrasted them in a structured way. It has been found that there are similarities between English and Chinese directive speech act verbs in the legal genre, such as similar semantic components between English speech act verbs and their translation equivalents in Chinese, and formal and accurate usage of English and Chinese directive speech act verbs in legal contexts. However, notable differences have been identified between their usage in the original Chinese and English legal texts, such as valency patterns and frequency of occurrence. 
For example, the subjects of some directive speech act verbs are very frequently omitted in Chinese legal texts, but this is not the case in English legal texts. One of the practicable methods to achieve adequacy and conciseness in speech act verb translation from Chinese into English in legal genre is to repeat the subjects or the message with discrepancy, and vice versa. In addition, translation effects such as overuse and underuse of certain directive speech act verbs are also found in the translated English texts compared to the original English texts. Legal texts constitute a particularly valuable material for speech act verb study. Building up such a contrastive picture of the Chinese and English speech act verbs in legal language would yield results of value and interest to legal translators and students of language for legal purposes and have practical application to legal translation between English and Chinese.

Keywords: contrastive analysis, corpus-based, directive speech act verbs, legal texts, translation between English and Chinese

Procedia PDF Downloads 447
26739 The Value of Computerized Corpora in EFL Textbook Design: The Case of Modal Verbs

Authors: Lexi Li

Abstract:

This study aims to contribute to research on how computer technology can be exploited to enhance EFL textbook design. Specifically, the study demonstrates how computerized native and learner corpora can be used to enhance modal verb treatment in EFL textbooks. The linguistic focus is will, would, can, could, may, might, shall, should, must. The native corpus is the spoken component of BNC2014 (hereafter BNCS2014). The spoken part is chosen because the pedagogical purpose of the textbooks is communication-oriented. Using the standard query option of CQPweb, 5% of each of the nine modals was sampled from BNCS2014. The learner corpus is the POS-tagged Ten-thousand English Compositions of Chinese Learners (TECCL). All the essays under the “secondary school” section were selected. A series of five secondary coursebooks comprises the textbook corpus. All the data in both the learner and the textbook corpora were retrieved through the concordance functions of WordSmith Tools (version 5.0). Data analysis was divided into two parts. The first part compared the patterns of modal verbs in the textbook corpus and BNC2014 with respect to distributional features, semantic functions, and co-occurring constructions to examine whether the textbooks reflect the authentic use of English. The second part compared the learner corpus with the textbook corpus in terms of use (distributional features, semantic functions, and co-occurring constructions) in order to examine the degree of influence of the textbook on learners’ use of modal verbs. Moreover, the learner corpus was analyzed for misuse (syntactic errors, e.g., *she can sings) of the nine modal verbs to uncover potential difficulties that confront learners. The results indicate discrepancies between the textbook presentation of modal verbs and authentic modal use in natural discourse in terms of distributions of frequencies, semantic functions, and co-occurring structures. 
Furthermore, there are consistent patterns of use between the learner corpus and the textbook corpus with respect to the three above-mentioned aspects, except for could, will and must, partially confirming the correlation between frequency effects and L2 grammar acquisition. Further analysis reveals that the exceptions are caused by both positive and negative L1 transfer, indicating that frequency effects can be intercepted by L1 interference. In addition, error analysis revealed that could, would, should and must are the most difficult for Chinese learners due to both inter-linguistic and intra-linguistic interference. The discrepancies between the textbook corpus and the native corpus point to a need to adjust the presentation of modal verbs in the textbooks in terms of frequencies, different meanings, and verb-phrase structures. Along with the adjustment of modal verb treatment based on authentic use, it is important for textbook writers to take into consideration L1 interference as well as learners’ difficulties in their use of modal verbs. The present study is a methodological showcase of the combination of native and learner corpora in the enhancement of EFL textbook language authenticity and appropriateness for learners.

Keywords: EFL textbooks, learner corpus, modal verbs, native corpus

Procedia PDF Downloads 95
26738 Application of Natural Language Processing in Education

Authors: Khaled M. Alhawiti

Abstract:

Reading proficiency is a major component of language competency. However, finding topical texts at an appropriate level for foreign and second language learners is a challenge for educators. We address this problem using natural language processing technology to assess reading level and simplify text. In the context of foreign and second-language learning, existing measures of reading level are not well suited to this task. Related work has demonstrated the benefit of using statistical language processing techniques; we build on these ideas and include other potential features to measure readability. In the first part of this study, we combine features from statistical language models, traditional reading level measures and other language processing tools to produce a better method for detecting reading level. We examine the performance of human annotators and evaluate results for our detectors with respect to human ratings. A key contribution is that our detectors are trainable; with training and test data from the same domain, our detectors outperform more general reading level tools (Flesch-Kincaid and Lexile). Trainability will allow performance to be tuned to address the needs of specific groups or students.
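For context on the Flesch-Kincaid baseline named in the abstract, the grade-level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies it to raw text. The tokenization and the vowel-group syllable counter are rough assumptions made for illustration, not part of the formula and not the tooling used in the study.

```python
import re


def count_syllables(word):
    # Rough heuristic (an assumption): one syllable per group of
    # consecutive vowels, with a minimum of one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade_from_counts(words, sentences, syllables):
    # Flesch-Kincaid grade level computed from raw counts.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade(text):
    # Naive sentence and word splitting; abbreviations and numbers
    # are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)


simple = fk_grade("The cat sat. The dog ran.")
dense = fk_grade("Comprehensive evaluation necessitates considerable methodological sophistication.")
```

Short sentences of one-syllable words yield a lower grade than a single sentence of long words, which is all the formula measures. It has no trainable parameters, which is the limitation the abstract contrasts with its trainable detectors.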

Keywords: natural language processing, trainability, syntactic simplification tools, education

Procedia PDF Downloads 459
26737 Investigating the Associative Network of Color Terms among Turkish University Students: A Cognitive-Based Study

Authors: R. Güçlü, E. Küçüksakarya

Abstract:

Word association (WA) gives the broadest information on how knowledge is structured in the human mind. Cognitive linguistics, psycholinguistics, and applied linguistics are the disciplines that consider WA tests substantial in gaining insights into the very nature of the human cognitive system and semantic knowledge. In this study, Berlin and Kay’s 11 basic color terms (1969) are presented as the stimulus words to a total of 300 Turkish university students. The responses are analyzed according to Fitzpatrick’s model (2007), which includes four categories, namely meaning-based responses, position-based responses, form-based responses, and erratic responses. In line with the findings, the responses to free association tests are expected to give much information about Turkish university students’ psychological structuring of vocabulary, especially morpho-syntactic and semantic relationships among words. To conclude, theoretical and practical implications are discussed to make an in-depth evaluation of how associations of basic color terms are represented in the mental lexicon of Turkish university students.

Keywords: color term, gender, mental lexicon, word association task

Procedia PDF Downloads 94
26736 Comparing Deep Architectures for Selecting Optimal Machine Translation

Authors: Despoina Mouratidis, Katia Lida Kermanidis

Abstract:

Machine translation (MT) is a very important task in Natural Language Processing (NLP). MT evaluation is crucial in MT development, as it constitutes the means to assess the success of an MT system, and also helps improve its performance. Several methods have been proposed for the evaluation of MT systems. Some of the most popular ones in automatic MT evaluation are score-based, such as the BLEU score, and others are based on lexical similarity or syntactic similarity between the MT outputs and the reference, involving higher-level information like part-of-speech (POS) tagging. This paper presents a language-independent machine learning framework for classifying pairwise translations. This framework uses vector representations of two machine-produced translations, one from a statistical machine translation model (SMT) and one from a neural machine translation model (NMT). The vector representations consist of automatically extracted word embeddings and string-like language-independent features. These vector representations are used as input to a multi-layer neural network (NN) that models the similarity between each MT output and the reference, as well as between the two MT outputs. To evaluate the proposed approach, a professional translation and a "ground-truth" annotation are used. The parallel corpora used are English-Greek (EN-GR) and English-Italian (EN-IT), in the educational domain and of informal genres (video lecture subtitles, course forum text, etc.) that are difficult to translate reliably. We have tested three basic deep learning (DL) architectures in this schema: (i) fully-connected dense, (ii) Convolutional Neural Network (CNN), and (iii) Long Short-Term Memory (LSTM). Experiments show that all tested architectures achieved better results than some of the well-known basic approaches, such as Random Forest (RF) and Support Vector Machine (SVM). 
Better accuracy results are obtained when LSTM layers are used in our schema. In terms of a balance between the results, better accuracy results are obtained when dense layers are used. The reason for this is that the model correctly classifies more sentences of the minority class (SMT). For a more integrated analysis of the accuracy results, a qualitative linguistic analysis is carried out. In this context, problems have been identified with some figures of speech, such as metaphors, and with certain linguistic phenomena, such as etymological paronyms. It is quite interesting to find out why all the classifiers led to worse accuracy results in Italian as compared to Greek, taking into account that the linguistic features employed are language independent.

Keywords: machine learning, machine translation evaluation, neural network architecture, pairwise classification

Procedia PDF Downloads 103
26735 Negativization: A Focus Strategy in Basà Language

Authors: Imoh Philip

Abstract:

The Basà language is classified as belonging to the Kainji family, under the sub-phylum Western-Kainji, known as Rubasa (Basa Benue) (Crozier & Blench, 1992:32). Basà is an under-described language spoken in North-Central Nigeria. The language is characterized by subject-verb-object (henceforth SVO) as its canonical word order. Data for this work are sourced from the researcher’s native intuition of the language, corroborated by careful observation of native speakers. This paper investigates the syntactic derivational strategy of information-structure encoding in Basà. It focuses on a negative operator as a strategy for focusing the constituent or clause that follows it while negating the whole proposition. Items that are not nouns have to undergo an obligatory nominalization process, by affixation, modification or conversion, before they are moved to the preverbal position for these operations. The study provides evidence that different constituents in the sentence, such as the subject, direct object, indirect object, genitive, verb phrase, prepositional phrase, clause and ideophone, can be focused with the same negativizing operator. The process is characterized by focusing the preverbal NP constituent alone, whereas the whole proposition is negated. The study can stimulate similar studies or be replicated in other languages.

Keywords: negation, focus, Basà, nominalization

Procedia PDF Downloads 570
26734 Deep Learning Based-Object-classes Semantic Classification of Arabic Texts

Authors: Imen Elleuch, Wael Ouarda, Gargouri Bilel

Abstract:

In this paper we propose a Deep Learning based approach to classify text in order to enrich an Arabic ontology based on the object classes of Gaston Gross. These object classes are defined by taking into account the syntactic and semantic features of the treated language. Our proposed approach is thus a hybrid one: on the one hand, it is based on the object classes, which represent a knowledge-based approach to text classification; on the other hand, it uses a deep learning approach that relies on word embeddings to classify text. We have applied our proposed approach to a corpus constructed from an Arabic dictionary. The obtained semantic classification of text will enrich the Arabic object classes ontology: new classes can be added to the ontology, or the features that characterize each object class can be expanded. The obtained results are compared to a similar work that treats the same object with a classical linguistic approach to the semantic classification of text. This comparison highlights our proposed hybrid approach, which can be improved by broadening the dataset used in the deep learning process.

Keywords: deep-learning approach, object-classes, semantic classification, Arabic

Procedia PDF Downloads 40
26733 Investigating Translations of Websites of Pakistani Public Offices

Authors: Sufia Maroof

Abstract:

This empirical study investigated the web translations of five Pakistani public offices (FPSC, FIA, HEC, USB, and Ministry of Finance) that offer an Urdu tab as an option to access information on their official websites. A triangulation of quantitative and qualitative research designs informed the researcher of the semantic, lexical and syntactic caveats in these translations. The study hypothesized that the majority of the Pakistani population is unaware of the Supreme Court’s amendments to language policy concerning the national and official language; hence, the Urdu web translations of the public departments have not been accessed effectively. Firstly, the researcher conducted an online survey comprising two sections, with closed-ended and short-answer questions. Secondly, the researcher compiled a corpus of the five selected websites in tabular form to compare the data. Thirdly, the administrators of the departments were contacted regarding the methods of translation and the expertise of the personnel involved. The corpus was assessed for TQA after examining the lexical, semantic, syntactic and technical alignment inaccuracies and imperfections. The study suggests that the public offices invest in their Urdu websites by either hiring expert translators or engaging the expertise of a translation agency, in order to offer quality translation to the public.

Keywords: machine translations, public offices, Urdu translations, websites

Procedia PDF Downloads 98
26732 A Review of Spatial Analysis as a Geographic Information Management Tool

Authors: Chidiebere C. Agoha, Armstong C. Awuzie, Chukwuebuka N. Onwubuariri, Joy O. Njoku

Abstract:

Spatial analysis is a field of study that utilizes geographic or spatial information to understand and analyze patterns, relationships, and trends in data. It is characterized by the use of geographic or spatial information, which allows for the analysis of data in the context of its location and surroundings. It differs from non-spatial or aspatial techniques, which do not consider the geographic context and may not provide as complete an understanding of the data. Spatial analysis is applied in a variety of fields, including urban planning, environmental science, geosciences, epidemiology, and marketing, to gain insights and make decisions about complex spatial problems. This review paper explores definitions of spatial analysis from various sources, including examples of its application and different analysis techniques such as buffer analysis, interpolation, and kernel density analysis (multi-distance spatial cluster analysis). It also contrasts spatial analysis with non-spatial analysis.
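Two of the techniques named above can be sketched in a few lines. The example is illustrative only: it assumes planar (projected) coordinates, a circular buffer around a single point, and a Gaussian kernel, and the function names are hypothetical rather than taken from any GIS package.

```python
import math


def within_buffer(center, points, radius):
    # Buffer analysis: keep the points that lie within `radius` of `center`
    # (straight-line distance in planar coordinates, an assumption).
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]


def kernel_density(location, points, bandwidth):
    # Kernel density estimate at `location`, using a 2D Gaussian kernel
    # averaged over all observed points.
    x, y = location
    total = 0.0
    for px, py in points:
        squared_distance = (px - x) ** 2 + (py - y) ** 2
        total += math.exp(-squared_distance / (2 * bandwidth ** 2))
    return total / (2 * math.pi * bandwidth ** 2 * len(points))


cases = [(1.0, 0.0), (3.0, 4.0), (0.0, 2.0)]
nearby = within_buffer((0.0, 0.0), cases, radius=2.0)
density_near = kernel_density((0.5, 1.0), cases, bandwidth=1.0)
density_far = kernel_density((10.0, 10.0), cases, bandwidth=1.0)
```

Both functions depend on where each observation lies relative to a chosen location. A non-spatial summary of the same records, such as a simple count, would ignore that information.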

Keywords: aspatial technique, buffer analysis, epidemiology, interpolation

Procedia PDF Downloads 279