Search results for: Lexical eidos
43 The Lexical Eidos as an Invariant of a Polysemantic Word
Authors: S. Pesina, T. Solonchak
Abstract:
Phenomenological analysis is based not on natural language but on an ideal language capable of carrying ideal meanings: eidos representing typical structures or essences. To this end, it is necessary to free a subject from its spatio-temporal definiteness and then establish its noetic essence (eidos) by means of free generation in fantasy. In this way, a seemingly new objectness is created, namely the universal, which confirms the thesis that thinking proceeds through generalizations, passing in numerous ways from the specific to the general and from the general, through the specific, to the singular.
Keywords: Lexical eidos, phenomenology, noema, polysemantic word, semantic core.
42 Knowledge Required for Avoiding Lexical Errors at Machine Translation
Authors: Yukiko Sasaki Alam
Abstract:
This research aims to identify the causes of wrong lexical selections in machine translation (MT), rather than to categorize lexical errors, which has been the main practice in error analysis. By manually examining and analyzing lexical errors output by an MT system, it suggests what knowledge would help the system reduce lexical errors.
Keywords: Error analysis, causes of errors, machine translation, outputs evaluation.
41 The Sign in the Communication Process
Authors: S. Pesina, T. Solonchak
Abstract:
In the process of information transmission (concept verbalization), we deal mostly with the substance (contents) and only then attend to the form. When recalling events from the remote past, we often cannot exactly reproduce the specific words we heard or spoke, nor the syntactic structures. We remember events, feelings and images; we recall the general contents of the discourse. The thought acquires a specific language form only during the concept verbalization phase. With minimal time for pondering, and depending on the level of language competence, grammatical and syntactic shaping often occurs automatically through the use of familiar models and stereotypes. This means that the language form adapts itself to consciousness, and not vice versa.
Keywords: Lexical eidos, phenomenology, noema, polysemantic word, semantic core.
40 Turkic - Indian Lexical Parallels in the Framework of the Nostratic Language's Macrofamily
Authors: Z. E. Iskakova, B. S. Bokuleva, B. N. Zhubatova, U. T. Alzhanbayeva
Abstract:
Since ancient times, Turkic languages have been in contact with numerous representatives of different language families. The article discusses Turkic-Indian language contact, shows the promise and necessity of this line of research for Turkic linguistics, and presents Turkic-Indian lexical parallels within the framework of the Nostratic macrofamily. The research is based on lexical parallels (LP) between Turkic (which belongs to the Altaic family of languages) and Indian languages (including Dravidian and Indo-Aryan languages).
Keywords: Language communications, lexical parallels, Nostratic languages, Turkic languages.
39 New Ways of Vocabulary Enlargement
Authors: T. Solonchak, S. Pesina
Abstract:
Lexical invariants, being a sort of stereotype within the frames of ordinary consciousness, are created by the members of a language community as a result of a uniform division of reality. The invariant meaning is formed in a person’s mind gradually, in the course of different actualizations of secondary meanings in various contexts. We understand the lexical invariant as an abstract language essence containing a set of semantic components. In one of its configurations, it is the basis of all or a number of the meanings making up the semantic structure of the word.
Keywords: Lexical invariant, invariant theories, polysemantic word, cognitive linguistics.
38 More than Two Decades of Research on Groupware: A Systematic Lexical Analysis
Authors: Loay A. Altamimi
Abstract:
Collaborative technologies or software, known as groupware, are key enabling tools for communication, collaboration and coordination among individuals, work groups and businesses. Reviews of the groupware literature are very few, and most are neither systematic nor recent. This paper is an effort to fill this gap and to provide researchers with a more up-to-date and broader systematic literature review. For this purpose, 1087 scholarly articles on groupware, published from 1990 to 2013, were collected through a literature search. The study adopted the systematic approach of lexical analysis to analyze these articles.
Keywords: Lexical Analysis, Literature review, Groupware, collaborative Software.
37 The Phonology and Phonetics of Second Language Intonation in Case of “Downstep”
Authors: Tayebeh Norouzi
Abstract:
This study investigates the acquisition process of intonation. It examines the intonation structure of Tokyo Japanese and its realization by Iranian learners of Japanese. Seven Iranian learners of Japanese, differing in fluency, and two Japanese speakers participated in the experiment. Two sentences were used to test the phonological and phonetic characteristics of lexical pitch-accent as well as the intonation patterns produced by the speakers. Both sentences consisted of similar words with the same number of syllables and lexical pitch-accents but had different syntactic structures. Speakers were asked to read each sentence three times at normal speed, and the data were analyzed with Praat. The results show that the realization of lexical pitch-accent, Accentual Phrase (AP) and AP boundary tone varies depending on sentence type. For sentences of type XdeYwo, the lexical pitch-accent is realized properly; however, there is a rise in AP boundary tone regardless of the speakers’ level of fluency. In contrast, in sentences of type XnoYwo, the lexical pitch-accent and AP boundary tone vary depending on the speakers’ fluency level. Advanced speakers are better at grouping words into phrases and produce more native-like intonation patterns, though they are not able to realize downstep properly. The non-native speakers tried to realize proper intonation patterns by making changes in lexical accent and boundary tone.
Keywords: Intonation, Iranian learners, Japanese prosody, lexical accent, second language acquisition.
36 EEG Correlates of Trait and Mathematical Anxiety during Lexical and Numerical Error-Recognition Tasks
Authors: Alexander N. Savostyanov, Tatiana A. Dolgorukova, Elena A. Esipenko, Mikhail S. Zaleshin, Margherita Malanchini, Anna V. Budakova, Alexander E. Saprygin, Tatiana A. Golovko, Yulia V. Kovas
Abstract:
EEG correlates of mathematical and trait anxiety levels were studied in 52 healthy Russian speakers during the execution of error-recognition tasks with lexical, arithmetic and algebraic conditions. Event-related spectral perturbations (ERSP) were used as a measure of brain activity. The ERSP plots revealed alpha/beta desynchronization within a 500-3000 ms interval after task onset and slow-wave synchronization within an interval of 150-350 ms. The amplitudes in these intervals reflected the accuracy of error recognition and were differently associated with the three conditions. The correlates of anxiety were found in the theta (4-8 Hz) and beta2 (16-20 Hz) frequency bands. In the theta band, the effects of mathematical anxiety were more strongly expressed in the lexical condition than in the arithmetic and algebraic conditions. The mathematical anxiety effects in the theta band were associated with differences between anterior and posterior cortical areas, whereas the effects of trait anxiety were associated with inter-hemispheric differences. In the beta1 and beta2 bands, the effects of trait and mathematical anxiety were directed oppositely: trait anxiety was associated with an increase in the amplitude of desynchronization, whereas mathematical anxiety was associated with a decrease in this amplitude. The effect of mathematical anxiety in the beta2 band was insignificant for the lexical condition but strongest in the algebraic condition. EEG correlates of anxiety in the theta band could be interpreted as indices of task emotionality, whereas the reaction in the beta2 band is related to the tension of intellectual resources.
Keywords: EEG, brain activity, lexical and numerical error-recognition tasks, mathematical and trait anxiety.
35 Incorporating Lexical-Semantic Knowledge into Convolutional Neural Network Framework for Pediatric Disease Diagnosis
Authors: Xiaocong Liu, Huazhen Wang, Ting He, Xiaozheng Li, Weihan Zhang, Jian Chen
Abstract:
The use of electronic medical record (EMR) data to build disease diagnosis models has become an important research topic in biomedical informatics. Deep learning can automatically extract features from massive data, which has brought breakthroughs in the study of EMR data. The challenge is that deep learning lacks semantic knowledge, which limits its practicality in medical science. This research proposes a method of incorporating lexical-semantic knowledge from abundant entities into a convolutional neural network (CNN) framework for pediatric disease diagnosis. First, medical terms are vectorized into Lexical Semantic Vectors (LSV), which are concatenated with the embedded word vectors of word2vec to enrich the feature representation. Second, the semantic distribution of medical terms serves as a Semantic Decision Guide (SDG) for the optimization of deep learning models. The study evaluates the performance of the LSV-SDG-CNN model on four kinds of Chinese EMR datasets, with CNN, LSV-CNN, and SDG-CNN designed as baseline models for comparison. The experimental results show that the LSV-SDG-CNN model outperforms the baseline models on all four kinds of Chinese EMR datasets. The best configuration of the model yielded an F1 score of 86.20%. The results demonstrate that the CNN has been effectively guided and optimized by lexical-semantic knowledge, and that the LSV-SDG-CNN model improves disease classification accuracy by a clear margin.
Keywords: lexical semantics, feature representation, semantic decision, convolutional neural network, electronic medical record
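The abstract does not say how the Lexical Semantic Vectors are built, only that they are concatenated with word2vec embeddings. As a minimal sketch, assuming each medical term is mapped to a one-hot vector over a few hypothetical semantic categories, the concatenation step could look like this (all vectors, categories and function names are illustrative, not taken from the paper):

```python
from typing import Dict, List

# Toy word2vec-style embeddings; the numbers are made up for illustration.
WORD_VECTORS: Dict[str, List[float]] = {
    "fever": [0.2, -0.1, 0.4],
    "cough": [0.1, 0.3, -0.2],
}

# Hypothetical semantic categories and a hypothetical term-to-category mapping.
SEMANTIC_CATEGORIES: List[str] = ["symptom", "disease", "drug"]
TERM_CATEGORY: Dict[str, str] = {"fever": "symptom", "cough": "symptom"}


def lexical_semantic_vector(term: str) -> List[float]:
    """One-hot vector over SEMANTIC_CATEGORIES (all zeros for unknown terms)."""
    category = TERM_CATEGORY.get(term)
    return [1.0 if c == category else 0.0 for c in SEMANTIC_CATEGORIES]


def enriched_vector(term: str) -> List[float]:
    """Concatenate the word embedding with the lexical semantic vector."""
    return WORD_VECTORS[term] + lexical_semantic_vector(term)


print(enriched_vector("fever"))  # [0.2, -0.1, 0.4, 1.0, 0.0, 0.0]
```

In a real model the enriched vectors would feed the CNN's input layer; the one-hot encoding is only one plausible choice.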
34 The Image as an Initial Element of the Cognitive Understanding of Words
Authors: S. Pesina, T. Solonchak
Abstract:
The paper analyzes word semantics, focusing on several pressing problems concerning the invariance of advanced imagery. Interest in the language of imagery is driven by the introduction into linguistics of a new paradigm centered on the personality of the speaker (the subject of the language). Particularly noteworthy is the question of the place of the image in discussions of lexical and phraseological meanings and of the relationship between imagery and metaphor. The formation of a metaphor, as an interaction between two intellective entities, occurs in part at a cognitive level, and it is the category of the image, having cognitive roots, that aids in correctly interpreting the results of this process at the lexical-semantic level.
Keywords: Image, metaphor, concept, creation of a metaphor, cognitive linguistics, erased image, vivid image.
33 Syntax Sensitive and Language Independent Detection of Code Clones
Authors: Kazuaki Maeda
Abstract:
This paper proposes a new technique to detect code clones from the lexical and syntactic points of view, based on the PALEX source code representation. PALEX code contains the recorded parsing actions as well as lexical formatting information, including white space and comments. A list of parsing actions (shift, reduce, and reading a token) can be recorded during compilation, once the compiler has finished analyzing the source code. The proposed technique has the advantages of being syntax-sensitive and language-independent.
Keywords: Code Clones, Source Code Representation, XML, Parser, Parser Generator
32 The Study of Formal and Semantic Errors of Lexis by Persian EFL Learners
Authors: Mohammad J. Rezai, Fereshteh Davarpanah
Abstract:
Producing a text in a language that is not one’s mother tongue can be a demanding task for language learners. Examining lexical errors committed by EFL learners is a challenging area of investigation that can shed light on the process of second language acquisition. Despite the considerable number of investigations into grammatical errors, few studies have tackled formal and semantic errors of lexis committed by EFL learners. The current study examined Persian learners’ formal and semantic errors of lexis in English. To this end, 60 students at three different proficiency levels were asked to write on 10 different topics in 10 separate sessions. In total, 600 essays written by Persian EFL learners were collected as the corpus of the study. An error taxonomy comprising formal and semantic errors was selected to analyze the corpus. The formal category covered misselection and misformation errors, while the semantic errors were classified into lexical, collocational and lexicogrammatical categories. Each category was further divided into subcategories depending on the identified errors. The results showed 2583 errors in the corpus of 9600 words: 2030 formal errors and 553 semantic errors. Formal errors were the most frequent (78.6%) and were more prevalent at the advanced level (42.4%). Semantic errors (21.4%) were more frequent at the low intermediate level (40.5%). Among formal errors of lexis, misformation errors were by far the most numerous (98%), while misselection errors constituted 2%. Additionally, no significant differences were observed among the three semantic error subcategories, namely collocational, lexical choice and lexicogrammatical. The results of the study can help clarify the challenges EFL learners face in the second language acquisition process.
Keywords: Collocational errors, lexical errors, Persian EFL learners, semantic errors.
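The reported shares of formal and semantic errors follow directly from the error counts in the abstract; a quick arithmetic check:

```python
TOTAL_ERRORS = 2583
FORMAL_ERRORS = 2030
SEMANTIC_ERRORS = 553


def share(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two categories partition all errors found in the corpus.
assert FORMAL_ERRORS + SEMANTIC_ERRORS == TOTAL_ERRORS
print(share(FORMAL_ERRORS, TOTAL_ERRORS))    # 78.6
print(share(SEMANTIC_ERRORS, TOTAL_ERRORS))  # 21.4
```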
31 An Investigation into Kanji Character Discrimination Process from EEG Signals
Authors: Hiroshi Abe, Minoru Nakayama
Abstract:
The frontal area of the brain is known to be involved in behavioral judgement. Because a Kanji character can be discriminated from other characters both visually and linguistically, we hypothesized that, in Kanji character discrimination, frontal event-related potential (ERP) waveforms reflect two discrimination processes in separate time periods: one based on visual analysis and the other based on lexical access. To examine this hypothesis, we recorded ERPs while subjects performed a Kanji lexical decision task. In this task, a known Kanji character, an unknown Kanji character or a symbol was presented, and the subject had to report whether the presented character was a Kanji character known to the subject. The same response was required for unknown Kanji trials and symbol trials. For signal preprocessing, we examined the performance of a method using independent component analysis for artifact rejection, found it effective, and therefore used it. In the ERP results, there were two time periods in which the frontal ERP waveforms differed significantly between the unknown Kanji trials and the symbol trials: around 170 ms and around 300 ms after stimulus onset. This result supported our hypothesis. In addition, the result suggests that Kanji character lexical access may be fully completed by around 260 ms after stimulus onset.
Keywords: Character discrimination, Event-related Potential, Independent Component Analysis, Kanji, Lexical access.
30 Convergence and Divergence in Telephone Conversations: A Case of Persian
Authors: Anna Mirzaiyan, Vahid Parvaresh, Mahmoud Hashemian, Masoud Saeedi
Abstract:
People usually have a telephone voice, meaning that they adjust their speech to fit particular situations and to blend in with other interlocutors. The question is: do we speak differently to different people? This possibility has been suggested by social psychologists within Accommodation Theory [1]. Converging toward the speech of another person can be regarded as a polite speech strategy, while choosing a language not used by the other interlocutor can be considered the clearest example of speech divergence [2]. The present study investigates such processes in the course of everyday telephone conversations. Using Joos's [3] model of formality in spoken English, the researchers explore convergence to or divergence from the addressee. The results suggest that lexical choice, and consequently patterns of style, vary notably in accordance with the person being addressed.
Keywords: Convergence, divergence, lexical formality, speech accommodation.
29 Lexical Based Method for Opinion Detection on Tripadvisor Collection
Authors: Faiza Belbachir, Thibault Schienhinski
Abstract:
The massive development of online social networks allows users to post and share their opinions on various topics. Given this huge volume of opinions, it is worthwhile to extract and interpret this information for different domains, e.g., product and service benchmarking, politics, and recommender systems. This is why opinion detection is one of the most important research tasks. It consists of differentiating between opinion data and factual data. The difficulty of this task is to determine an approach that returns opinionated documents. Generally, two kinds of approaches are used for opinion detection: lexical based approaches and machine learning based approaches. In lexical based approaches, a dictionary of sentiment words is used, with each word associated with a weight, and the opinion score of a document is derived from the occurrence of words from this dictionary. In machine learning approaches, a classifier is usually trained on a set of annotated documents containing sentiment, using features such as word n-grams, part-of-speech tags, and logical forms. Most of these works rely on the document text to determine the opinion score but do not take into account whether these texts are actually correct. Thus, it is worthwhile to exploit other information to improve opinion detection. In our work, we develop a new way to compute the opinion score by introducing the notion of a trust score. We determine not only which documents are opinionated but also whether these opinions are trustworthy information with respect to the topics. To do so, we use the SentiWordNet lexicon to calculate opinion and trust scores, and we compute different features about users (number of comments, number of useful comments, average useful reviews). We then combine the opinion score and the trust score to obtain a final score. We applied our method to detect trustworthy opinions in the TRIPADVISOR collection. Our experimental results show that combining the opinion score and the trust score improves opinion detection.
Keywords: Tripadvisor, Opinion detection, SentiWordNet, trust score.
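A minimal sketch of the lexical scoring pipeline described above. The abstract does not give the formulas, so everything here is an assumption: a tiny hypothetical lexicon stands in for SentiWordNet, the trust score is taken as the ratio of useful comments to all comments, and the two scores are merged by an equal-weight linear combination.

```python
from typing import Dict

# Hypothetical lexicon standing in for SentiWordNet: word -> polarity weight.
LEXICON: Dict[str, float] = {"great": 0.8, "clean": 0.5, "dirty": -0.6, "awful": -0.9}


def opinion_score(text: str) -> float:
    """Sum of absolute lexicon weights divided by the number of tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(abs(LEXICON.get(t, 0.0)) for t in tokens) / len(tokens)


def trust_score(num_comments: int, num_useful: int) -> float:
    """Hypothetical trust measure: share of a user's comments marked useful."""
    return num_useful / num_comments if num_comments else 0.0


def final_score(opinion: float, trust: float, alpha: float = 0.5) -> float:
    """Linear combination of opinion and trust; alpha is an assumed weight."""
    return alpha * opinion + (1 - alpha) * trust


o = opinion_score("the room was clean")  # 0.5 / 4 tokens = 0.125
t = trust_score(num_comments=10, num_useful=4)  # 0.4
print(final_score(o, t))  # 0.2625
```

Absolute weights are used because opinion detection asks whether a text is opinionated at all, not which polarity it has; a tuned `alpha` would normally be chosen on held-out data.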
28 Investigating Medical Students’ Perspectives toward University Teachers’ Talking Features in an English as a Foreign Language Context in Urmia, Iran
Authors: Ismail Baniadam, Nafisa Tadayyon, Javid Fereidoni
Abstract:
This study investigated medical students’ attitudes toward some features of teacher talk with respect to the students’ gender in the Iranian context. To do so, 60 male and 60 female medical students of Urmia University of Medical Sciences (UMSU) participated in the research. A researcher-made Likert-type questionnaire, piloted beforehand, was used to gather the data. Comparing the four factors related to teacher talk, visual and extra-linguistic information had the highest mean score, followed by lexical and syntactic familiarity, speed of speech, and finally the use of the Persian language, which had the lowest. Female students were also significantly more in favor of speed of speech and lexical and syntactic familiarity than male students.
Keywords: Attitude, gender, medical student, teacher talk.
27 Language Processing of Seniors with Alzheimer’s Disease: From the Perspective of Temporal Parameters
Authors: Lai Yi-Hsiu
Abstract:
The present paper examines the language processing of Chinese-speaking seniors with Alzheimer’s disease (AD) from the perspective of temporal cues. Twenty healthy adults, 17 healthy seniors, and 13 seniors with AD in Taiwan participated in this study, telling stories based on two sets of pictures. Nine temporal cues were extracted and analyzed. Oral productions in Mandarin Chinese were compared to examine to what extent and in what ways the three groups of participants differed significantly. Results indicated that age effects were significant for filled pauses. Dementia effects were significant for the mean duration of pauses, empty pauses, filled pauses, lexical pauses, and the normalized mean duration of filled pauses and lexical pauses. The findings help characterize the nature of language processing in seniors with or without AD and contribute to understanding the interactions between the neural mechanisms of AD and temporal parameters.
Keywords: Language processing, Alzheimer’s disease, Mandarin Chinese, temporal cues.
26 Lexical Database for Multiple Languages: Multilingual Word Semantic Network
Authors: K. K. Yong, R. Mahmud, C. S. Woo
Abstract:
Data mining and knowledge engineering have become difficult tasks due to the large amount of data available on the web today. The validity and reliability of data have also become a major debate in knowledge acquisition. Acquiring knowledge from different languages is another concern: many language translators and corpora have been developed, but they are usually limited to certain languages and domains. Furthermore, search results from engines using the traditional 'keyword' approach are no longer satisfactory, and more intelligent knowledge engineering agents are needed. To address these problems, a system known as the Multilingual Word Semantic Network is proposed. The system adapts a semantic network to organize words according to concepts and relations. It also adopts open source as its development philosophy, enabling native speakers and experts to contribute their knowledge to the system. The contributed words are then defined and linked using lexical and semantic relations, so that related words and derivatives can be identified and linked. The outcome of the system implementation contributes to the development of the semantic web and knowledge engineering.
Keywords: Multilingual, semantic network, intelligent knowledge engineering.
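The core idea of linking words across languages through labeled lexical and semantic relations can be illustrated with a small graph structure. This is a minimal sketch under assumptions of our own: the `SemanticNetwork` class, the (language, word) node format and the relation labels are hypothetical and do not describe the system's actual data model.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Set, Tuple

Node = Tuple[str, str]  # (language code, word); hypothetical node format


class SemanticNetwork:
    """Toy word network: nodes are (language, word) pairs, edges carry a relation label."""

    def __init__(self) -> None:
        self._edges: DefaultDict[Node, Set[Tuple[str, Node]]] = defaultdict(set)

    def link(self, source: Node, relation: str, target: Node) -> None:
        """Add a directed, labeled edge from source to target."""
        self._edges[source].add((relation, target))

    def related(self, node: Node, relation: Optional[str] = None) -> List[Node]:
        """Return neighbours of node, optionally filtered by relation label."""
        return sorted(
            target
            for rel, target in self._edges.get(node, set())
            if relation is None or rel == relation
        )


net = SemanticNetwork()
net.link(("en", "dog"), "translation", ("fr", "chien"))
net.link(("en", "dog"), "hypernym", ("en", "animal"))
net.link(("en", "dog"), "derivative", ("en", "doggy"))
print(net.related(("en", "dog"), "translation"))  # [('fr', 'chien')]
```

Edges here are directed; a contributor-facing system would likely also store inverse relations so that lookups work from either word.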
25 SySRA: A System of a Continuous Speech Recognition in Arab Language
Authors: Samir Abdelhamid, Noureddine Bouguechal
Abstract:
We report in this paper the model adopted by our continuous speech recognition system for the Arabic language, SySRA, and the results obtained so far. This system uses the Arabdic-10 database, a manually segmented corpus of words for the Arabic language. Phonetic decoding is represented by an expert system whose knowledge base is expressed in the form of production rules. This expert system transforms a vocal signal into a phonetic lattice. The higher level of the system handles the recognition of the resulting lattice by rendering it in the form of written sentences (orthographic form). This level initially contains the lexical analyzer, which is none other than the recognition module. We tested this analyzer on a set of spectrograms obtained by dictating a score of sentences in Arabic. The recognition rate for these sentences is about 70%, which is, to our knowledge, the best result for the recognition of the Arabic language. The test set consists of twenty sentences from four speakers who did not take part in the training.
Keywords: Continuous speech recognition, lexical analyzer, phonetic decoding, phonetic lattice, vocal signal.
24 Aspect Oriented Software Architecture
Authors: Pradip Peter Dey, Ronald F. Gonzales, Gordon W. Romney, Mohammad Amin, Bhaskar Raj Sinha
Abstract:
Natural language processing systems pose a unique challenge for software architectural design, as system complexity has increased continually and such systems cannot easily be constructed from loosely coupled modules. Lexical, syntactic, semantic, and pragmatic aspects of linguistic information are tightly coupled in a manner that requires a special separation of concerns in design, implementation and maintenance. An aspect oriented software architecture is proposed in this paper after a critical review of the relevant architectural issues. For the purpose of this paper, the syntactic aspect is characterized by an augmented context-free grammar. The semantic aspect is composed of multiple perspectives, including denotational, operational, axiomatic and case frame approaches. Case frame semantics matured in India through deep thematic analysis. It is argued that lexical, syntactic, semantic and pragmatic aspects work together in a mutually dependent way and that their synergy is best represented in the aspect oriented approach. The software architecture is presented with an augmented Unified Modeling Language.
Keywords: Language engineering, parsing, software design, user experience.
23 Contextual SenSe Model: Word Sense Disambiguation Using Sense and Sense Value of Context Surrounding the Target
Authors: Vishal Raj, Noorhan Abbas
Abstract:
Ambiguity in NLP (Natural Language Processing) refers to the ability of a word, phrase, sentence, or text to have multiple meanings. This gives rise to various kinds of ambiguity, such as lexical, syntactic, semantic, anaphoric and referential. This study focuses mainly on lexical ambiguity. Word Sense Disambiguation (WSD) is an NLP technique that aims to resolve lexical ambiguity by determining the correct meaning of a word within a given context. Most WSD solutions rely on words for training and testing, but we use lemma and Part of Speech (POS) tokens of words instead. The lemma adds generality and the POS adds properties of the word to the token. We designed a method to create an affinity matrix giving the affinity between any pair of lemma_POS tokens (a lemma and the POS of a word joined by an underscore) in the training set. We also devised an algorithm that creates sense clusters of tokens from the affinity matrix under the POS hierarchy of the lemma. Furthermore, three different mechanisms are devised to predict the sense of the target word from the affinity/similarity values. Each contextual token contributes some value to the senses of the target word, and the sense with the highest value becomes the sense of the target word. Since contextual tokens play a key role both in creating sense clusters and in predicting the sense of the target word, the model is named Contextual SenSe Model (CSM). CSM is notably simple and easy to explain, in contrast to contemporary deep learning models, which are complex, time-intensive, and hard to explain. CSM is trained on the SemCor training data and evaluated on the SemEval test dataset. The results indicate that, despite its naivety, the method achieves promising results compared with the Most Frequent Sense (MFS) model.
Keywords: Word Sense Disambiguation, WSD, Contextual SenSe Model, Most Frequent Sense, part of speech, POS, Natural Language Processing, NLP, OOV, out of vocabulary, ELMo, Embeddings from Language Model, BERT, Bidirectional Encoder Representations from Transformers, Word2Vec, lemma_POS, Algorithm.
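The affinity-and-voting idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' CSM: it assumes affinity is plain sentence-level co-occurrence counting, it uses hand-made sense clusters (CSM derives them from the affinity matrix), and the sense identifiers such as `bank%finance` and the toy training sentences are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import DefaultDict, Dict, List


def build_affinity(sentences: List[List[str]]) -> DefaultDict[str, Counter]:
    """Symmetric co-occurrence counts between lemma_POS tokens in the same sentence."""
    affinity: DefaultDict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            affinity[a][b] += 1
            affinity[b][a] += 1
    return affinity


def predict_sense(
    context: List[str], sense_clusters: Dict[str, List[str]], affinity: Dict[str, Counter]
) -> str:
    """Each context token adds its affinity to every member of each sense cluster;
    the sense with the highest total wins (ties go to the first sense listed)."""
    scores = {
        sense: sum(affinity.get(c, Counter())[m] for c in context for m in cluster)
        for sense, cluster in sense_clusters.items()
    }
    return max(scores, key=scores.get)


training = [
    ["bank_NOUN", "river_NOUN", "water_NOUN"],
    ["bank_NOUN", "money_NOUN", "loan_NOUN"],
    ["river_NOUN", "water_NOUN", "fish_NOUN"],
    ["money_NOUN", "loan_NOUN", "interest_NOUN"],
]
# Hand-made clusters for illustration only.
clusters = {
    "bank%finance": ["money_NOUN", "loan_NOUN"],
    "bank%river": ["river_NOUN", "water_NOUN"],
}
aff = build_affinity(training)
print(predict_sense(["fish_NOUN", "water_NOUN"], clusters, aff))  # bank%river
```

Raw counts favour frequent tokens; a normalized similarity (for example, dividing by token frequencies) would be a natural refinement and may be closer to the "affinity/similarity value" the abstract mentions.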
22 A Corpus-Based Study on the Styles of Three Translators
Authors: Wang Yunhong
Abstract:
The present paper is concerned with the different styles of three translators in translating the Chinese classical novel Shuihu Zhuan. Based on a parallel corpus, it adopts a target-oriented approach to examine whether the three translations reveal stylistic differences and shifts, and what these are. The findings show that the three translators demonstrate different styles in their word choices and sentence preferences, which implies that identifying recurrent textual patterns may be a basic step in investigating a translator's style.
Keywords: Corpus, lexical choices, sentence characteristics, style.
21 Named Entity Recognition using Support Vector Machine: A Language Independent Approach
Authors: Asif Ekbal, Sivaji Bandyopadhyay
Abstract:
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. 
A statistical analysis (ANOVA) is also performed to compare the performance of the proposed NER system with that of the existing HMM-based system for both languages.
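The reported F-scores are consistent with the balanced F-measure, the harmonic mean of precision P and recall R: F = 2PR / (P + R). The minimal check below recomputes both values from the precision and recall given in the abstract. The function name `f_score` is our own illustration, not part of the authors' system.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in %) as reported in the abstract.
print(round(f_score(80.12, 88.61), 2))  # Bengali -> 84.15
print(round(f_score(74.34, 80.23), 2))  # Hindi   -> 77.17
```

Because the harmonic mean is pulled towards the smaller of the two values, the lower precision in both languages is what keeps the F-score below the midpoint of precision and recall.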
Keywords: Named Entity (NE), Named Entity Recognition (NER), Support Vector Machine (SVM), Bengali, Hindi.
PDF Downloads 3402
20 Collocation Errors in English as Second Language (ESL) Essay Writing
Authors: Fatima Muhammad Shitu
Abstract:
In language learning, second language learners as well as native speakers commit errors in their attempts to achieve competence in the target language. Collocation concerns the meaning relations between lexical items. In every human language, there is a kind of ‘natural order’ in which words are arranged or relate to one another in sentences, so much so that when a word occurs in a given context, the related or naturally co-occurring word automatically comes to mind. It therefore counts as an error when students inappropriately pair or arrange such ‘naturally’ co-occurring lexical items in a text. It has been observed that most of the second language learners in this research group commit collocation errors. A study of this kind is significant because it gives insight into the kinds of errors learners commit. This helps language teachers identify the sources and causes of such errors and correct them, thereby guiding learners towards some level of competence in the language. The aim of the study is to understand the nature of these errors as stumbling blocks to effective essay writing. Its objectives are to identify the errors and analyze their structural composition, in order to determine whether students share similar errors and whether these errors form patterns that reveal their sources and causes. In this descriptive study, the researcher sampled nine hundred essays collected from three hundred undergraduate learners of English as a second language at the Federal College of Education, Kano, North-West Nigeria, that is, three essays per student. The essays shared the same topics and the same length (number of words).
The essays were written during lecture hours on three different occasions. Errors were identified systematically, and each error was recorded only once even if it occurred several times in a student’s essays. The data were collated by converting the number of occurrences of each error into percentages. The findings indicate similarities among students, as well as regular and repeated errors that form a pattern. Based on this pattern, the study concludes that students’ collocation errors are attributable to poor teaching and learning, which result in wrong generalization of rules.
Keywords: Collocations, errors, collocation errors, second language learning.
PDF Downloads 7910
19 Extracting Attributes for Twitter Hashtag Communities
Authors: Ashwaq Alsulami, Jianhua Shao
Abstract:
Organisations often need to understand discussions on social media, such as what the trending topics are and what characterises the people engaged in the discussion. A number of approaches have been proposed to extract attributes that characterise a discussion group. However, these approaches are largely based on supervised learning and therefore require a large amount of labelled data. In this paper, we propose an approach that does not require labelled data but instead relies on lexical sources to detect meaningful attributes for online discussion groups. Our findings show an acceptable level of accuracy in detecting attributes for Twitter discussion groups.
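The abstract does not say how the lexical sources are used. One simple, label-free way to detect group attributes from a lexical resource is lexicon matching: count how many of a group's tweets contain terms associated with each candidate attribute and keep the attributes whose share passes a threshold. The sketch below assumes a small hand-made lexicon and a `min_share` threshold, both hypothetical. It is not the authors' method.

```python
import re
from collections import Counter

# Hypothetical lexicon: attribute label -> indicative terms (illustration only).
LEXICON = {
    "occupation:teacher": {"teacher", "teaching", "classroom"},
    "interest:football": {"football", "goal", "match"},
}


def detect_attributes(tweets, lexicon, min_share=0.2):
    """Return {attribute: share of tweets mentioning it} for shares >= min_share."""
    if not tweets:
        return {}
    counts = Counter()
    for tweet in tweets:
        # Lower-case word tokens; each tweet counts at most once per attribute.
        words = set(re.findall(r"[a-z']+", tweet.lower()))
        for attribute, terms in lexicon.items():
            if words & terms:
                counts[attribute] += 1
    n = len(tweets)
    return {a: c / n for a, c in counts.items() if c / n >= min_share}


group = [
    "Great match tonight, what a goal!",
    "Back in the classroom tomorrow",
    "Football is life",
]
print(detect_attributes(group, LEXICON))
print(detect_attributes(group, LEXICON, min_share=0.5))
```

Counting each tweet at most once per attribute stops a single verbose tweet from dominating the group profile. A real system would need a much larger lexical resource and some handling of hashtags and spelling variants.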
Keywords: Attributed community, attribute detection, community, social network.
PDF Downloads 507
18 RussiAnglicized© Slang and Translation: A Clockwork Orange Tick-Tock
Authors: Mahnaz Movahedi
Abstract:
Slang argot plays a fundamental role in the special teenage sociolect of Burgess’ novel A Clockwork Orange, which offers a wide variety of instances for analysis. Consequently, translating its notions while keeping their effect is of great importance. Burgess named his RussiAnglicized© slang Nadsat, which stands for "-teen" and is derived mostly from Russian and Cockney rhyming slang. The paper discusses the lexical origin and Persian translation of his unusual slang words, which illustrate a teenage-gang argot. The translation shows creativity, but also mistranslation that leads to the loss of the slang's meaning load and atmosphere in the target text.
Keywords: Argot, mistranslation, slang, sociolect.
PDF Downloads 2486
17 An Effective Framework for Chinese Syntactic Parsing
Authors: Xing Li, Chengqing Zong
Abstract:
This paper presents an effective framework for Chinese syntactic parsing, which consists of two parts. The first is a parsing framework based on an improved bottom-up chart parsing algorithm. It integrates the beam search strategy of the N-best algorithm and the heuristic function of the A* algorithm for pruning, and produces multiple parse trees. The second is a novel evaluation model, which integrates contextual and partial lexical information into the traditional PCFG model and defines a new score function. Using this model, the tree with the highest score is selected as the best parse tree. Finally, the results of comparative experiments are given.
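To illustrate the general idea of bottom-up chart parsing with beam pruning under a PCFG, here is a minimal probabilistic CKY sketch in Python. It assumes a toy English grammar in Chomsky normal form, and the rules, probabilities and function names are all hypothetical. It keeps only the `beam_width` best entries per chart cell and scores trees with the plain PCFG product of rule probabilities. It does not reproduce the paper's A* heuristic or its contextual score function.

```python
import math

# Hypothetical toy PCFG in Chomsky normal form (illustration only).
LEXICAL = {  # (nonterminal, word) -> P(nonterminal -> word)
    ("NP", "he"): 0.4,
    ("NP", "books"): 0.3,
    ("V", "reads"): 1.0,
}
BINARY = {  # (parent, left, right) -> P(parent -> left right)
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}


def parse(words, beam_width=5):
    """Bottom-up (CKY-style) chart parsing with per-cell beam pruning.

    chart[i][j] maps a nonterminal to (log probability, backpointer)
    for the best analysis of words[i:j] that survived pruning.
    """
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # Fill length-1 spans from the lexical rules.
    for i, word in enumerate(words):
        for (nt, w), p in LEXICAL.items():
            if w == word:
                chart[i][i + 1][nt] = (math.log(p), word)
    # Combine adjacent spans bottom-up, from shorter spans to longer ones.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = {}
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    if left in chart[i][k] and right in chart[k][j]:
                        score = (math.log(p) + chart[i][k][left][0]
                                 + chart[k][j][right][0])
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, (k, left, right))
            # Beam pruning: keep only the beam_width highest-scoring entries.
            ranked = sorted(cell.items(), key=lambda item: item[1][0], reverse=True)
            chart[i][j] = dict(ranked[:beam_width])
    return chart


def best_tree(chart, i, j, nt):
    """Follow backpointers to rebuild the best tree rooted in nt over words[i:j]."""
    _, back = chart[i][j][nt]
    if isinstance(back, str):  # lexical entry: the backpointer is the word itself
        return (nt, back)
    k, left, right = back
    return (nt, best_tree(chart, i, k, left), best_tree(chart, k, j, right))


words = "he reads books".split()
chart = parse(words)
print(best_tree(chart, 0, len(words), "S"))
# ('S', ('NP', 'he'), ('VP', ('V', 'reads'), ('NP', 'books')))
print(math.exp(chart[0][len(words)]["S"][0]))  # probability of the best parse
```

Working in log space avoids underflow on long sentences. A narrower beam makes parsing faster but may discard the constituent that the globally best tree needs, which is the trade-off that heuristic pruning such as A* aims to manage.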
Keywords: syntactic parsing, PCFG, pruning, evaluation model.
PDF Downloads 1220
16 Japanese English in Travel Brochures
Authors: Premvadee Na Nakornpanom
Abstract:
This study investigates the role and impact of English loan words on the Japanese language in travel brochures. It examines the issues arising from a shift in the role of English, from a tool for absorbing the West’s advanced knowledge and technology during the modernization of Japan to a means of linking Japan with the rest of the world and enhancing the country’s international presence. A sociolinguistic analysis of data collected from brochures of the Japanese travel agency "HIS" in Thailand reveals that English plays its most important roles as a lexical gap filler and a provider of special effects. The increasing mixing of English into Japanese affects how English is misused, the way the Japanese see the world, and the communication gap of the present generation.
Keywords: English, Japanese, loan words, travel brochure.
PDF Downloads 2185
15 Designing a Tool for Software Maintenance
Authors: Amir Ngah, Masita Abdul Jalil, Zailani Abdullah
Abstract:
The aim of software maintenance is to keep the software system up to date with advances in software and hardware technology. One of the first tasks in software maintenance is extracting information at a higher level of abstraction. In this paper, we present how to design an information extraction tool for software maintenance. The tool extracts basic information from old programs, such as variables, base classes, derived classes, objects of classes, and functions. The tool has two main parts: a lexical analyzer module that reads the input file character by character, and a searching module through which users can retrieve basic information from the existing programs. We implemented the tool for input files written in a patterned subset of C++.
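The two-part design, a character-by-character lexical analyzer that feeds a searching module, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' tool. It handles only identifiers, integer literals and one-character symbols, with no comments, strings or multi-character operators such as `::`. The helper names `tokenize`, `find_classes` and `find_objects` are hypothetical.

```python
ACCESS_SPECIFIERS = {"public", "protected", "private", "virtual"}


def tokenize(source):
    """Lexical analyzer: scan the source one character at a time."""
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":  # identifier or keyword
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(source[start:i])
        elif ch.isdigit():  # integer literal
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(source[start:i])
        else:  # every other character becomes a one-character symbol token
            tokens.append(ch)
            i += 1
    return tokens


def find_classes(tokens):
    """Searching module: map each declared class or struct to its base classes."""
    classes = {}
    for idx, tok in enumerate(tokens[:-1]):
        if tok in ("class", "struct"):
            name, bases, j = tokens[idx + 1], [], idx + 2
            if j < len(tokens) and tokens[j] == ":":  # base-class list follows
                j += 1
                while j < len(tokens) and tokens[j] not in ("{", ";"):
                    if tokens[j] not in ACCESS_SPECIFIERS and tokens[j] != ",":
                        bases.append(tokens[j])
                    j += 1
            # Keep bases already found if the class was seen before (e.g. forward declaration).
            classes[name] = bases if bases else classes.get(name, [])
    return classes


def find_objects(tokens, classes):
    """Find simple object declarations such as 'Circle c;' or 'Circle a, b;' (first name only)."""
    objects = []
    for prev, cur, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if prev in classes and cur.isidentifier() and cur not in classes and nxt in (";", "=", ","):
            objects.append((prev, cur))
    return objects


src = "class Shape { }; class Circle : public Shape { int r; }; Circle c;"
tokens = tokenize(src)
classes = find_classes(tokens)
print(classes)                                    # {'Shape': [], 'Circle': ['Shape']}
print([name for name, bases in classes.items() if bases])  # derived classes: ['Circle']
print(find_objects(tokens, classes))              # [('Circle', 'c')]
```

Separating tokenization from searching means the searching queries can be extended, for example to functions or variables, without touching the character-level scanner.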
Keywords: Extraction tool, software maintenance, reverse engineering, C++.
PDF Downloads 2398
14 The Bodybuilding Passage to the Act of the Adolescent
Authors: L. Chikhani, H. Issa
Abstract:
Objective: This work focuses on bodybuilding as a narcissistic inscription of the relational dynamic between the ego and the body. In this sense, we consider that this symptomatic adolescent act highlights a defective body image, leading, through a sadistic passage exercised on the split body, to an Ego/body ideal. Method: Semi-structured interviews with 16 adolescents aged 15 to 18 allowed us to highlight a lexical field related to the body and to excess in sports; in addition, the TAT was administered to a 17-year-old who had been bodybuilding for more than 2 years. Results:
- Defective structuring of the body image: the future bodybuilder is fixated on the image of narcissistic misrecognition.
- An unsatisfying object relation, implying incompleteness in the process of subjectivation.
- A narcissistic, corporealized ego ideal leading the adolescent to a sadistic pathology directed toward one’s own body as a compensatory defense.
Keywords: Bodybuilding, body image, narcissism, object relation.
PDF Downloads 2804