Search results for: corpus linguistics.
17 The Algorithm of Semi-Automatic Thai Spoonerism Words for Bi-Syllable
Authors: Nutthapat Kaewrattanapat, Wannarat Bunchongkien
Abstract:
The purpose of this research is to study and develop an algorithm for Thai spoonerism words using a semi-automatic computer program: on the input side, the syllables are already separated, and on the spoonerism side, the developed algorithm is applied. The algorithm establishes rules and mechanisms for bi-syllable Thai spoonerism by analyzing the elements of each syllable, namely the cluster consonant, vowel, intonation mark, and final consonant. The study finds that bi-syllable Thai spoonerism has one mechanism: the vowel, intonation mark, and final consonant of the two syllables are transposed, while the initial consonant and cluster (if any) are kept. These rules and mechanisms were then implemented as Thai spoonerism software written in PHP. In a performance test, the program produced the correct bi-syllable spoonerism for 99% of the test words. The remaining 1% were faults, because the words obtained from spoonerism may not be spelled in conformity with Thai grammar and a Thai spoonerism can have more than one answer.
Keywords: Algorithm, Spoonerism, Computational Linguistics.
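The transposition rule described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' PHP software: the `Syllable` record, its field names, and the `spoonerize` function are hypothetical, the reading of the rule (swap vowel, intonation mark, and final consonant; keep the initial consonant or cluster) is an assumption, and romanized placeholders stand in for Thai script.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Syllable:
    """A pre-segmented syllable (hypothetical structure, not from the paper)."""
    initial: str  # initial consonant or consonant cluster, kept in place
    vowel: str
    tone: str     # intonation (tone) mark, "" if none
    final: str    # final consonant, "" if none


def spoonerize(first: Syllable, second: Syllable) -> tuple[Syllable, Syllable]:
    """Swap vowel, tone mark, and final consonant between two syllables.

    Assumed reading of the abstract's single bi-syllable mechanism.
    """
    new_first = replace(first, vowel=second.vowel, tone=second.tone, final=second.final)
    new_second = replace(second, vowel=first.vowel, tone=first.tone, final=first.final)
    return new_first, new_second


# Placeholder syllables in romanized form, for illustration only.
a, b = spoonerize(Syllable("kr", "a", "", "p"), Syllable("m", "i", "1", "ng"))
print(a)
print(b)
```

A sketch like this does not check whether the output is a validly spelled Thai word, which is the source of the 1% fault rate the abstract reports.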
Downloads 2364
16 A Sentence-to-Sentence Relation Network for Recognizing Textual Entailment
Authors: Isaac K. E. Ampomah, Seong-Bae Park, Sang-Jo Lee
Abstract:
Over the past decade, there have been promising developments in Natural Language Processing (NLP), with several investigations of approaches focusing on Recognizing Textual Entailment (RTE). These include models based on lexical similarities, models based on formal reasoning, and, most recently, deep neural models. In this paper, we present a sentence encoding model that exploits sentence-to-sentence relation information for RTE. In terms of sentence modeling, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) adopt different approaches. RNNs are known to be well suited for sequence modeling, whilst CNNs are suited for extracting n-gram features through their filters and can learn ranges of relations via the pooling mechanism. We combine these strengths of RNNs and CNNs to present a unified model for the RTE task. Our model combines relation vectors computed from the phrasal representation of each sentence and from the final encoded sentence representations. Firstly, we pass each sentence through a convolutional layer to extract a sequence of higher-level phrase representations for each sentence, from which the first relation vector is computed. Secondly, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short Term Memory (Bi-LSTM) network to obtain the final sentence representations, from which a second relation vector is computed. The relation vectors are combined and then used in the same fashion as an attention mechanism over the Bi-LSTM outputs to yield the final sentence representations for classification. Experiments on the Stanford Natural Language Inference (SNLI) corpus suggest that this is a promising technique for RTE.
Keywords: Deep neural models, natural language inference, recognizing textual entailment, sentence-to-sentence relation.
Downloads 1456
15 Online Multilingual Dictionary Using Hamburg Notation for Avatar-Based Indian Sign Language Generation System
Authors: Sugandhi, Parteek Kumar, Sanmeet Kaur
Abstract:
Sign Language (SL) is used by deaf people and by others who can hear but cannot speak, or who have difficulty with spoken languages due to some disability. It is a visual gesture language that uses one or both hands, the arms, the face, and the body to convey meanings and thoughts. An SL automation system is an effective way to provide an interface for communicating with hearing people using a computer. In this paper, an avatar-based dictionary is proposed for a text-to-Indian Sign Language (ISL) generation system. This work also presents a literature review of the SL corpora available for various SLs over the years. An ISL generation system requires a written form of SL, and certain techniques are available for writing SL. The system uses the Hamburg sign language Notation System (HamNoSys) and Signing Gesture Mark-up Language (SiGML) for ISL generation. It is developed in PHP using Web Graphics Library (WebGL) technology for 3D avatar animation. A multilingual ISL dictionary is developed using HamNoSys for both English and Hindi. This dictionary is used as a database to associate signs with words or phrases of a spoken language. The system provides an admin panel interface for managing the dictionary, i.e., modifying, adding, or deleting a word. Through this interface, HamNoSys notations can be developed and stored in a database, and these notations can be converted manually into their corresponding SiGML files. The system takes a natural language input sentence in English or Hindi and generates a 3D sign animation using an avatar. SL generation systems have potential applications in many domains, such as healthcare, media, educational institutes, commercial sectors, and transportation services. This research work will help researchers understand the various techniques used for writing SL and for building Sign Language generation systems.
Keywords: Avatar, dictionary, HamNoSys, hearing-impaired, Indian Sign Language, sign language.
Downloads 1360
14 Teaching Translation in Brazilian Universities: A Study about the Possible Impacts of Translators’ Comments on the Cyberspace about Translator Education
Authors: Erica Lima
Abstract:
The objective of this paper is to discuss relevant points about teaching translation in Brazilian universities and the possible impacts of blogs and social networks on translator education today. It analyzes the curricula of Brazilian translation courses, contrasting them with information obtained from two highly visible social networking groups in the area concerning the essential characteristics needed to become a successful professional. The research therefore has, as its main corpus, a few undergraduate translation programs’ syllabuses, as well as a few postings in social network groups that specifically share professional opinions regarding the necessity for a translator to obtain a degree in translation to practice the profession. To a certain extent, such comments and their corresponding responses lead to the propagation of discourses which influence the ideas that aspiring translators and recent graduates end up having towards themselves and their undergraduate courses. The postings also show that many professionals do not have a clear position regarding translator education; while refuting it, they also encourage “free” courses. It is thus observed that cyberspace constitutes, on the one hand, a place of mobilization of people in defense of similar ideas. On the other hand, it embodies a place of tension and conflict, given that there are many participants and, as in any other situation of interlocution, disagreements may arise. From the postings, aspects related to professionalism were analyzed (including discussions about regulation), as well as questions about the classic dichotomies: theory/practice; art/technique; self-education/academic training. As a partial result, the common interest in the valorization of the profession could be mentioned, although there is no consensus on the essential characteristics of a good translator. 
It was also possible to observe that the set of socially constructed representations in the group reflects characteristics of the world situation of the translation courses (especially in some European countries and in the United States), which, in the first instance, does not accurately reflect the Brazilian idiosyncrasies of the area.
Keywords: Cyberspace, teaching translation, translator education, university.
Downloads 921
13 Computable Difference Matrix for Synonyms in the Holy Quran
Authors: Mohamed Ali AlShaari, Khalid M. ElFitori
Abstract:
In the field of Quran Studies known as GHAREEB AL QURAN (the study of the meanings of strange words and structures in the Holy Quran), it is difficult to distinguish some pragmatic meanings from conceptual meanings. One who wants to study this subject may need to look for a common usage between two or more words in order to understand the general meaning, and may sometimes need to look for the differences between them, even if they are synonyms (word sisters).
Some distinguished scholars of Arabic linguistics believe that there are no true synonyms; they believe instead in varieties of meaning and multi-context usage. Based on this viewpoint, our method was designed to look for the synonyms of a word, and then for the differences that distinguish the word from its synonyms.
There are many available books that use such a method e.g. synonyms books, dictionaries, glossaries, and some books on the interpretations of strange vocabulary of the Holy Quran, but it is difficult to look up words in these written works.
For that reason, we proposed a logical entity, which we called Differences Matrix (DM).
DM groups synonymous words in order to extract the relations between them and to identify the general meaning, which defines the skeleton of all the synonyms of a word; this meaning is expressed by one of the word's sisters.
In the Differences Matrix, we used the sisters (words) as titles for rows and columns, and in each cell we tried to define the row title (a word) by using the column title (its sister), so that the relations between sisters appear. The expected result is a well-defined group of sisters for each word. We represented the obtained results formally and used the defined groups as a base for building the ontology of the Holy Quran synonyms.
Keywords: Quran, synonyms, Differences Matrix, ontology
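One way to picture the Differences Matrix as a data structure is sketched below. This is only an illustration under stated assumptions: the function name `build_difference_matrix`, the `describe` callback, and the placeholder words are hypothetical and are not taken from the paper, which does not say how the cells are filled or stored.

```python
from itertools import permutations
from typing import Callable


def build_difference_matrix(
    sisters: list[str],
    describe: Callable[[str, str], str],
) -> dict[str, dict[str, str]]:
    """Return {row_word: {column_word: definition}} for each ordered pair.

    Each cell defines the row word in terms of the column word (its sister).
    The diagonal is left out because a word is not defined by itself.
    """
    matrix: dict[str, dict[str, str]] = {word: {} for word in sisters}
    for row, col in permutations(sisters, 2):
        matrix[row][col] = describe(row, col)
    return matrix


# Placeholder words and a placeholder description function.
words = ["word_a", "word_b", "word_c"]
dm = build_difference_matrix(words, lambda r, c: f"how {r} differs from {c}")
for row, cells in dm.items():
    print(row, cells)
```

In practice the cell contents would come from a scholar or a lexical resource rather than a formatting function; the nested mapping only shows the row-by-column layout.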
Downloads 2119
12 Language Politics and Identity in Translation: From a Monolingual Text to Multilingual Text in Chinese Translations
Authors: Chu-Ching Hsu
Abstract:
This paper focuses on how government-led language policies and political changes in Taiwan manipulate language choice in translations, and on the translation strategies translators employ to show their language ideology behind the power struggles and decision-making. Framed by Lefevere’s theoretical concept of translating as rewriting and carried out as a diachronic study, this paper sets out to investigate the language ideology and translator’s idiolect in Chinese-language translations of Anglo-American novels. The examples used to explore these issues were taken from different Chinese renditions of Mark Twain’s English-language novel The Adventures of Huckleberry Finn, in which several dialogues were originally written in the colloquial language and dialect used in the American state of Mississippi and reproduced in Mark Twain’s works. Using an adapted corpus methodology, many examples are extracted from the translated texts and the source text to illuminate how translators in Taiwan deal with the dialectal features encoded in Twain’s works, and how different Chinese translations are employed by Taiwanese translators to confirm the language policies and to express their language identity textually in different periods of the past five decades, from the 1960s onward. The findings of this study suggest that the use of Taiwanese dialect and language patterns in translations does relate to the mother-tongue language movement and the language ideology of the translator, as well as to the issue of language identity raised on the island of Taiwan. 
Furthermore, this study confirms that changes of political power in Taiwan do have a significant impact on language policy (assimilationism, pluralism, or multiculturalism), which has also turned Taiwan from a monolingual into a multilingual society, where language ideology and identity are revealed not only in people’s daily communication but also in written translations.
Keywords: Language politics and policies, literary translation, mother-tongue, multiculturalism, translator’s ideology.
Downloads 1133
11 Improving Subjective Bias Detection Using Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory
Authors: Ebipatei Victoria Tunyan, T. A. Cao, Cheol Young Ock
Abstract:
Detecting subjectively biased statements is a vital task, because this kind of bias, when present in text or in other information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflict among consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks, such as sentiment analysis, opinion identification, and bias neutralization. A system that can adequately detect subjectivity in text would significantly boost research in these areas. It can also come in handy for platforms like Wikipedia, where the use of neutral language is important. The goal of this work is to identify subjectively biased language in text at the sentence level. Machine learning can solve complex AI problems, which makes it a good fit for subjective bias detection. A key step in this approach is to train a classifier based on BERT (Bidirectional Encoder Representations from Transformers) as an upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as a data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network with an attention mechanism. This approach produces a deeper and better classifier. We evaluate the effectiveness of our model on the Wiki Neutrality Corpus (WNC), a benchmark dataset compiled from Wikipedia edits that removed various biased instances from sentences, on which we also compare our model with existing approaches. Experimental analysis indicates improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.
Keywords: Subjective bias detection, machine learning, BERT–BiLSTM–Attention, text classification, natural language processing.
Downloads 838
10 A Development of the Multiple Intelligences Measurement of Elementary Students
Authors: Chaiwat Waree
Abstract:
This research aims to develop the Multiple Intelligences Measurement of Elementary Students. The structural accuracy test and normality establishment are based on Gardner's Multiple Intelligences Theory. This theory consists of eight aspects, namely linguistics, logic and mathematics, visual-spatial relations, body and movement, music, human relations, self-realization/self-understanding, and nature. The sample used in this research consists of elementary school students (aged 5 to 11 years). The size of the sample group was determined using the Yamane table. The group has 2,504 students. Multistage sampling was used. Basic statistical analysis and construct validity testing were done using confirmatory factor analysis. The research can be summarized as follows: 1) The Multiple Intelligences Measurement, consisting of 120 items, is content-accurate. The internal consistency reliability of the whole Multiple Intelligences Measurement, according to the Kuder-Richardson method, equals .91. The difficulty of the measurement test is between .39 and .83. Discrimination is between .21 and .85. 2) The Multiple Intelligences Measurement has construct validity in a good range; that is, the 8 components and all 120 test items are statistically significant at the .01 level. The chi-square value equals 4357.7 (p=.00) at 244 degrees of freedom, and the Goodness of Fit Index equals 1.00. The Adjusted Goodness of Fit Index equals .92. The Comparative Fit Index (CFI) equals .68. The Root Mean Squared Residual (RMR) equals 0.064, and the Root Mean Square Error of Approximation equals 0.82. 3) The normality of the Multiple Intelligences Measurement is categorized into 3 levels. Those with high intelligence are those with percentiles of more than 78. Those with moderate/medium intelligence are those with percentiles between 24 and 77.9. Those with low intelligence are those with percentiles from 23.9 downwards.
Keywords: Multiple Intelligences, Measurement, Elementary Students.
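For readers unfamiliar with the reliability and difficulty figures reported above, the sketch below shows how they are commonly computed for dichotomous (0/1) items. It assumes the KR-20 variant of the Kuder-Richardson method with population variance; the abstract does not say which formula was used, and the function names and toy data are illustrative only.

```python
def item_difficulty(responses: list[list[int]]) -> list[float]:
    """Proportion of students answering each item correctly (p-values)."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(r[i] for r in responses) / n_students for i in range(n_items)]


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson Formula 20 for a students-by-items matrix of 0/1 scores.

    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / variance of total scores)
    """
    n_students = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(r) for r in responses]
    mean_total = sum(totals) / n_students
    # Population variance of total scores (an assumption; some texts use n - 1).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    pq_sum = sum(p * (1 - p) for p in item_difficulty(responses))
    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Toy data: 4 students, 3 items (not from the study).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(toy))
print(kr20(toy))
```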
Downloads 2962
9 The Role of Ideophones: Phonological and Morphological Characteristics in Literature
Authors: Cristina Bahón Arnaiz
Abstract:
Many Asian languages, such as Korean and Japanese, are well-known for their wide use of sound symbolic words or ideophones. This is a very particular characteristic which enriches its lexicon hugely. Ideophones are a class of sound symbolic words that utilize sound symbolism to express aspects, states, emotions, or conditions that can be experienced through the senses, such as shape, color, smell, action or movement. Ideophones have very particular characteristics in terms of sound symbolism and morphology, which distinguish them from other words. The phonological characteristics of ideophones are vowel ablaut or vowel gradation and consonant mutation. In the case of Korean, there are light vowels and dark vowels. Depending on the type of vowel that is used, the meaning will slightly change. Consonant mutation, also known as consonant ablaut, contributes to the level of intensity, emphasis, and volume of an expression. In addition to these phonological characteristics, there is one main morphological singularity, which is reduplication and it carries the meaning of continuity, repetition, intensity, emphasis, and plurality. All these characteristics play an important role in both linguistics and literature as they enhance the meaning of what is trying to be expressed with incredible semantic detail, expressiveness, and rhythm. The following study will analyze the ideophones used in a single paragraph of a Korean novel, which add incredible yet subtle detail to the meaning of the words, and advance the expressiveness and rhythm of the text. The results from analyzing one paragraph from a novel, after presenting the phonological and morphological characteristics of Korean ideophones, will evidence the important role that ideophones play in literature.
Keywords: Ideophones, mimetic words, phonomimes, phenomimes, psychomimes, sound symbolism.
Downloads 1110
8 Author Profiling: Prediction of Learners’ Gender on a MOOC Platform Based on Learners’ Comments
Authors: Tahani Aljohani, Jialin Yu, Alexandra. I. Cristea
Abstract:
The more an educational system knows about a learner, the more personalised interaction it can provide, which leads to better learning. However, asking a learner directly is potentially disruptive, and such requests are often ignored by learners. Especially in the booming realm of Massive Open Online Course (MOOC) platforms, only a very low percentage of users disclose demographic information about themselves. Thus, in this paper, we aim to predict learners’ demographic characteristics by proposing an approach using linguistically motivated Deep Learning Architectures for Learner Profiling, particularly targeting gender prediction on a FutureLearn MOOC platform. Additionally, we tackle here the difficult problem of predicting the gender of learners based on their comments only, which are often available across MOOCs. The most common current approaches to text classification use the Long Short-Term Memory (LSTM) model, considering sentences as sequences. However, human language also has structure. In this research, rather than considering sentences as plain sequences, we hypothesise that higher-level semantic and syntactic sentence processing based on linguistics will render a richer representation. We thus evaluate the traditional LSTM against other bleeding-edge models that take syntactic structure into account, such as the tree-structured LSTM, the Stack-augmented Parser-Interpreter Neural Network (SPINN), and the Structure-Aware Tag Augmented model (SATA). Additionally, we explore using different word-level encoding functions. We have implemented these methods on our MOOC dataset, on which they perform best, in comparison with a public sentiment analysis dataset that is further used to cross-examine the models' results.
Keywords: Deep learning, data mining, gender prediction, MOOCs.
Downloads 1372
7 Retrieval Augmented Generation against the Machine: Merging Human Cyber Security Expertise with Generative AI
Authors: Brennan Lodge
Abstract:
Amidst a complex regulatory landscape, Retrieval Augmented Generation (RAG) emerges as a transformative tool for Governance Risk and Compliance (GRC) officers. This paper details the application of RAG in synthesizing Large Language Models (LLMs) with external knowledge bases, offering GRC professionals an advanced means to adapt to rapid changes in compliance requirements. While the development for standalone LLMs is exciting, such models do have their downsides. LLMs cannot easily expand or revise their memory, and they cannot straightforwardly provide insight into their predictions, and may produce “hallucinations.” Leveraging a pre-trained seq2seq transformer and a dense vector index of domain-specific data, this approach integrates real-time data retrieval into the generative process, enabling gap analysis and the dynamic generation of compliance and risk management content. We delve into the mechanics of RAG, focusing on its dual structure that pairs parametric knowledge contained within the transformer model with non-parametric data extracted from an updatable corpus. This hybrid model enhances decision-making through context-rich insights, drawing from the most current and relevant information, thereby enabling GRC officers to maintain a proactive compliance stance. Our methodology aligns with the latest advances in neural network fine-tuning, providing a granular, token-level application of retrieved information to inform and generate compliance narratives. By employing RAG, we exhibit a scalable solution that can adapt to novel regulatory challenges and cybersecurity threats, offering GRC officers a robust, predictive tool that augments their expertise. The granular application of RAG’s dual structure not only improves compliance and risk management protocols but also informs the development of compliance narratives with pinpoint accuracy. 
It underscores AI’s emerging role in strategic risk mitigation and proactive policy formation, positioning GRC officers to anticipate and navigate the complexities of regulatory evolution confidently.
Keywords: Retrieval Augmented Generation, Governance Risk and Compliance, Cybersecurity, AI-driven Compliance, Risk Management, Generative AI.
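The retrieve-then-generate pattern described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: word-count vectors stand in for the dense learned embeddings, a prompt string stands in for the input to the seq2seq generator, and the function names and sample policy sentences are invented rather than taken from the paper.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


def build_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    """Prepend retrieved passages so a generator can condition on them."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Invented sample passages; the corpus can be updated without retraining a model.
policies = [
    "password rotation policy requires changes every 90 days",
    "incident reports must be filed within 72 hours",
    "visitor badges are returned at the front desk",
]
print(build_prompt("when must incident reports be filed", policies, k=1))
```

The point of the sketch is the split the abstract describes: the corpus is non-parametric and can be edited at any time, while the generator's parametric knowledge stays fixed.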
Downloads 150
6 Crossover Memories and Code-Switching in the Narratives of Arabic-Hebrew and Hebrew-English Bilingual Adults in Israel
Authors: Amani Jaber-Awida
Abstract:
This study examines two bilingual phenomena in the narratives of Arabic-Hebrew and Hebrew-English bilingual adults in Israel: crossover (CO) memories and code-switching (CS). The study examined these phenomena in the context of autobiographical memory, using a cue word technique. Student experimenters held two sessions in the homes of the participants. In separate language sessions, the participant was asked to look first at each of 16 cue words and then to state a concrete memory. After stating the memory, participants reported whether their memories were in the same language as the experiment session or a different one. Memories were classified as ‘Crossovers’ (CO) or ‘Same Language’ (SL) according to participants' self-reports. Participants were also required to elaborate on the setting, interlocutors, and other languages involved in the specific memory. Beyond replicating the cuing procedure, one memory from a specific lifespan period was chosen per participant, and the participant was required to provide further details about it. For the more detailed memories, a CS count was conducted. Both bilingual groups confirmed the Reminiscence Bump phenomenon, retrieving more memories in the 10-30 age period. CO memories prevailed in second language (L2) sessions. Same language memories were more abundant in first language (L1) sessions. Higher CS frequency was found in L2 sessions. Finally, as predicted, 'individual' CS was prevalent in L2 sessions, but 'community-based' CS was not higher in L1 sessions. The two bilingual measures in this study, crossovers and CS, came from different research traditions: the former from an experimental paradigm in the psychology of autobiographical memory based on self-reported judgments, the latter a behavioral measure from linguistics. This merger of approaches offers new insight into the field of bilingual autobiographical memory. 
In addition, the study attempted to shed light on the investigation of motivations for CS, beginning with Walters’ SPPL Model and concluding with a distinction between ‘community-based’ and individual motivations.
Keywords: Autobiographical memory, code-switching, crossover memories, reminiscence bump.
Downloads 798
5 Collaborative Stylistic Group Project: A Drama Practical Analysis Application
Authors: Omnia F. Elkommos
Abstract:
In the course of teaching stylistics to undergraduate students of the Department of English Language and Literature, Faculty of Arts and Humanities, the linguistic tool kit of theories comes in handy and useful for the better understanding of the different literary genres: Poetry, drama, and short stories. In the present paper, a model of teaching of stylistics is compiled and suggested. It is a collaborative group project technique for use in the undergraduate diverse specialisms (Literature, Linguistics and Translation tracks) class. Students initially are introduced to the different linguistic tools and theories suitable for each literary genre. The second step is to apply these linguistic tools to texts. Students are required to watch videos performing the poems or play, for example, and search the net for interpretations of the texts by other authorities. They should be using a template (prepared by the researcher) that has guided questions leading students along in their analysis. Finally, a practical analysis would be written up using the practical analysis essay template (also prepared by the researcher). As per collaborative learning, all the steps include activities that are student-centered addressing differentiation and considering their three different specialisms. In the process of selecting the proper tools, the actual application and analysis discussion, students are given tasks that request their collaboration. They also work in small groups and the groups collaborate in seminars and group discussions. At the end of the course/module, students present their work also collaboratively and reflect and comment on their learning experience. The module/course uses a drama play that lends itself to the task: ‘The Bond’ by Amy Lowell and Robert Frost. The project results in an interpretation of its theme, characterization and plot. The linguistic tools are drawn from pragmatics, and discourse analysis among others.
Keywords: Applied linguistic theories, collaborative learning, cooperative principle, discourse analysis, drama analysis, group project, online acting performance, pragmatics, speech act theory, stylistics, technology enhanced learning.
Downloads 1086
4 On the Need to have an Additional Methodology for the Psychological Product Measurement and Evaluation
Authors: Corneliu Sofronie, Roxana Zubcov
Abstract:
Cognitive Science appeared about 40 years ago, subsequent to the challenge of the Artificial Intelligence, as common territory for several scientific disciplines such as: IT, mathematics, psychology, neurology, philosophy, sociology, and linguistics. The new born science was justified by the complexity of the problems related to the human knowledge on one hand, and on the other by the fact that none of the above mentioned sciences could explain alone the mental phenomena. Based on the data supplied by the experimental sciences such as psychology or neurology, models of the human mind operation are built in the cognition science. These models are implemented in computer programs and/or electronic circuits (specific to the artificial intelligence) – cognitive systems – whose competences and performances are compared to the human ones, leading to the psychology and neurology data reinterpretation, respectively to the construction of new models. During these processes if psychology provides the experimental basis, philosophy and mathematics provides the abstraction level utterly necessary for the intermission of the mentioned sciences. The ongoing general problematic of the cognitive approach provides two important types of approach: the computational one, starting from the idea that the mental phenomenon can be reduced to 1 and 0 type calculus operations, and the connection one that considers the thinking products as being a result of the interaction between all the composing (included) systems. In the field of psychology measurements in the computational register use classical inquiries and psychometrical tests, generally based on calculus methods. Deeming things from both sides that are representing the cognitive science, we can notice a gap in psychological product measurement possibilities, regarded from the connectionist perspective, that requires the unitary understanding of the quality – quantity whole. 
In such an approach, measurement by calculus proves to be inefficient. Our research, conducted over more than 20 years, leads to the conclusion that measuring by forms properly fits the laws and principles of connectionism.
Keywords: Complementary methodology, connection approach, networks without scaling, quantum psychology.
Downloads 3683
3 Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model
Authors: Selvam M, Natarajan. A M, Thangarajan R
Abstract:
Parsing is important in Linguistics and Natural Language Processing for understanding the syntax and semantics of a natural language grammar. Parsing natural language text is challenging because of problems such as ambiguity and inefficiency. In addition, the interpretation of natural language text depends on context-based techniques. A probabilistic component is essential to resolve ambiguity in both syntax and semantics, thereby increasing the accuracy and efficiency of the parser. The Tamil language has some inherent features that make parsing more challenging. To address them, a lexicalized and statistical approach is applied to parsing with the aid of a language model. Statistical models mainly focus on the semantics of the language and are suitable for large-vocabulary tasks, whereas structural methods focus on syntax and model small-vocabulary tasks. A statistical language model based on trigrams has been built for Tamil with a medium vocabulary of 5000 words. Though statistical parsing gives better performance through trigram probabilities and a large vocabulary size, it has some disadvantages, such as its focus on semantics rather than syntax and its lack of support for free word order and long-term relationships. To overcome these disadvantages, a structural component is incorporated into statistical language models, which leads to hybrid language models. This paper attempts to build a phrase-structured hybrid language model that resolves the above-mentioned disadvantages. In developing the hybrid language model, a new part-of-speech tag set for Tamil has been developed with more than 500 tags, giving wider coverage. A phrase-structured Treebank has been developed with 326 Tamil sentences covering more than 5000 words. A hybrid language model has been trained on the phrase-structured Treebank using the immediate head parsing technique. 
Lexicalized and statistical parser which employs this hybrid language model and immediate head parsing technique gives better results than pure grammar and trigram based model.Keywords: Hybrid Language Model, Immediate Head Parsing, Lexicalized and Statistical Parsing, Natural Language Processing, Parts of Speech, Probabilistic Context Free Grammar, Tamil Language, Tree Bank.
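For readers unfamiliar with the trigram component mentioned in the abstract above, the following is a minimal sketch of a maximum-likelihood trigram language model. It is not the authors' implementation: the function names `train_trigram` and `trigram_prob`, the `<s>`/`</s>` padding symbols, and the toy English corpus are all assumptions made for illustration, and no smoothing or structural (phrase-structure) component is included.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word contexts over tokenized sentences."""
    tri, bi = Counter(), Counter()
    for tokens in sentences:
        # Pad so the first real word also has a two-word context (assumed convention).
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i in range(len(padded) - 2):
            w1, w2, w3 = padded[i:i + 3]
            tri[(w1, w2, w3)] += 1
            bi[(w1, w2)] += 1
    return tri, bi


def trigram_prob(tri, bi, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2); 0.0 for unseen contexts."""
    context = bi[(w1, w2)]
    return tri[(w1, w2, w3)] / context if context else 0.0


# Hypothetical toy corpus, not data from the paper.
corpus = [
    ["the", "dog", "barks"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "sleeps"],
]
tri, bi = train_trigram(corpus)
p_barks = trigram_prob(tri, bi, "the", "dog", "barks")
```

Because the estimate depends only on the two preceding words, a model of this kind cannot capture free word order or long-range dependencies, which is the limitation the abstract cites as motivation for adding a structural component.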
Downloads 3646
2 Reading and Teaching Poetry as Communicative Discourse: A Pragma-Linguistic Approach
Authors: Omnia Elkommos
Abstract:
Language is communication on several discourse levels. The target of teaching a language and the literature of a foreign language is to communicate a message. Reading, appreciating, analysing, and interpreting poetry as a sophisticated rhetorical expression of human thoughts, emotions, and philosophical messages is more feasible through the use of linguistic pragmatic tools from a communicative discourse perspective. The poet's intention, speech act, illocutionary act, and perlocutionary goal can be better understood when communicative situational context and linguistic discourse structure theories are employed. The use of linguistic theories in the teaching of poetry is, therefore, intrinsic to students' comprehension, interpretation, and appreciation of poetry of the different ages. The purpose of this study is to show how both teachers and students can apply these linguistic theories and tools to dramatic poetic texts for an engaging, enlightening, and effective interpretation and appreciation of the language. Theories drawn from the areas of pragmatics, discourse analysis, embedded discourse levels, communicative situational context, and other linguistic approaches were applied to selected poetry texts from different centuries. Further, a simple statistical count across different anthologies shows that poems with dialogic dramatic discourse, embedding two or three levels of discourse, outnumber descriptive poems with a single level of discourse between the poet and the reader. Poetry is thus discourse on one, two, or three levels. It is, therefore, recommended that teachers and students in the area of ESL/EFL use linguistic theories for a better understanding of poetry as communicative discourse. The practice of applying these linguistic theories in classrooms and in research will allow them to perceive the language and its linguistic, social, and cultural aspects.
Texts will become live illocutionary acts with a perlocutionary goal rather than mere literary texts in anthologies.
Keywords: Coda, commissives, communicative situation, context of culture, context of reference, context of utterance, dialogue, directives, discourse analysis, dramatic discourse interaction, duologue, embedded discourse levels, language for communication, linguistic structures, literary texts, poetry, pragmatic theories, reader response, speech acts (macro/micro), stylistics, teaching literature, TEFL, terms of address, turn-taking.
Downloads 1742
1 The Effect of Realizing Emotional Synchrony with Teachers or Peers on Children’s Linguistic Proficiency: The Case Study of Uji Elementary School
Authors: Reiko Yamamoto
Abstract:
This paper reports on a joint research project in which a researcher in applied linguistics and elementary school teachers in Japan explored new ways to realize emotional synchrony in a classroom in childhood education. The primary purpose of this project was to develop a cross-curriculum of the first language (L1) and second language (L2) based on the concept of plurilingualism. This concept is common in Europe, and can-do statements are used in forming the standard of linguistic proficiency in any language; these are attributed to the action-oriented approach in the Common European Framework of Reference for Languages (CEFR). CEFR has a basic tenet of language education: improving communicative competence. Can-do statements are classified into five categories based on this tenet: reading, writing, listening, speaking/interaction, and speaking/speech. The first step of this research was to specify the linguistic proficiency of the children, who are still developing their L1. Elementary school teachers brainstormed and specified the linguistic proficiency of the children as the competency needed to synchronize with others (teachers or peers) physically and mentally. The teachers formed original can-do statements in language proficiency on the basis of the idea that emotional synchrony leads to understanding others in communication. The research objective is to determine the effect of language education based on the newly developed curriculum and can-do statements. The participants in the experiment were 72 third-graders at Uji Elementary School, Japan. For the experiment, 17 items were developed from the can-do statements formed by the teachers and divided into the same five categories as those of CEFR. A can-do checklist consisting of these items was created. The experiment consisted of three steps. First, the students evaluated themselves using the can-do checklist at the beginning of the school year.
Second, one year of instruction was given to the students in Japanese and English classes (six periods a week). Third, the students evaluated themselves using the same can-do checklist at the end of the school year. The results of statistical analysis showed an enhancement of the students' linguistic proficiency. The average results of the post-check exceeded those of the pre-check in 12 of the 17 items. Moreover, significant differences were shown in four items, three of which belonged to the same category: speaking/interaction. It is concluded that children can come to understand others’ minds through physical and emotional synchrony. In particular, emotional synchrony is what teachers should aim at in childhood education.
Keywords: Elementary school education, emotional synchrony, language proficiency, sympathy with others.
Downloads 625